OpenAI Codex - The AI Coding Agent Powered by GPT-5.5
OpenAI Codex is no longer the experimental code-completion API from 2021. It has evolved into a full-blown AI coding agent - a cloud-based system powered by GPT-5.5 that can read your entire codebase, write multi-file changes, generate and run tests, create pull requests, and execute parallel tasks inside a secure sandbox. With over 4 million weekly active users as of April 2026, it has become the most widely deployed AI coding agent in the world.
This guide covers everything you need to know about the modern Codex platform: the model evolution from codex-1 through GPT-5.5, the cloud sandbox architecture, pricing across all tiers, the open-source Codex CLI, the AGENTS.md configuration standard, competitive benchmarks against Claude Code and Copilot, the Codex Security vulnerability scanner, and real-world adoption patterns. Whether you are evaluating Codex for your team or already using it and want to go deeper, this is the definitive reference.
1. What Is OpenAI Codex (2025+)?
If you remember the original Codex from 2021, forget everything about it. That was a code-completion API built on GPT-3 that powered GitHub Copilot's early autocomplete features. OpenAI deprecated it in March 2023. The Codex of 2025 and beyond is an entirely different product - an autonomous coding agent that lives inside ChatGPT and operates in a cloud-based sandboxed environment.
The modern Codex launched in May 2025 as a dedicated panel within the ChatGPT interface. Instead of completing single lines of code, it accepts high-level tasks like "refactor the authentication module to use JWT tokens" or "add pagination to the /users API endpoint and write integration tests." It then clones your repository into a cloud sandbox, reads the relevant files, formulates a plan, writes the code, runs the tests, and presents you with a complete diff or pull request.
Core Identity
At its core, Codex is three things:
- An agent, not a model. While it is powered by GPT-5.5 (and previously codex-1 and o3), the product is the agent layer - the orchestration, tool use, sandbox execution, and task management that wraps the model.
- Cloud-native. Every task runs in an isolated cloud sandbox with its own filesystem, package manager, and optional internet access. Your local machine is never touched.
- Repository-aware. Codex connects to your GitHub repositories (with GitLab and Bitbucket support in beta) and understands your project structure, dependencies, test suites, and CI/CD configuration.
What It Can Do Today
As of April 2026, Codex handles a wide range of software engineering tasks:
- Write new features across multiple files with correct imports and dependencies
- Fix bugs by reading stack traces, reproducing the issue, and verifying the fix
- Refactor code - rename symbols, extract functions, restructure modules
- Generate unit tests, integration tests, and end-to-end tests
- Create pull requests with descriptive titles, summaries, and linked issues
- Answer questions about your codebase by reading and analyzing the source
- Run up to 8 tasks in parallel on different parts of your codebase
- Delegate subtasks to specialized subagents for complex multi-step workflows
- Interact with web browsers and desktop applications via Computer Use
- Scan codebases for security vulnerabilities with Codex Security
2. The Model Evolution - codex-1 Through GPT-5.5
Understanding Codex requires understanding the models that power it. The agent has gone through a rapid evolution in under a year, with each generation bringing significant capability improvements.
codex-1 (May 2025)
The first model purpose-built for the Codex agent was codex-1, a fine-tuned variant of OpenAI's o3 reasoning model. Unlike general-purpose models, codex-1 was specifically optimized for software engineering tasks: reading large codebases, following coding conventions, writing idiomatic code, and operating within the constraints of a sandboxed environment.
codex-1 achieved a 72.1% score on SWE-Bench Verified, a benchmark that measures a model's ability to resolve real GitHub issues from popular open-source projects. For context, the base o3 model scored 69.1% on the same benchmark - a meaningful gap that demonstrated the value of task-specific fine-tuning.
| Model | SWE-Bench Verified | Release | Notes |
|---|---|---|---|
| codex-1 | 72.1% | May 2025 | Fine-tuned o3 for coding tasks |
| o3 (base) | 69.1% | April 2025 | General reasoning model |
| GPT-4.1 | 54.6% | April 2025 | Non-reasoning baseline |
| Claude 3.5 Sonnet | 49.0% | June 2024 | Anthropic's coding model |
codex-mini (June 2025)
OpenAI followed up with codex-mini, a smaller and faster variant optimized for latency-sensitive tasks. While it scored lower on SWE-Bench, it was 3-4x faster for common operations like code review, simple bug fixes, and test generation. This became the default model for Codex tasks that didn't require deep reasoning.
GPT-5.0 Integration (September 2025)
When GPT-5.0 launched, Codex was among the first products to integrate it. The jump was substantial - GPT-5.0 brought a 200K native context window (up from codex-1's 128K), dramatically better instruction following, and improved ability to maintain consistency across large multi-file changes. The SWE-Bench score climbed to approximately 78%.
GPT-5.5 - The Current Engine (February 2026)
The current Codex agent runs on GPT-5.5, which represents the most capable coding model OpenAI has shipped. Key improvements over GPT-5.0 include:
- 256K context window - enough to hold entire medium-sized codebases in a single context
- Improved agentic behavior - better at decomposing complex tasks, recovering from errors, and knowing when to ask for clarification
- Native tool use - the model was trained with tool-use data from the start, making sandbox operations, file I/O, and shell commands more reliable
- Reduced hallucination - significantly fewer invented APIs, non-existent functions, or fabricated library features
- Multi-language fluency - strong performance across Python, TypeScript, Rust, Go, Java, C++, C#, Ruby, PHP, and Swift
3. How It Works - Cloud Sandbox Architecture
The architecture behind Codex is what separates it from simple code-generation tools. Every task runs inside an isolated cloud sandbox - a lightweight virtual environment that provides a complete development setup without touching your local machine.
The Sandbox Environment
When you assign a task to Codex, the following happens:
- Repository clone: Codex clones your connected GitHub repository into the sandbox. For large repos, it uses sparse checkout to pull only the relevant directories.
- Environment setup: The sandbox installs dependencies based on your project's configuration files (package.json, requirements.txt, Cargo.toml, go.mod, etc.).
- Task execution: The agent reads relevant files, formulates a plan, writes code, and executes commands (build, test, lint) inside the sandbox.
- Result delivery: Once complete, Codex presents a diff of all changes, test results, and optionally creates a pull request directly on GitHub.
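The environment-setup step amounts to mapping whichever manifest files are present to the right install commands. Here is a minimal sketch of that detection logic in Python; the mapping table is illustrative, not OpenAI's actual implementation:

```python
# Map manifest files to the install command a sandbox would run.
# Illustrative only -- the real setup logic is not public.
MANIFEST_COMMANDS = {
    "package.json": "npm install",
    "requirements.txt": "pip install -r requirements.txt",
    "Cargo.toml": "cargo fetch",
    "go.mod": "go mod download",
}

def setup_commands(files_in_repo):
    """Return install commands for every recognized manifest, in a stable order."""
    return [cmd for name, cmd in MANIFEST_COMMANDS.items() if name in files_in_repo]
```

For a repo containing both `package.json` and `go.mod`, this yields `["npm install", "go mod download"]`; a repo with no recognized manifest gets an empty list and the sandbox skips the install step.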
Isolation and Security
Each sandbox is a microVM - a lightweight virtual machine that provides hardware-level isolation. This means:
- Tasks cannot access your local filesystem, environment variables, or credentials
- Each task gets a fresh environment - no state leaks between tasks
- The sandbox has its own network namespace with configurable internet access
- All sandbox data is destroyed after task completion (configurable retention for debugging)
Internet Access Modes
Codex offers three network modes for sandboxes:
| Mode | Network Access | Use Case |
|---|---|---|
| Isolated (default) | No internet access | Maximum security, internal codebases |
| Package-only | Access to package registries (npm, PyPI, crates.io) | Tasks that need to install dependencies |
| Full access | Unrestricted internet | Tasks that need to fetch APIs, documentation, or external resources |
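Conceptually, the three modes differ only in the egress allowlist applied to the sandbox's network namespace. A toy policy check, with illustrative hostnames (OpenAI has not published the actual allowlist):

```python
# Illustrative egress policy for the three sandbox network modes.
REGISTRY_HOSTS = {"registry.npmjs.org", "pypi.org", "crates.io"}

def egress_allowed(mode, host):
    """Return True if a sandbox in `mode` may connect to `host`."""
    if mode == "isolated":
        return False                    # no outbound traffic at all
    if mode == "package-only":
        return host in REGISTRY_HOSTS   # registries only
    if mode == "full":
        return True                     # unrestricted
    raise ValueError(f"unknown mode: {mode}")
```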
Architecture Diagram
The high-level flow looks like this:
User (ChatGPT)
      |
      v
Codex Orchestrator
      |
      +-- Task Queue (up to 8 parallel tasks)
      |         |
      |         v
      +-- Sandbox Pool
                |
                +-- microVM 1: [clone repo] -> [install deps] -> [agent loop] -> [diff/PR]
                +-- microVM 2: [clone repo] -> [install deps] -> [agent loop] -> [diff/PR]
                +-- ...
                |
                v
          GitHub API (PR creation, branch push)
The Agent Loop
Inside each sandbox, the agent operates in a classic observe-think-act loop:
# Simplified pseudocode of the Codex agent loop
while not task_complete:
    # 1. Observe: read files, check test output, review errors
    context = read_relevant_files(task, codebase)

    # 2. Think: reason about what to do next
    plan = model.reason(task, context, previous_actions)

    # 3. Act: write code, run commands, create files
    for action in plan.actions:
        if action.type == "write_file":
            write_file(action.path, action.content)
        elif action.type == "run_command":
            output = shell(action.command)
        elif action.type == "read_file":
            context.add(read_file(action.path))

    # 4. Verify: run tests, check for errors
    test_results = shell("npm test")  # or pytest, cargo test, etc.
    if test_results.all_passed:
        task_complete = True
    else:
        # Loop back with error context
        context.add(test_results.errors)
The key insight is that Codex doesn't just generate code and hope for the best. It verifies its own work by running your test suite inside the sandbox. If tests fail, it reads the error output, reasons about the cause, and iterates. This loop typically runs 3-7 iterations for complex tasks.
4. Pricing and Availability
One of the most significant changes in early 2026 was OpenAI making Codex available across all ChatGPT tiers, including the free plan. Here is the complete pricing breakdown as of April 2026.
| Plan | Monthly Price | Codex Access | Parallel Tasks | Notes |
|---|---|---|---|---|
| Free | $0 | Limited (approx. 5 tasks/day) | 1 | GPT-5.5 mini model, no internet access |
| Plus | $20 | Standard quota | 2 | Full GPT-5.5, package-only network |
| Pro | $100 / $200 | High quota / Unlimited | 4 / 8 | Full internet access, priority queue |
| Business | $25/user | Team quota pool | 4 per user | Admin controls, audit logs, SSO |
| Enterprise | Custom | Custom quota | Custom | VPC deployment, data residency, SLA |
Codex-Only Seats
A notable addition in Q1 2026 was the introduction of Codex-only seats for Business and Enterprise plans. These are discounted seats ($15/user/month on Business) for team members who only need Codex access without the full ChatGPT feature set. This is targeted at development teams where not every engineer needs GPT-5.5 for general conversation but everyone needs the coding agent.
API Pricing
For teams building on top of Codex programmatically, the API pricing follows the standard OpenAI token-based model:
GPT-5.5 (Codex tasks):
  Input:  $2.50 / 1M tokens
  Output: $10.00 / 1M tokens
  Cached: $1.25 / 1M tokens (50% discount)

GPT-5.5 mini (fast tasks):
  Input:  $0.30 / 1M tokens
  Output: $1.20 / 1M tokens
  Cached: $0.15 / 1M tokens
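At those rates, the cost of a task is easy to estimate. This helper (written for this guide, not an OpenAI SDK function) bills cached input tokens at the cached rate and everything else at the standard rates:

```python
# Token prices per 1M tokens (USD), taken from the rate card above.
PRICES = {
    "gpt-5.5":      {"input": 2.50, "output": 10.00, "cached": 1.25},
    "gpt-5.5-mini": {"input": 0.30, "output": 1.20,  "cached": 0.15},
}

def task_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate USD cost; cached tokens replace input tokens at the cached rate."""
    p = PRICES[model]
    uncached = input_tokens - cached_tokens
    return (uncached * p["input"]
            + cached_tokens * p["cached"]
            + output_tokens * p["output"]) / 1_000_000
```

For example, a full GPT-5.5 task that consumes 120K input tokens (40K of them cached) and produces 15K output tokens comes out to $0.40.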
5. Key Capabilities
Codex's capabilities have expanded rapidly since launch. Here is a detailed breakdown of what the agent can do as of April 2026.
Multi-File Edits
Unlike simple code generators that produce isolated snippets, Codex understands project structure. When you ask it to add a new API endpoint, it will:
- Create the route handler file
- Update the router configuration to register the new route
- Add the corresponding data model or schema if needed
- Update TypeScript types or interfaces across the project
- Modify the OpenAPI/Swagger spec if one exists
- Add the route to any middleware chains (auth, validation, rate limiting)
This cross-file awareness is powered by the agent's ability to read and index your entire repository before making changes. It builds an internal map of imports, exports, type definitions, and call graphs.
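The internal map described above can be approximated with a simple import graph. Here is a toy version that scans Python-style `import` statements with a regex; the real Codex indexing is far more sophisticated (and not public), so treat this purely as an illustration of the data structure:

```python
import re
from collections import defaultdict

def build_import_graph(files):
    """files: dict of module name -> source text.
    Returns a dict mapping each module to the set of modules it imports."""
    graph = defaultdict(set)
    pattern = re.compile(r"^\s*(?:from|import)\s+([\w.]+)", re.MULTILINE)
    for module, source in files.items():
        for match in pattern.finditer(source):
            graph[module].add(match.group(1))
    return dict(graph)
```

An agent can walk a graph like this outward from the files a task names to find every module a change might ripple into.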
Test Generation
Codex generates tests that match your existing test patterns. If your project uses Jest with React Testing Library, it writes Jest tests. If you use pytest with fixtures, it writes pytest tests with fixtures. It reads your existing test files to learn your conventions:
// Codex-generated test matching existing project conventions
describe('UserService', () => {
  let service: UserService;
  let mockRepo: jest.Mocked<UserRepository>;

  beforeEach(() => {
    mockRepo = createMockRepository();
    service = new UserService(mockRepo);
  });

  it('should return paginated users with correct metadata', async () => {
    mockRepo.findAll.mockResolvedValue({
      data: [mockUser({ id: '1' }), mockUser({ id: '2' })],
      total: 15,
    });

    const result = await service.getUsers({ page: 1, pageSize: 2 });

    expect(result.data).toHaveLength(2);
    expect(result.pagination).toEqual({
      page: 1,
      pageSize: 2,
      totalPages: 8,
      totalItems: 15,
    });
    expect(mockRepo.findAll).toHaveBeenCalledWith({
      skip: 0,
      take: 2,
    });
  });

  it('should throw NotFoundError for non-existent user', async () => {
    mockRepo.findById.mockResolvedValue(null);
    await expect(service.getUser('999')).rejects.toThrow(NotFoundError);
  });
});
Pull Request Creation
Codex can create pull requests directly on GitHub. When a task completes, you can choose to:
- Review the diff in the ChatGPT interface and apply changes manually
- Create a PR with an auto-generated title, description, and linked issue
- Push to a branch without creating a PR (for further local work)
The PR descriptions are surprisingly good - they include a summary of changes, the reasoning behind design decisions, a list of files modified, and test results. Codex also adds inline comments on complex changes to explain its approach.
Parallel Task Execution
Pro users can run up to 8 tasks simultaneously, each in its own sandbox. This is transformative for large-scale refactoring. For example, you could run these tasks in parallel:
Task 1: "Migrate all API routes from Express to Fastify"
Task 2: "Update all test files to use the new Fastify test helpers"
Task 3: "Update the Docker configuration for Fastify"
Task 4: "Update the CI/CD pipeline for the new build process"
Each task runs independently, and Codex is smart enough to detect potential conflicts between parallel tasks. If Task 1 and Task 2 both modify the same file, Codex will flag the conflict and suggest a merge strategy.
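Conflict detection of this kind reduces to intersecting the file sets each task touches. A minimal sketch (the merge-strategy logic Codex applies afterward is not public):

```python
from itertools import combinations

def find_conflicts(task_files):
    """task_files: dict of task name -> set of file paths the task modified.
    Returns (task_a, task_b, shared_files) for every overlapping pair."""
    conflicts = []
    for (a, files_a), (b, files_b) in combinations(task_files.items(), 2):
        shared = files_a & files_b
        if shared:
            conflicts.append((a, b, sorted(shared)))
    return conflicts
```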
Subagents (GA March 2026)
Subagents allow Codex to decompose complex tasks into smaller subtasks and delegate them to specialized child agents. This went generally available in March 2026 and is one of the most powerful features for complex engineering work.
When you give Codex a broad task like "set up a complete authentication system with JWT, refresh tokens, role-based access control, and password reset via email," it might spawn subagents for:
- Subagent 1: JWT token generation and validation middleware
- Subagent 2: Refresh token rotation and storage
- Subagent 3: RBAC permission model and decorators
- Subagent 4: Password reset email flow with templates
- Subagent 5: Integration tests for the complete auth flow
The parent agent coordinates the subagents, resolves dependencies between their outputs, and merges the results into a coherent changeset. Subagents share the same repository context but operate in isolated execution environments.
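The coordination problem is essentially a dependency-ordered merge: a subagent's output can only be merged after the outputs it builds on. A toy scheduler using the standard library's topological sorter, with subtask names taken from the auth example above (the actual orchestration logic is not public):

```python
from graphlib import TopologicalSorter

def merge_order(dependencies):
    """dependencies: subtask -> set of subtasks whose output it needs.
    Returns one valid order in which a parent agent can merge results."""
    return list(TopologicalSorter(dependencies).static_order())
```

For the auth example, the JWT middleware merges before refresh-token rotation, and the integration tests merge last because they depend on everything else.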
Computer Use (April 2026)
The newest capability, launched in April 2026, is Computer Use - the ability for Codex to interact with graphical interfaces. This extends the agent beyond code editing into:
- Browser testing: Codex can open your web application in a headless browser, navigate through user flows, and verify that UI changes render correctly
- Visual regression: Compare screenshots before and after changes to detect unintended visual side effects
- Documentation: Navigate your deployed application and generate screenshots for documentation
- Form filling and testing: Interact with forms, buttons, and dynamic UI elements to test user workflows end-to-end
6. AGENTS.md Configuration
One of Codex's most influential contributions to the broader AI tooling ecosystem is AGENTS.md - a configuration file that tells AI coding agents how to work with your repository. What started as a Codex-specific feature has become an industry standard governed by the Linux Foundation.
What Is AGENTS.md?
AGENTS.md is a Markdown file placed in the root of your repository (or in subdirectories for module-specific instructions). It provides structured guidance to AI agents about:
- Project architecture and conventions
- Build, test, and lint commands
- Code style preferences and patterns to follow
- Files and directories the agent should not modify
- Security-sensitive areas that require human review
- Dependency management rules
- PR and commit message conventions
Example AGENTS.md
# AGENTS.md
## Project Overview
This is a TypeScript monorepo using Turborepo with three packages:
- `packages/api` - Express REST API
- `packages/web` - Next.js 15 frontend
- `packages/shared` - Shared types and utilities
## Build & Test
- Build: `turbo build`
- Test: `turbo test`
- Lint: `turbo lint`
- Type check: `turbo typecheck`
## Code Conventions
- Use functional components with hooks (no class components)
- Use `zod` for all runtime validation
- Use `drizzle-orm` for database queries (not raw SQL)
- Error handling: use `Result<T, E>` pattern from `packages/shared/result.ts`
- All API endpoints must have OpenAPI annotations
## Do Not Modify
- `packages/shared/generated/` - auto-generated from OpenAPI spec
- `*.migration.ts` files - managed by drizzle-kit
- `.github/workflows/` - CI/CD managed by platform team
## Security Review Required
- Any changes to `packages/api/src/middleware/auth.ts`
- Any changes to `packages/api/src/middleware/rbac.ts`
- Any new environment variable usage
## PR Conventions
- Branch naming: `codex/{issue-number}-{short-description}`
- Commit messages: conventional commits (feat:, fix:, chore:, etc.)
- PR description must reference the GitHub issue number
The Linux Foundation Standard
In Q1 2026, the Linux Foundation adopted AGENTS.md as a formal open standard under its AI tooling working group. This means:
- Vendor-neutral: AGENTS.md works with Codex, Claude Code, Cursor, Copilot, Amazon Q, and any other agent that supports the spec
- Versioned schema: The spec has a formal versioning system (currently v1.2) with backward compatibility guarantees
- Validation tooling: A CLI validator (agents-md-lint) checks your AGENTS.md for correctness and completeness
- Community governance: Changes to the spec go through an RFC process with input from all major AI tooling vendors
Hierarchical Configuration
AGENTS.md supports hierarchical configuration. You can place files at multiple levels:
repo-root/
  AGENTS.md              # Global project rules
  packages/
    api/
      AGENTS.md          # API-specific rules (inherits from root)
    web/
      AGENTS.md          # Frontend-specific rules (inherits from root)
    shared/
      AGENTS.md          # Shared library rules (inherits from root)
Child AGENTS.md files inherit from parent files and can override specific sections. This is particularly useful in monorepos where different packages have different conventions.
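The inheritance rule can be modeled as a section-wise merge from root to leaf, with the deeper file winning on conflicts. A sketch, using section names from the example file earlier (the exact override semantics are defined by the spec, not by this code):

```python
def effective_config(*levels):
    """levels: AGENTS.md section dicts ordered root -> leaf.
    Deeper files override shallower ones section by section."""
    merged = {}
    for level in levels:
        merged.update(level)  # later (deeper) entries win
    return merged
```

So `packages/api/AGENTS.md` can replace the root's "Code Conventions" section while still inheriting "Build & Test" untouched.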
7. Codex CLI - Open Source Terminal Agent
While the cloud-based Codex agent lives inside ChatGPT, Codex CLI is its open-source counterpart - a terminal-based coding agent that runs locally on your machine. It has become one of the most popular developer tools on GitHub, with 72,000+ stars and a thriving contributor community.
Key Facts
| Attribute | Detail |
|---|---|
| Repository | github.com/openai/codex |
| Language | Rust |
| License | Apache 2.0 |
| GitHub Stars | 72,000+ |
| First Release | April 2025 |
| Current Version | 1.x (stable) |
Installation
# macOS / Linux
brew install openai/tap/codex
# Or via cargo (Rust toolchain required)
cargo install codex-cli
# Or download pre-built binary
curl -fsSL https://cli.codex.openai.com/install.sh | sh
# Verify installation
codex --version
How It Differs from Cloud Codex
Codex CLI and cloud Codex share the same underlying models but differ in execution:
| Feature | Cloud Codex (ChatGPT) | Codex CLI (Terminal) |
|---|---|---|
| Execution | Remote cloud sandbox | Local machine |
| File access | Cloned repo in sandbox | Direct filesystem access |
| Internet | Configurable per task | Uses your local network |
| Parallel tasks | Up to 8 | 1 (sequential) |
| PR creation | Built-in GitHub integration | Via git commands |
| Cost | Included in ChatGPT plan | API token usage (pay-per-token) |
| Approval modes | Automatic | suggest / auto-edit / full-auto |
Approval Modes
Codex CLI has three approval modes that control how much autonomy the agent has:
# Suggest mode (default) - agent proposes changes, you approve each one
codex "add input validation to the user registration endpoint"
# Auto-edit mode - agent can edit files but asks before running commands
codex --auto-edit "refactor the database layer to use connection pooling"
# Full-auto mode - agent has full autonomy (use with caution)
codex --full-auto "fix all ESLint errors in the project"
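The three modes boil down to a gate applied to each proposed action. A Python sketch of that policy (illustrative only; the CLI's actual implementation is Rust, in the openai/codex repository):

```python
def requires_approval(mode, action_type):
    """action_type: 'edit_file' or 'run_command'.
    Returns True if the agent must ask the user before acting."""
    if mode == "suggest":
        return True                          # every change needs sign-off
    if mode == "auto-edit":
        return action_type == "run_command"  # edits are free, commands gated
    if mode == "full-auto":
        return False                         # full autonomy
    raise ValueError(f"unknown mode: {mode}")
```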
Why Rust?
OpenAI chose Rust for Codex CLI for several practical reasons:
- Startup time: The CLI launches in under 50ms, compared to 500ms+ for Node.js-based alternatives
- Memory efficiency: Handles large codebases without excessive memory usage
- Single binary: No runtime dependencies - download one binary and it works
- Cross-platform: Compiles natively for macOS (ARM/x86), Linux, and Windows
- Safety: Rust's ownership model prevents the memory bugs that plague long-running agent processes
8. Competitive Landscape
The AI coding agent market in 2026 is fiercely competitive. Codex is the most widely used, but it faces strong competition from several directions. Here is an honest comparison.
Codex vs. Claude Code (Anthropic)
Claude Code is Codex's most direct competitor - a terminal-based coding agent powered by Claude 3.5 Opus and Claude 4 Sonnet. The comparison is nuanced:
| Dimension | OpenAI Codex | Claude Code |
|---|---|---|
| Token efficiency | 3x more efficient | Higher token consumption per task |
| Blind code review preference | 33% preferred | 67% preferred |
| Execution model | Cloud sandbox + local CLI | Local terminal only |
| Parallel tasks | Up to 8 | 1 (sequential) |
| PR creation | Built-in (cloud) | Via git commands |
| Subagents | GA (March 2026) | GA (February 2026) |
| Open source | CLI only (Apache 2.0) | Not open source |
| IDE integration | ChatGPT web + CLI | Terminal + VS Code extension |
The headline stat is striking: Codex is 3x more token-efficient than Claude Code for equivalent tasks, meaning it costs significantly less per task at API pricing. However, in blind code review studies where developers evaluated the output quality without knowing which tool produced it, Claude Code was preferred 67% of the time. This suggests Claude Code produces more idiomatic, readable, and well-structured code, even if it uses more tokens to get there.
The practical takeaway: Codex excels at high-throughput, parallel task execution and CI/CD integration. Claude Code excels at interactive pair-programming where code quality and developer experience matter most. Many teams use both.
Codex vs. GitHub Copilot
Copilot and Codex are siblings - both from the OpenAI/Microsoft ecosystem - but they serve different roles:
- Copilot is an IDE-integrated assistant focused on real-time code completion, inline suggestions, and chat within VS Code/JetBrains. It is reactive - you write code, it suggests completions.
- Codex is an autonomous agent focused on task completion. You describe what you want, and it does the work independently.
They are complementary, not competitive. Many developers use Copilot for moment-to-moment coding and Codex for larger tasks like feature implementation, refactoring, and test generation. GitHub has been integrating Codex capabilities into Copilot Workspace, blurring the line between the two products.
Codex vs. Cursor
Cursor is an AI-native IDE (a VS Code fork) that embeds AI deeply into the editing experience. Its strengths are:
- Inline editing: Select code, describe a change, and Cursor modifies it in place
- Multi-model support: Use GPT-5.5, Claude, Gemini, or local models
- Codebase indexing: Cursor indexes your entire project for context-aware suggestions
- Composer: Cursor's agent mode for multi-file changes
Cursor's advantage is the tight IDE integration - changes happen in your editor with full undo/redo support. Codex's advantage is the cloud sandbox model, which means tasks run in the background without blocking your editor, and you can run multiple tasks in parallel.
Codex vs. Amazon Q Developer
Amazon Q Developer is AWS's AI coding assistant, deeply integrated with the AWS ecosystem:
- AWS expertise: Q excels at AWS-specific tasks - CloudFormation, CDK, Lambda, IAM policies
- Code transformation: Q can migrate Java 8 to Java 17, .NET Framework to .NET Core
- Security scanning: Built-in vulnerability detection tuned for AWS services
- IDE integration: VS Code, JetBrains, and the AWS Console
Q is the best choice for AWS-heavy workloads. Codex is more general-purpose and stronger at non-cloud coding tasks. For teams building on AWS, using Q for infrastructure code and Codex for application code is a common pattern.
Market Share (April 2026)
| Tool | Weekly Active Users | Primary Use Case |
|---|---|---|
| GitHub Copilot | 15M+ | IDE code completion |
| OpenAI Codex | 4M+ | Autonomous coding agent |
| Cursor | 3M+ | AI-native IDE |
| Claude Code | 1.5M+ | Terminal coding agent |
| Amazon Q Developer | 1M+ | AWS-integrated assistant |
9. Codex Security
In Q1 2026, OpenAI launched Codex Security - a specialized mode that uses the Codex agent to scan codebases for security vulnerabilities. This is not a traditional static analysis tool. It uses GPT-5.5's reasoning capabilities to understand code semantics and identify vulnerabilities that pattern-matching tools miss.
How It Works
Codex Security operates in the same cloud sandbox as regular Codex tasks. When you trigger a security scan, the agent:
- Clones your repository into an isolated sandbox
- Builds a semantic understanding of the codebase - data flows, trust boundaries, authentication paths
- Identifies potential vulnerabilities using a combination of pattern matching and reasoning
- Verifies each finding by tracing the data flow from source to sink
- Generates a report with severity ratings, affected code paths, and suggested fixes
- Optionally creates PRs with the fixes applied
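The source-to-sink verification in step 4 is classic taint analysis. Here is a toy reachability check over an assignment graph; the real analysis reasons over full program semantics rather than string-labeled edges, so this only illustrates the shape of the problem:

```python
def taint_reaches(flows, source, sink):
    """flows: dict of variable -> set of variables its value flows into.
    Returns True if data from `source` can reach `sink`."""
    seen, frontier = set(), [source]
    while frontier:
        node = frontier.pop()
        if node == sink:
            return True
        if node in seen:
            continue
        seen.add(node)
        frontier.extend(flows.get(node, ()))
    return False
```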
The Chromium/OpenSSL/PHP Audit
The most impressive demonstration of Codex Security came when OpenAI ran it against three of the most security-critical open-source projects: Chromium, OpenSSL, and PHP. The results were remarkable:
| Project | Critical Issues Found | High Issues Found | False Positive Rate |
|---|---|---|---|
| Chromium | 340+ | 1,200+ | ~12% |
| OpenSSL | 280+ | 450+ | ~8% |
| PHP | 180+ | 600+ | ~15% |
| Total | 800+ | 2,250+ | ~12% avg |
Finding 800+ critical issues across these heavily-audited codebases - projects that have been reviewed by thousands of security researchers over decades - demonstrated that AI-powered security scanning can find vulnerabilities that traditional tools and human reviewers miss. The ~12% false positive rate is competitive with commercial SAST tools like Snyk, Semgrep, and SonarQube.
Vulnerability Categories
Codex Security is particularly strong at finding:
- Memory safety issues: Buffer overflows, use-after-free, double-free (in C/C++ codebases)
- Injection vulnerabilities: SQL injection, command injection, XSS, template injection
- Authentication bypasses: Logic errors in auth flows, missing authorization checks
- Cryptographic weaknesses: Weak algorithms, improper key management, timing attacks
- Race conditions: TOCTOU bugs, concurrent access without proper locking
- Supply chain risks: Suspicious dependencies, typosquatting packages, outdated libraries with known CVEs
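As a concrete instance of the injection class: the first function below is the pattern a scanner flags, and the second is the parameterized rewrite it would suggest. This is a toy sqlite3 example written for this guide, not code from any audited project:

```python
import sqlite3

def find_user_unsafe(conn, name):
    # VULNERABLE: string interpolation lets `name` inject SQL
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(conn, name):
    # FIX: parameterized query -- the driver escapes `name`
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()
```

Passing `x' OR '1'='1` to the unsafe version returns every row in the table; the safe version treats the same string as a literal name and matches nothing.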
10. Limitations and Known Issues
Codex is powerful, but it is not magic. Understanding its limitations is essential for using it effectively and setting appropriate expectations.
Context Window Constraints
Even with GPT-5.5's 256K context window, very large codebases can exceed the agent's ability to hold all relevant context simultaneously. For monorepos with millions of lines of code, Codex uses heuristics to select the most relevant files, which means it can miss cross-module dependencies or subtle interactions between distant parts of the codebase.
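A heuristic of this kind can be sketched as a greedy rank-and-pack: score files by relevance, then take them in descending order until the token budget is spent. The scores and the 4-bytes-per-token estimate below are illustrative, not Codex's actual heuristic:

```python
def select_files(files, budget_tokens):
    """files: list of (path, relevance_score, size_bytes).
    Greedily pack the highest-scoring files into the token budget (~4 bytes/token)."""
    chosen, used = [], 0
    for path, score, size in sorted(files, key=lambda f: f[1], reverse=True):
        tokens = size // 4
        if used + tokens <= budget_tokens:
            chosen.append(path)
            used += tokens
    return chosen
```

The failure mode described above falls out directly: a low-scoring file that holds a critical cross-module dependency simply never makes the cut.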
Hallucination (Reduced but Not Eliminated)
GPT-5.5 hallucinates significantly less than earlier models, but it still happens. Common hallucination patterns include:
- Inventing API methods that don't exist in a library (especially for less popular packages)
- Generating import paths that don't match the actual project structure
- Assuming configuration options that aren't available in the version you're using
- Creating test assertions based on assumed behavior rather than actual behavior
The sandbox's ability to run tests catches many of these issues, but not all. Always review Codex's output before merging.
Language and Framework Coverage
Codex performs best with popular languages and frameworks that have extensive training data. Performance degrades for:
- Niche languages (Elixir, Haskell, OCaml, Zig) - functional but less idiomatic
- Internal/proprietary frameworks - Codex can't know about your company's custom framework unless AGENTS.md provides detailed guidance
- Very new libraries - anything released after the model's training cutoff may not be well-represented
- Domain-specific languages (DSLs) - Terraform HCL and SQL are well-supported, but custom DSLs are hit-or-miss
Sandbox Limitations
- No GPU access: The sandbox doesn't have GPU support, so ML training tasks or CUDA code can't be tested
- Limited system services: No Docker-in-Docker, no systemd, no database servers (unless you use the full-access network mode to connect to external services)
- Timeout: Tasks have a maximum execution time (15 minutes on Free/Plus, 30 minutes on Pro, configurable on Enterprise)
- Filesystem size: Sandbox storage is limited to 10GB, which can be insufficient for projects with large binary assets
Non-Determinism
Like all LLM-based tools, Codex is non-deterministic. Running the same task twice may produce different code. The code will be functionally equivalent in most cases, but the exact implementation details - variable names, code structure, algorithm choices - can vary. This makes it unsuitable for tasks that require exact reproducibility.
11. Real-World Adoption
Codex's growth from launch to 4 million+ weekly active users in under a year makes it one of the fastest-adopted developer tools in history. Here is how organizations are using it in practice.
Adoption by the Numbers
| Metric | Value (April 2026) |
|---|---|
| Weekly active users | 4M+ |
| Tasks completed per day | 12M+ |
| PRs created per week | 2.5M+ |
| Enterprise customers | 500+ |
| Codex CLI daily downloads | 50K+ |
| AGENTS.md adoption (top 1K GitHub repos) | ~40% |
Common Use Patterns
Based on OpenAI's published usage data and community reports, the most common Codex workflows are:
1. Test Generation (28% of tasks)
The single most popular use case. Teams point Codex at untested code and ask it to generate comprehensive test suites. This is particularly valuable for legacy codebases that were built without tests - Codex can read the implementation, understand the expected behavior, and generate tests that serve as both documentation and regression protection.
2. Bug Fixes (22% of tasks)
Developers paste error messages, stack traces, or bug reports and let Codex trace the issue through the codebase. The agent's ability to read multiple files, understand data flow, and verify fixes by running tests makes it highly effective for debugging.
3. Feature Implementation (18% of tasks)
New feature development - adding endpoints, building UI components, implementing business logic. This is where Codex's multi-file editing and test generation capabilities shine.
4. Refactoring (15% of tasks)
Code modernization, dependency upgrades, pattern migrations (e.g., callbacks to async/await, class components to hooks), and structural reorganization.
5. Code Review and Documentation (12% of tasks)
Using Codex to review PRs, explain complex code, generate documentation, and add inline comments to poorly documented codebases.
6. Security Scanning (5% of tasks)
Running Codex Security scans as part of CI/CD pipelines or ad-hoc security audits.
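As a sketch of the CI/CD pattern described above, a scan could run as a pull-request check. Note that the `codex security` subcommand and its `--fail-on` flag shown here are hypothetical placeholders, not documented CLI syntax; only the `@openai/codex` package name and the overall GitHub Actions structure are standard.

```yaml
# Hypothetical GitHub Actions job running a Codex Security scan on PRs.
# The `codex security` invocation is illustrative only; consult the
# Codex CLI documentation for the actual command.
name: security-scan
on: [pull_request]
jobs:
  codex-security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Codex CLI
        run: npm install -g @openai/codex
      - name: Run security scan
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: codex security --fail-on high   # hypothetical subcommand
```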
Enterprise Case Studies
Several large organizations have shared their Codex adoption results:
- A Fortune 500 fintech company reported a 40% reduction in time-to-merge for feature PRs after deploying Codex across their 200-person engineering team. Test coverage increased from 62% to 84% in three months.
- A mid-size SaaS startup (50 engineers) uses Codex for all test generation and achieved 90%+ coverage across their TypeScript monorepo. They estimate Codex saves each developer 6-8 hours per week.
- An open-source project maintainer uses Codex to triage and fix issues from community contributors, reducing the average issue resolution time from 12 days to 3 days.
Getting Started
If you are new to Codex, start with the free tier to explore its capabilities. Connect a GitHub repository, try a few test-generation tasks, and review the output quality. Once you are comfortable, upgrade to Plus or Pro for higher quotas and parallel task execution. Add an AGENTS.md file to your repository for the best results.
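Since AGENTS.md is free-form markdown, a minimal starting point can be quite short. The contents below are illustrative, not a required schema; adapt the commands and conventions to your own project.

```markdown
# AGENTS.md — example (illustrative contents)

## Build and test
- Install dependencies: `npm ci`
- Run tests: `npm test` (all tests must pass before opening a PR)

## Conventions
- TypeScript strict mode; avoid `any`
- Prefer async/await over raw Promise chains

## Human review required
- Anything under `src/billing/`
- Database migrations
```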
12. Frequently Asked Questions
Is OpenAI Codex the same as the old Codex API from 2021?
No. The original Codex was a code-completion API based on GPT-3 that powered early GitHub Copilot. It was deprecated in March 2023. The current Codex (2025+) is a completely different product - an autonomous coding agent powered by GPT-5.5 that runs in a cloud sandbox and can perform complex multi-file engineering tasks.
Does Codex have access to my source code?
Yes, when you connect a GitHub repository, Codex clones it into an isolated cloud sandbox to read and modify files. The sandbox is destroyed after task completion. OpenAI states that code processed by Codex is not used to train models for Business and Enterprise plans. For Free, Plus, and Pro plans, you can opt out of training data usage in your settings.
Can Codex work with private repositories?
Yes. Codex supports private GitHub repositories through OAuth integration. You grant Codex read/write access to specific repositories - it does not require access to your entire GitHub account. GitLab and Bitbucket support is in beta as of April 2026.
How does Codex compare to Claude Code?
Codex is approximately 3x more token-efficient than Claude Code, making it cheaper per task. However, Claude Code is preferred by 67% of developers in blind code review studies for output quality. Codex excels at parallel task execution and CI/CD integration, while Claude Code is favored for interactive pair-programming. Many teams use both tools for different workflows.
What languages does Codex support?
Codex supports all major programming languages. It performs best with Python, TypeScript/JavaScript, Rust, Go, Java, C#, C++, Ruby, PHP, and Swift. It can work with less common languages but may produce less idiomatic code. The AGENTS.md file can provide language-specific guidance to improve output quality.
Can I use Codex CLI without a ChatGPT subscription?
Yes. Codex CLI uses the OpenAI API directly, so you only need an API key with credits. You pay per token used. This is separate from ChatGPT subscription pricing. Many developers prefer this model for predictable, usage-based costs.
Is Codex CLI really open source?
Yes. Codex CLI is fully open source under the Apache 2.0 license. The source code is available on GitHub with 72,000+ stars. You can fork it, modify it, and use it in commercial products. The agent logic, tool integrations, and approval modes are all open. The only proprietary component is the GPT-5.5 model itself, which is accessed via the OpenAI API.
What is AGENTS.md and do I need one?
AGENTS.md is a configuration file that tells AI coding agents how to work with your repository. It includes project conventions, build commands, code style rules, and areas that require human review. While not strictly required, adding one significantly improves Codex's output quality. It is now a Linux Foundation standard supported by all major AI coding tools.
Can Codex deploy my code to production?
Codex can create PRs and trigger CI/CD pipelines, but it does not directly deploy to production. The recommended workflow is: Codex creates a PR, your CI/CD pipeline runs automated checks, a human reviews and approves, and your existing deployment process handles the rest. This keeps a human in the loop for production changes.
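The human-in-the-loop workflow above can be enforced in CI by triggering deployment only on pushes to the protected default branch, which (with branch protection requiring review) only receive code via approved PRs. This is a sketch; the deploy script path is a placeholder for your existing process.

```yaml
# Sketch: deployment runs only after a reviewed PR merges to main.
# Assumes branch protection on main requires an approving review.
name: deploy
on:
  push:
    branches: [main]   # merges land here only via approved PRs
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy
        run: ./scripts/deploy.sh   # placeholder for your deployment process
```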