OpenAI Codex - The AI Coding Agent Powered by GPT-5.5

[Figure: OpenAI Codex AI coding agent architecture diagram]

OpenAI Codex is no longer the experimental code-completion API from 2021. It has evolved into a full-blown AI coding agent - a cloud-based system powered by GPT-5.5 that can read your entire codebase, write multi-file changes, generate and run tests, create pull requests, and execute parallel tasks inside a secure sandbox. With over 4 million weekly active users as of April 2026, it has become the most widely deployed AI coding agent in the world.

This guide covers everything you need to know about the modern Codex platform: the model evolution from codex-1 through GPT-5.5, the cloud sandbox architecture, pricing across all tiers, the open-source Codex CLI, the AGENTS.md configuration standard, competitive benchmarks against Claude Code and Copilot, the Codex Security vulnerability scanner, and real-world adoption patterns. Whether you are evaluating Codex for your team or already using it and want to go deeper, this is the definitive reference.

1. What Is OpenAI Codex (2025+)?

If you remember the original Codex from 2021, forget everything about it. That was a code-completion API built on GPT-3 that powered GitHub Copilot's early autocomplete features. OpenAI deprecated it in March 2023. The Codex of 2025 and beyond is an entirely different product - an autonomous coding agent that lives inside ChatGPT and operates in a cloud-based sandboxed environment.

The modern Codex launched in May 2025 as a dedicated panel within the ChatGPT interface. Instead of completing single lines of code, it accepts high-level tasks like "refactor the authentication module to use JWT tokens" or "add pagination to the /users API endpoint and write integration tests." It then clones your repository into a cloud sandbox, reads the relevant files, formulates a plan, writes the code, runs the tests, and presents you with a complete diff or pull request.

Key distinction: The 2021 Codex was a code-completion model (think autocomplete). The 2025+ Codex is a coding agent (think junior developer who reads your codebase, writes code, runs tests, and submits PRs).

Core Identity

At its core, Codex is three things:

  • An agent, not a model. While it is powered by GPT-5.5 (and previously codex-1 and o3), the product is the agent layer - the orchestration, tool use, sandbox execution, and task management that wraps the model.
  • Cloud-native. Every task runs in an isolated cloud sandbox with its own filesystem, package manager, and optional internet access. Your local machine is never touched.
  • Repository-aware. Codex connects to your GitHub repositories (with GitLab and Bitbucket support in beta) and understands your project structure, dependencies, test suites, and CI/CD configuration.

What It Can Do Today

As of April 2026, Codex handles a wide range of software engineering tasks:

  • Write new features across multiple files with correct imports and dependencies
  • Fix bugs by reading stack traces, reproducing the issue, and verifying the fix
  • Refactor code - rename symbols, extract functions, restructure modules
  • Generate unit tests, integration tests, and end-to-end tests
  • Create pull requests with descriptive titles, summaries, and linked issues
  • Answer questions about your codebase by reading and analyzing the source
  • Run up to 8 tasks in parallel on different parts of your codebase
  • Delegate subtasks to specialized subagents for complex multi-step workflows
  • Interact with web browsers and desktop applications via Computer Use
  • Scan codebases for security vulnerabilities with Codex Security

2. The Model Evolution - codex-1 Through GPT-5.5

Understanding Codex requires understanding the models that power it. The agent has gone through a rapid evolution in under a year, with each generation bringing significant capability improvements.

codex-1 (May 2025)

The first model purpose-built for the Codex agent was codex-1, a fine-tuned variant of OpenAI's o3 reasoning model. Unlike general-purpose models, codex-1 was specifically optimized for software engineering tasks: reading large codebases, following coding conventions, writing idiomatic code, and operating within the constraints of a sandboxed environment.

codex-1 achieved a 72.1% score on SWE-Bench Verified, a benchmark that measures a model's ability to resolve real GitHub issues from popular open-source projects. For context, the base o3 model scored 69.1% on the same benchmark - a meaningful gap that demonstrated the value of task-specific fine-tuning.

Model               SWE-Bench Verified   Release      Notes
codex-1             72.1%                May 2025     Fine-tuned o3 for coding tasks
o3 (base)           69.1%                April 2025   General reasoning model
GPT-4.1             54.6%                April 2025   Non-reasoning baseline
Claude 3.5 Sonnet   49.0%                June 2024    Anthropic's coding model

codex-mini (June 2025)

OpenAI followed up with codex-mini, a smaller and faster variant optimized for latency-sensitive tasks. While it scored lower on SWE-Bench, it was 3-4x faster for common operations like code review, simple bug fixes, and test generation. This became the default model for Codex tasks that didn't require deep reasoning.

GPT-5.0 Integration (September 2025)

When GPT-5.0 launched, Codex was among the first products to integrate it. The jump was substantial - GPT-5.0 brought a 200K native context window (up from codex-1's 128K), dramatically better instruction following, and improved ability to maintain consistency across large multi-file changes. The SWE-Bench score climbed to approximately 78%.

GPT-5.5 - The Current Engine (February 2026)

The current Codex agent runs on GPT-5.5, which represents the most capable coding model OpenAI has shipped. Key improvements over GPT-5.0 include:

  • 256K context window - enough to hold entire medium-sized codebases in a single context
  • Improved agentic behavior - better at decomposing complex tasks, recovering from errors, and knowing when to ask for clarification
  • Native tool use - the model was trained with tool-use data from the start, making sandbox operations, file I/O, and shell commands more reliable
  • Reduced hallucination - significantly fewer invented APIs, non-existent functions, or fabricated library features
  • Multi-language fluency - strong performance across Python, TypeScript, Rust, Go, Java, C++, C#, Ruby, PHP, and Swift

Model selection: Codex automatically selects the appropriate model based on task complexity. Simple tasks use a fast variant for quick turnaround, while complex multi-file refactors use the full GPT-5.5 reasoning model. You can override this in settings.

3. How It Works - Cloud Sandbox Architecture

The architecture behind Codex is what separates it from simple code-generation tools. Every task runs inside an isolated cloud sandbox - a lightweight virtual environment that provides a complete development setup without touching your local machine.

The Sandbox Environment

When you assign a task to Codex, the following happens:

  1. Repository clone: Codex clones your connected GitHub repository into the sandbox. For large repos, it uses sparse checkout to pull only the relevant directories.
  2. Environment setup: The sandbox installs dependencies based on your project's configuration files (package.json, requirements.txt, Cargo.toml, go.mod, etc.) - see the sketch after this list.
  3. Task execution: The agent reads relevant files, formulates a plan, writes code, and executes commands (build, test, lint) inside the sandbox.
  4. Result delivery: Once complete, Codex presents a diff of all changes, test results, and optionally creates a pull request directly on GitHub.
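
To make step 2 concrete, here is a minimal Python sketch of how a setup routine might map manifest files to install commands. The mapping, function name, and behavior are illustrative assumptions, not Codex's actual implementation.

# Hypothetical sketch: detect which install commands a repository implies,
# based on the manifest files present. The mapping is an assumption.
from pathlib import Path

INSTALL_COMMANDS = {
    "package.json": "npm install",
    "requirements.txt": "pip install -r requirements.txt",
    "Cargo.toml": "cargo fetch",
    "go.mod": "go mod download",
}

def detect_install_commands(repo_root: str) -> list[str]:
    """Return the install commands implied by manifests found in the repo."""
    root = Path(repo_root)
    return [cmd for manifest, cmd in INSTALL_COMMANDS.items()
            if (root / manifest).exists()]

print(detect_install_commands("."))  # e.g. ['npm install'] for a Node project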

Isolation and Security

Each sandbox is a microVM - a lightweight virtual machine that provides hardware-level isolation. This means:

  • Tasks cannot access your local filesystem, environment variables, or credentials
  • Each task gets a fresh environment - no state leaks between tasks
  • The sandbox has its own network namespace with configurable internet access
  • All sandbox data is destroyed after task completion (configurable retention for debugging)

Internet Access Modes

Codex offers three network modes for sandboxes:

Mode                 Network Access                                      Use Case
Isolated (default)   No internet access                                  Maximum security, internal codebases
Package-only         Access to package registries (npm, PyPI, crates.io) Tasks that need to install dependencies
Full access          Unrestricted internet                               Tasks that need to fetch APIs, documentation, or external resources

Security note: Full internet access means the agent can make outbound HTTP requests. If your codebase contains API keys or secrets in environment variables, use the isolated or package-only mode to prevent accidental exfiltration. Codex strips common secret patterns from sandbox environments, but defense in depth is always recommended.
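
As an illustration of these modes, the sketch below shows how a sandbox egress policy could be enforced. The registry hostnames and mode names are assumptions for the example, not OpenAI's published configuration.

# Hypothetical egress policy for the three sandbox network modes.
PACKAGE_REGISTRIES = {
    "registry.npmjs.org", "pypi.org", "files.pythonhosted.org",
    "crates.io", "static.crates.io",
}

def is_egress_allowed(mode: str, host: str) -> bool:
    if mode == "isolated":
        return False                       # no outbound traffic at all
    if mode == "package-only":
        return host in PACKAGE_REGISTRIES  # registries only
    if mode == "full":
        return True                        # unrestricted internet
    raise ValueError(f"unknown network mode: {mode}")

assert not is_egress_allowed("isolated", "pypi.org")
assert is_egress_allowed("package-only", "registry.npmjs.org")
assert not is_egress_allowed("package-only", "api.example.com")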

Architecture Diagram

The high-level flow looks like this:

User (ChatGPT)
    |
    v
Codex Orchestrator
    |
    +-- Task Queue (up to 8 parallel tasks)
    |       |
    |       v
    +-- Sandbox Pool
            |
            +-- microVM 1: [clone repo] -> [install deps] -> [agent loop] -> [diff/PR]
            +-- microVM 2: [clone repo] -> [install deps] -> [agent loop] -> [diff/PR]
            +-- ...
            |
            v
        GitHub API (PR creation, branch push)

The Agent Loop

Inside each sandbox, the agent operates in a classic observe-think-act loop:

# Simplified pseudocode of the Codex agent loop
MAX_ITERATIONS = 10  # complex tasks typically converge in 3-7 iterations

for iteration in range(MAX_ITERATIONS):
    # 1. Observe: Read files, check test output, review errors
    context = read_relevant_files(task, codebase)

    # 2. Think: Reason about what to do next
    plan = model.reason(task, context, previous_actions)

    # 3. Act: Write code, run commands, create files
    for action in plan.actions:
        if action.type == "write_file":
            write_file(action.path, action.content)
        elif action.type == "run_command":
            output = shell(action.command)
        elif action.type == "read_file":
            context.add(read_file(action.path))

    # 4. Verify: Run the project's test suite inside the sandbox
    test_results = shell("npm test")  # or pytest, cargo test, etc.

    if test_results.all_passed:
        break  # task complete

    # Otherwise loop back with the failure output as new context
    context.add(test_results.errors)

The key insight is that Codex doesn't just generate code and hope for the best. It verifies its own work by running your test suite inside the sandbox. If tests fail, it reads the error output, reasons about the cause, and iterates. This loop typically runs 3-7 iterations for complex tasks.

4. Pricing and Availability

One of the most significant changes in early 2026 was OpenAI making Codex available across all ChatGPT tiers, including the free plan. Here is the complete pricing breakdown as of April 2026.

Plan         Monthly Price   Codex Access                    Parallel Tasks   Notes
Free         $0              Limited (approx. 5 tasks/day)   1                GPT-5.5 mini model, no internet access
Plus         $20             Standard quota                  2                Full GPT-5.5, package-only network
Pro          $100 / $200     High quota / Unlimited          4 / 8            Full internet access, priority queue
Business     $25/user        Team quota pool                 4 per user       Admin controls, audit logs, SSO
Enterprise   Custom          Custom quota                    Custom           VPC deployment, data residency, SLA

Codex-Only Seats

A notable addition in Q1 2026 was the introduction of Codex-only seats for Business and Enterprise plans. These are discounted seats ($15/user/month on Business) for team members who only need Codex access without the full ChatGPT feature set. This is targeted at development teams where not every engineer needs GPT-5.5 for general conversation but everyone needs the coding agent.

API Pricing

For teams building on top of Codex programmatically, the API pricing follows the standard OpenAI token-based model:

GPT-5.5 (Codex tasks):
  Input:   $2.50 / 1M tokens
  Output:  $10.00 / 1M tokens
  Cached:  $1.25 / 1M tokens (50% discount)

GPT-5.5 mini (fast tasks):
  Input:   $0.30 / 1M tokens
  Output:  $1.20 / 1M tokens
  Cached:  $0.15 / 1M tokens

Cost tip: Codex aggressively uses prompt caching for repository context. If you run multiple tasks against the same repo, subsequent tasks benefit from cached file contents, reducing input token costs by up to 50%. The AGENTS.md file (covered in Section 6) is always cached.
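
A worked example of what those rates mean in practice, using the GPT-5.5 prices above. The token counts are made up for illustration; only the per-million rates come from the table.

# Estimate the cost of a Codex task at the listed GPT-5.5 API prices.
PRICES_PER_1M = {"input": 2.50, "cached": 1.25, "output": 10.00}  # USD

def task_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    uncached = input_tokens - cached_tokens
    cost = (uncached * PRICES_PER_1M["input"]
            + cached_tokens * PRICES_PER_1M["cached"]
            + output_tokens * PRICES_PER_1M["output"]) / 1_000_000
    return round(cost, 4)

# First task against a repo: nothing cached yet.
print(task_cost(120_000, 0, 8_000))        # $0.38
# Follow-up task: most repository context is served from the cache.
print(task_cost(120_000, 100_000, 8_000))  # $0.255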

5. Key Capabilities

Codex's capabilities have expanded rapidly since launch. Here is a detailed breakdown of what the agent can do as of April 2026.

Multi-File Edits

Unlike simple code generators that produce isolated snippets, Codex understands project structure. When you ask it to add a new API endpoint, it will:

  • Create the route handler file
  • Update the router configuration to register the new route
  • Add the corresponding data model or schema if needed
  • Update TypeScript types or interfaces across the project
  • Modify the OpenAPI/Swagger spec if one exists
  • Add the route to any middleware chains (auth, validation, rate limiting)

This cross-file awareness is powered by the agent's ability to read and index your entire repository before making changes. It builds an internal map of imports, exports, type definitions, and call graphs.
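
The sketch below illustrates the kind of import indexing this implies, using Python's standard-library ast module to build a file-to-imports map. Codex's actual indexer is internal; this only demonstrates the idea.

# Build a file -> imported-modules map for a Python project.
import ast
from pathlib import Path

def build_import_graph(root: str) -> dict[str, set[str]]:
    graph: dict[str, set[str]] = {}
    for path in Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        imports: set[str] = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                imports.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                imports.add(node.module)
        graph[str(path)] = imports
    return graph

for module, deps in sorted(build_import_graph("src").items()):
    print(module, "->", sorted(deps))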

Test Generation

Codex generates tests that match your existing test patterns. If your project uses Jest with React Testing Library, it writes Jest tests. If you use pytest with fixtures, it writes pytest tests with fixtures. It reads your existing test files to learn your conventions:

// Codex-generated test matching existing project conventions
describe('UserService', () => {
  let service: UserService;
  let mockRepo: jest.Mocked<UserRepository>;

  beforeEach(() => {
    mockRepo = createMockRepository();
    service = new UserService(mockRepo);
  });

  it('should return paginated users with correct metadata', async () => {
    mockRepo.findAll.mockResolvedValue({
      data: [mockUser({ id: '1' }), mockUser({ id: '2' })],
      total: 15,
    });

    const result = await service.getUsers({ page: 1, pageSize: 2 });

    expect(result.data).toHaveLength(2);
    expect(result.pagination).toEqual({
      page: 1,
      pageSize: 2,
      totalPages: 8,
      totalItems: 15,
    });
    expect(mockRepo.findAll).toHaveBeenCalledWith({
      skip: 0,
      take: 2,
    });
  });

  it('should throw NotFoundError for non-existent user', async () => {
    mockRepo.findById.mockResolvedValue(null);

    await expect(service.getUser('999')).rejects.toThrow(NotFoundError);
  });
});

Pull Request Creation

Codex can create pull requests directly on GitHub. When a task completes, you can choose to:

  • Review the diff in the ChatGPT interface and apply changes manually
  • Create a PR with an auto-generated title, description, and linked issue
  • Push to a branch without creating a PR (for further local work)

The PR descriptions are surprisingly good - they include a summary of changes, the reasoning behind design decisions, a list of files modified, and test results. Codex also adds inline comments on complex changes to explain its approach.

Parallel Task Execution

Pro users can run up to 8 tasks simultaneously, each in its own sandbox. This is transformative for large-scale refactoring. For example, you could run these tasks in parallel:

Task 1: "Migrate all API routes from Express to Fastify"
Task 2: "Update all test files to use the new Fastify test helpers"
Task 3: "Update the Docker configuration for Fastify"
Task 4: "Update the CI/CD pipeline for the new build process"

Each task runs independently, and Codex is smart enough to detect potential conflicts between parallel tasks. If Task 1 and Task 2 both modify the same file, Codex will flag the conflict and suggest a merge strategy.
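
A minimal sketch of that conflict check: two parallel tasks conflict if their changesets touch the same file. The Task shape and names are assumptions for illustration.

# Flag pairs of parallel tasks whose changes overlap on the same files.
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Task:
    name: str
    modified_files: set[str]

def find_conflicts(tasks: list[Task]) -> list[tuple[str, str, set[str]]]:
    conflicts = []
    for a, b in combinations(tasks, 2):
        shared = a.modified_files & b.modified_files
        if shared:
            conflicts.append((a.name, b.name, shared))
    return conflicts

tasks = [
    Task("migrate-fastify", {"src/app.ts", "src/routes/users.ts"}),
    Task("update-tests", {"test/users.test.ts", "src/routes/users.ts"}),
]
for a, b, files in find_conflicts(tasks):
    print(f"conflict between {a} and {b}: {sorted(files)}")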

Subagents (GA March 2026)

Subagents allow Codex to decompose complex tasks into smaller subtasks and delegate them to specialized child agents. This went generally available in March 2026 and is one of the most powerful features for complex engineering work.

When you give Codex a broad task like "set up a complete authentication system with JWT, refresh tokens, role-based access control, and password reset via email," it might spawn subagents for:

  • Subagent 1: JWT token generation and validation middleware
  • Subagent 2: Refresh token rotation and storage
  • Subagent 3: RBAC permission model and decorators
  • Subagent 4: Password reset email flow with templates
  • Subagent 5: Integration tests for the complete auth flow

The parent agent coordinates the subagents, resolves dependencies between their outputs, and merges the results into a coherent changeset. Subagents share the same repository context but operate in isolated execution environments.
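
One way to picture the coordination step is as dependency-ordered scheduling: subtasks that consume another subtask's output must run after it. The sketch below reuses the auth example; the dependency edges are assumptions for illustration.

# Run subagent tasks in dependency order using the standard library.
from graphlib import TopologicalSorter

# Each subtask maps to the subtasks whose output it depends on.
deps = {
    "jwt-middleware": set(),
    "refresh-tokens": {"jwt-middleware"},
    "rbac": {"jwt-middleware"},
    "password-reset": set(),
    "integration-tests": {"jwt-middleware", "refresh-tokens",
                          "rbac", "password-reset"},
}

for subtask in TopologicalSorter(deps).static_order():
    print("running subagent:", subtask)
# jwt-middleware and password-reset have no prerequisites and can start first;
# integration-tests runs last, once every other subtask has merged.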

Computer Use (April 2026)

The newest capability, launched in April 2026, is Computer Use - the ability for Codex to interact with graphical interfaces. This extends the agent beyond code editing into:

  • Browser testing: Codex can open your web application in a headless browser, navigate through user flows, and verify that UI changes render correctly
  • Visual regression: Compare screenshots before and after changes to detect unintended visual side effects
  • Documentation: Navigate your deployed application and generate screenshots for documentation
  • Form filling and testing: Interact with forms, buttons, and dynamic UI elements to test user workflows end-to-end

Computer Use is in early access. It works well for straightforward web applications but can struggle with complex SPAs that rely heavily on client-side state, WebSocket connections, or canvas-based rendering. Expect rapid improvements through Q2-Q3 2026.

6. AGENTS.md Configuration

One of Codex's most influential contributions to the broader AI tooling ecosystem is AGENTS.md - a configuration file that tells AI coding agents how to work with your repository. What started as a Codex-specific feature has become an industry standard governed by the Linux Foundation.

What Is AGENTS.md?

AGENTS.md is a Markdown file placed in the root of your repository (or in subdirectories for module-specific instructions). It provides structured guidance to AI agents about:

  • Project architecture and conventions
  • Build, test, and lint commands
  • Code style preferences and patterns to follow
  • Files and directories the agent should not modify
  • Security-sensitive areas that require human review
  • Dependency management rules
  • PR and commit message conventions

Example AGENTS.md

# AGENTS.md

## Project Overview
This is a TypeScript monorepo using Turborepo with three packages:
- `packages/api` - Express REST API
- `packages/web` - Next.js 15 frontend
- `packages/shared` - Shared types and utilities

## Build & Test
- Build: `turbo build`
- Test: `turbo test`
- Lint: `turbo lint`
- Type check: `turbo typecheck`

## Code Conventions
- Use functional components with hooks (no class components)
- Use `zod` for all runtime validation
- Use `drizzle-orm` for database queries (not raw SQL)
- Error handling: use `Result<T, E>` pattern from `packages/shared/result.ts`
- All API endpoints must have OpenAPI annotations

## Do Not Modify
- `packages/shared/generated/` - auto-generated from OpenAPI spec
- `*.migration.ts` files - managed by drizzle-kit
- `.github/workflows/` - CI/CD managed by platform team

## Security Review Required
- Any changes to `packages/api/src/middleware/auth.ts`
- Any changes to `packages/api/src/middleware/rbac.ts`
- Any new environment variable usage

## PR Conventions
- Branch naming: `codex/{issue-number}-{short-description}`
- Commit messages: conventional commits (feat:, fix:, chore:, etc.)
- PR description must reference the GitHub issue number

The Linux Foundation Standard

In Q1 2026, the Linux Foundation adopted AGENTS.md as a formal open standard under its AI tooling working group. This means:

  • Vendor-neutral: AGENTS.md works with Codex, Claude Code, Cursor, Copilot, Amazon Q, and any other agent that supports the spec
  • Versioned schema: The spec has a formal versioning system (currently v1.2) with backward compatibility guarantees
  • Validation tooling: A CLI validator (agents-md-lint) checks your AGENTS.md for correctness and completeness
  • Community governance: Changes to the spec go through an RFC process with input from all major AI tooling vendors

Adoption tip: Even if you don't use Codex, adding an AGENTS.md to your repository improves the behavior of every AI coding tool your team uses. It takes 15 minutes to write and pays dividends across all AI-assisted development workflows.

Hierarchical Configuration

AGENTS.md supports hierarchical configuration. You can place files at multiple levels:

repo-root/
  AGENTS.md              # Global project rules
  packages/
    api/
      AGENTS.md          # API-specific rules (inherits from root)
    web/
      AGENTS.md          # Frontend-specific rules (inherits from root)
    shared/
      AGENTS.md          # Shared library rules (inherits from root)

Child AGENTS.md files inherit from parent files and can override specific sections. This is particularly useful in monorepos where different packages have different conventions.
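
A minimal sketch of that inheritance rule, assuming each AGENTS.md is parsed into a {section heading: body} map (a simplification of the actual spec): child sections override parent sections of the same name, and everything else is inherited.

# Merge a child AGENTS.md over its parent, section by section.
def merge_agents_config(parent: dict[str, str], child: dict[str, str]) -> dict[str, str]:
    merged = dict(parent)
    merged.update(child)  # child sections win on name conflicts
    return merged

root_cfg = {
    "Build & Test": "turbo build / turbo test / turbo lint",
    "Code Conventions": "zod for validation, drizzle-orm for queries",
}
api_cfg = {"Code Conventions": "All API endpoints must have OpenAPI annotations"}

print(merge_agents_config(root_cfg, api_cfg))
# Build & Test is inherited from the root; Code Conventions is overridden.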

7. Codex CLI - Open Source Terminal Agent

While the cloud-based Codex agent lives inside ChatGPT, Codex CLI is its open-source counterpart - a terminal-based coding agent that runs locally on your machine. It has become one of the most popular developer tools on GitHub, with 72,000+ stars and a thriving contributor community.

Key Facts

Attribute         Detail
Repository        github.com/openai/codex
Language          Rust
License           Apache 2.0
GitHub Stars      72,000+
First Release     April 2025
Current Version   1.x (stable)

Installation

# macOS / Linux
brew install openai/tap/codex

# Or via cargo (Rust toolchain required)
cargo install codex-cli

# Or download pre-built binary
curl -fsSL https://cli.codex.openai.com/install.sh | sh

# Verify installation
codex --version

How It Differs from Cloud Codex

Codex CLI and cloud Codex share the same underlying models but differ in execution:

Feature          Cloud Codex (ChatGPT)         Codex CLI (Terminal)
Execution        Remote cloud sandbox          Local machine
File access      Cloned repo in sandbox        Direct filesystem access
Internet         Configurable per task         Uses your local network
Parallel tasks   Up to 8                       1 (sequential)
PR creation      Built-in GitHub integration   Via git commands
Cost             Included in ChatGPT plan      API token usage (pay-per-token)
Approval modes   Automatic                     suggest / auto-edit / full-auto

Approval Modes

Codex CLI has three approval modes that control how much autonomy the agent has:

# Suggest mode (default) - agent proposes changes, you approve each one
codex "add input validation to the user registration endpoint"

# Auto-edit mode - agent can edit files but asks before running commands
codex --auto-edit "refactor the database layer to use connection pooling"

# Full-auto mode - agent has full autonomy (use with caution)
codex --full-auto "fix all ESLint errors in the project"

Why Rust?

OpenAI chose Rust for Codex CLI for several practical reasons:

  • Startup time: The CLI launches in under 50ms, compared to 500ms+ for Node.js-based alternatives
  • Memory efficiency: Handles large codebases without excessive memory usage
  • Single binary: No runtime dependencies - download one binary and it works
  • Cross-platform: Compiles natively for macOS (ARM/x86), Linux, and Windows
  • Safety: Rust's ownership model prevents the memory bugs that plague long-running agent processes

Community note: Codex CLI's Apache 2.0 license means you can fork it, modify it, and use it in commercial products. Several companies have built internal tooling on top of the Codex CLI codebase, adding custom tool integrations and enterprise authentication.

8. Competitive Landscape

The AI coding agent market in 2026 is fiercely competitive. Codex is the most widely used, but it faces strong competition from several directions. Here is an honest comparison.

Codex vs. Claude Code (Anthropic)

Claude Code is Codex's most direct competitor - a terminal-based coding agent powered by Claude 3.5 Opus and Claude 4 Sonnet. The comparison is nuanced:

Dimension                      OpenAI Codex                Claude Code
Token efficiency               3x more efficient           Higher token consumption per task
Blind code review preference   33% preferred               67% preferred
Execution model                Cloud sandbox + local CLI   Local terminal only
Parallel tasks                 Up to 8                     1 (sequential)
PR creation                    Built-in (cloud)            Via git commands
Subagents                      GA (March 2026)             GA (February 2026)
Open source                    CLI only (Apache 2.0)       Not open source
IDE integration                ChatGPT web + CLI           Terminal + VS Code extension

The headline stat is striking: Codex is 3x more token-efficient than Claude Code for equivalent tasks, meaning it costs significantly less per task at API pricing. However, in blind code review studies where developers evaluated the output quality without knowing which tool produced it, Claude Code was preferred 67% of the time. This suggests Claude Code produces more idiomatic, readable, and well-structured code, even if it uses more tokens to get there.

The practical takeaway: Codex excels at high-throughput, parallel task execution and CI/CD integration. Claude Code excels at interactive pair-programming where code quality and developer experience matter most. Many teams use both.

Codex vs. GitHub Copilot

Copilot and Codex are siblings - both from the OpenAI/Microsoft ecosystem - but they serve different roles:

  • Copilot is an IDE-integrated assistant focused on real-time code completion, inline suggestions, and chat within VS Code/JetBrains. It is reactive - you write code, it suggests completions.
  • Codex is an autonomous agent focused on task completion. You describe what you want, and it does the work independently.

They are complementary, not competitive. Many developers use Copilot for moment-to-moment coding and Codex for larger tasks like feature implementation, refactoring, and test generation. GitHub has been integrating Codex capabilities into Copilot Workspace, blurring the line between the two products.

Codex vs. Cursor

Cursor is an AI-native IDE (a VS Code fork) that embeds AI deeply into the editing experience. Its strengths are:

  • Inline editing: Select code, describe a change, and Cursor modifies it in place
  • Multi-model support: Use GPT-5.5, Claude, Gemini, or local models
  • Codebase indexing: Cursor indexes your entire project for context-aware suggestions
  • Composer: Cursor's agent mode for multi-file changes

Cursor's advantage is the tight IDE integration - changes happen in your editor with full undo/redo support. Codex's advantage is the cloud sandbox model, which means tasks run in the background without blocking your editor, and you can run multiple tasks in parallel.

Codex vs. Amazon Q Developer

Amazon Q Developer is AWS's AI coding assistant, deeply integrated with the AWS ecosystem:

  • AWS expertise: Q excels at AWS-specific tasks - CloudFormation, CDK, Lambda, IAM policies
  • Code transformation: Q can migrate Java 8 to Java 17, .NET Framework to .NET Core
  • Security scanning: Built-in vulnerability detection tuned for AWS services
  • IDE integration: VS Code, JetBrains, and the AWS Console

Q is the best choice for AWS-heavy workloads. Codex is more general-purpose and stronger at non-cloud coding tasks. For teams building on AWS, using Q for infrastructure code and Codex for application code is a common pattern.

Market Share (April 2026)

Tool                 Weekly Active Users   Primary Use Case
GitHub Copilot       15M+                  IDE code completion
OpenAI Codex         4M+                   Autonomous coding agent
Cursor               3M+                   AI-native IDE
Claude Code          1.5M+                 Terminal coding agent
Amazon Q Developer   1M+                   AWS-integrated assistant

9. Codex Security

In Q1 2026, OpenAI launched Codex Security - a specialized mode that uses the Codex agent to scan codebases for security vulnerabilities. This is not a traditional static analysis tool. It uses GPT-5.5's reasoning capabilities to understand code semantics and identify vulnerabilities that pattern-matching tools miss.

How It Works

Codex Security operates in the same cloud sandbox as regular Codex tasks. When you trigger a security scan, the agent:

  1. Clones your repository into an isolated sandbox
  2. Builds a semantic understanding of the codebase - data flows, trust boundaries, authentication paths
  3. Identifies potential vulnerabilities using a combination of pattern matching and reasoning
  4. Verifies each finding by tracing the data flow from source to sink (sketched below)
  5. Generates a report with severity ratings, affected code paths, and suggested fixes
  6. Optionally creates PRs with the fixes applied
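
Step 4 is essentially taint tracking. The toy sketch below shows the shape of that check: a finding is confirmed only if untrusted input can reach a dangerous sink without passing through a sanitizer. The source, sanitizer, and sink names are illustrative, not Codex Security's actual rule set.

# Confirm a finding by walking an ordered call flow from source to sink.
SOURCES = {"request.args", "request.form"}
SANITIZERS = {"escape_sql"}
SINKS = {"cursor.execute"}

def is_exploitable(flow: list[str]) -> bool:
    tainted = flow[0] in SOURCES
    for step in flow[1:]:
        if step in SANITIZERS:
            tainted = False      # data was cleaned before reaching the sink
        if step in SINKS and tainted:
            return True          # untrusted data reaches a dangerous call
    return False

print(is_exploitable(["request.args", "cursor.execute"]))                # True
print(is_exploitable(["request.args", "escape_sql", "cursor.execute"]))  # False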

The Chromium/OpenSSL/PHP Audit

The most impressive demonstration of Codex Security came when OpenAI ran it against three of the most security-critical open-source projects: Chromium, OpenSSL, and PHP. The results were remarkable:

Project    Critical Issues Found   High Issues Found   False Positive Rate
Chromium   340+                    1,200+              ~12%
OpenSSL    280+                    450+                ~8%
PHP        180+                    600+                ~15%
Total      800+                    2,250+              ~12% avg

Finding 800+ critical issues across these heavily audited codebases - projects that have been reviewed by thousands of security researchers over decades - demonstrated that AI-powered security scanning can find vulnerabilities that traditional tools and human reviewers miss. The ~12% false positive rate is competitive with commercial SAST tools like Snyk, Semgrep, and SonarQube.

Vulnerability Categories

Codex Security is particularly strong at finding:

  • Memory safety issues: Buffer overflows, use-after-free, double-free (in C/C++ codebases)
  • Injection vulnerabilities: SQL injection, command injection, XSS, template injection
  • Authentication bypasses: Logic errors in auth flows, missing authorization checks
  • Cryptographic weaknesses: Weak algorithms, improper key management, timing attacks
  • Race conditions: TOCTOU bugs, concurrent access without proper locking
  • Supply chain risks: Suspicious dependencies, typosquatting packages, outdated libraries with known CVEs

Integration: Codex Security can run as a GitHub Action on every PR, providing security review alongside your existing CI/CD pipeline. It adds comments directly to the PR with findings and suggested fixes.

10. Limitations and Known Issues

Codex is powerful, but it is not magic. Understanding its limitations is essential for using it effectively and setting appropriate expectations.

Context Window Constraints

Even with GPT-5.5's 256K context window, very large codebases can exceed the agent's ability to hold all relevant context simultaneously. For monorepos with millions of lines of code, Codex uses heuristics to select the most relevant files, which means it can miss cross-module dependencies or subtle interactions between distant parts of the codebase.
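
The sketch below shows one plausible shape for such a heuristic - rank files by overlap with the task description and keep them until a token budget is spent. The scoring and the chars-per-token estimate are assumptions, not Codex's actual selection logic.

# Pick the most task-relevant files that fit within a context budget.
def select_files(task: str, files: dict[str, str], budget_tokens: int) -> list[str]:
    task_words = set(task.lower().split())

    def score(body: str) -> int:
        return len(task_words & set(body.lower().split()))

    ranked = sorted(files, key=lambda name: score(files[name]), reverse=True)
    selected, used = [], 0
    for name in ranked:
        cost = len(files[name]) // 4  # rough chars-per-token estimate
        if used + cost <= budget_tokens:
            selected.append(name)
            used += cost
    return selected

files = {"auth.py": "jwt token refresh middleware", "billing.py": "stripe invoices"}
print(select_files("refactor jwt auth middleware", files, budget_tokens=50_000))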

Hallucination (Reduced but Not Eliminated)

GPT-5.5 hallucinates significantly less than earlier models, but it still happens. Common hallucination patterns include:

  • Inventing API methods that don't exist in a library (especially for less popular packages)
  • Generating import paths that don't match the actual project structure
  • Assuming configuration options that aren't available in the version you're using
  • Creating test assertions based on assumed behavior rather than actual behavior

The sandbox's ability to run tests catches many of these issues, but not all. Always review Codex's output before merging.

Language and Framework Coverage

Codex performs best with popular languages and frameworks that have extensive training data. Performance degrades for:

  • Niche languages (Elixir, Haskell, OCaml, Zig) - output is functional but less idiomatic
  • Internal/proprietary frameworks - Codex can't know about your company's custom framework unless AGENTS.md provides detailed guidance
  • Very new libraries - anything released after the model's training cutoff may not be well-represented
  • Domain-specific languages (DSLs) - Terraform HCL and SQL are well-supported, but custom DSLs are hit-or-miss

Sandbox Limitations

  • No GPU access: The sandbox doesn't have GPU support, so ML training tasks or CUDA code can't be tested
  • Limited system services: No Docker-in-Docker, no systemd, no database servers (unless you use the full-access network mode to connect to external services)
  • Timeout: Tasks have a maximum execution time (15 minutes on Free/Plus, 30 minutes on Pro, configurable on Enterprise)
  • Filesystem size: Sandbox storage is limited to 10GB, which can be insufficient for projects with large binary assets

Non-Determinism

Like all LLM-based tools, Codex is non-deterministic. Running the same task twice may produce different code. The code will be functionally equivalent in most cases, but the exact implementation details - variable names, code structure, algorithm choices - can vary. This makes it unsuitable for tasks that require exact reproducibility.

Critical reminder: Codex is a tool, not a replacement for engineering judgment. Always review generated code, especially for security-sensitive paths, data handling, and business logic. The agent is excellent at mechanical tasks but can make subtle logical errors that only a human with domain knowledge would catch.

11. Real-World Adoption

Codex's growth from launch to 4 million+ weekly active users in under a year makes it one of the fastest-adopted developer tools in history. Here is how organizations are using it in practice.

Adoption by the Numbers

Metric                                     Value (April 2026)
Weekly active users                        4M+
Tasks completed per day                    12M+
PRs created per week                       2.5M+
Enterprise customers                       500+
Codex CLI daily downloads                  50K+
AGENTS.md adoption (top 1K GitHub repos)   ~40%

Common Use Patterns

Based on OpenAI's published usage data and community reports, the most common Codex workflows are:

1. Test Generation (28% of tasks)

The single most popular use case. Teams point Codex at untested code and ask it to generate comprehensive test suites. This is particularly valuable for legacy codebases that were built without tests - Codex can read the implementation, understand the expected behavior, and generate tests that serve as both documentation and regression protection.

2. Bug Fixes (22% of tasks)

Developers paste error messages, stack traces, or bug reports and let Codex trace the issue through the codebase. The agent's ability to read multiple files, understand data flow, and verify fixes by running tests makes it highly effective for debugging.

3. Feature Implementation (18% of tasks)

New feature development - adding endpoints, building UI components, implementing business logic. This is where Codex's multi-file editing and test generation capabilities shine.

4. Refactoring (15% of tasks)

Code modernization, dependency upgrades, pattern migrations (e.g., callbacks to async/await, class components to hooks), and structural reorganization.

5. Code Review and Documentation (12% of tasks)

Using Codex to review PRs, explain complex code, generate documentation, and add inline comments to poorly documented codebases.

6. Security Scanning (5% of tasks)

Running Codex Security scans as part of CI/CD pipelines or ad-hoc security audits.

Enterprise Case Studies

Several large organizations have shared their Codex adoption results:

  • A Fortune 500 fintech company reported a 40% reduction in time-to-merge for feature PRs after deploying Codex across their 200-person engineering team. Test coverage increased from 62% to 84% in three months.
  • A mid-size SaaS startup (50 engineers) uses Codex for all test generation and achieved 90%+ coverage across their TypeScript monorepo. They estimate Codex saves each developer 6-8 hours per week.
  • An open-source project maintainer uses Codex to triage and fix issues from community contributors, reducing the average issue resolution time from 12 days to 3 days.

Getting Started

If you are new to Codex, start with the free tier to explore its capabilities. Connect a GitHub repository, try a few test generation tasks, and review the output quality. Once you are comfortable, upgrade to Plus or Pro for higher quotas and parallel task execution. Add an AGENTS.md to your repository for the best results.

12. Frequently Asked Questions

Is OpenAI Codex the same as the old Codex API from 2021?

No. The original Codex was a code-completion API based on GPT-3 that powered early GitHub Copilot. It was deprecated in March 2023. The current Codex (2025+) is a completely different product - an autonomous coding agent powered by GPT-5.5 that runs in a cloud sandbox and can perform complex multi-file engineering tasks.

Does Codex have access to my source code?

Yes, when you connect a GitHub repository, Codex clones it into an isolated cloud sandbox to read and modify files. The sandbox is destroyed after task completion. OpenAI states that code processed by Codex is not used to train models for Business and Enterprise plans. For Free, Plus, and Pro plans, you can opt out of training data usage in your settings.

Can Codex work with private repositories?

Yes. Codex supports private GitHub repositories through OAuth integration. You grant Codex read/write access to specific repositories - it does not require access to your entire GitHub account. GitLab and Bitbucket support is in beta as of April 2026.

How does Codex compare to Claude Code?

Codex is approximately 3x more token-efficient than Claude Code, making it cheaper per task. However, Claude Code is preferred by 67% of developers in blind code review studies for output quality. Codex excels at parallel task execution and CI/CD integration, while Claude Code is favored for interactive pair-programming. Many teams use both tools for different workflows.

What languages does Codex support?

Codex supports all major programming languages. It performs best with Python, TypeScript/JavaScript, Rust, Go, Java, C#, C++, Ruby, PHP, and Swift. It can work with less common languages but may produce less idiomatic code. The AGENTS.md file can provide language-specific guidance to improve output quality.

Can I use Codex CLI without a ChatGPT subscription?

Yes. Codex CLI uses the OpenAI API directly, so you only need an API key with credits. You pay per token used. This is separate from ChatGPT subscription pricing. Many developers prefer this model for predictable, usage-based costs.

Is Codex CLI really open source?

Yes. Codex CLI is fully open source under the Apache 2.0 license. The source code is available on GitHub with 72,000+ stars. You can fork it, modify it, and use it in commercial products. The agent logic, tool integrations, and approval modes are all open. The only proprietary component is the GPT-5.5 model itself, which is accessed via the OpenAI API.

What is AGENTS.md and do I need one?

AGENTS.md is a configuration file that tells AI coding agents how to work with your repository. It includes project conventions, build commands, code style rules, and areas that require human review. While not strictly required, adding one significantly improves Codex's output quality. It is now a Linux Foundation standard supported by all major AI coding tools.

Can Codex deploy my code to production?

Codex can create PRs and trigger CI/CD pipelines, but it does not directly deploy to production. The recommended workflow is: Codex creates a PR, your CI/CD pipeline runs automated checks, a human reviews and approves, and your existing deployment process handles the rest. This keeps a human in the loop for production changes.