OpenAI Codex has had two very different lives. The first was as a code-generation API that quietly powered GitHub Copilot and got millions of developers excited about writing software with natural language. The second life is happening right now, and it is significantly more ambitious. Today, Codex is no longer just a model you call through an API. It is a cloud-based AI coding agent embedded directly in ChatGPT, capable of reading your repository, writing new features, fixing bugs, and running tests, all without you touching the keyboard.
If you searched "what is OpenAI Codex today" expecting the old completions API, this article will catch you up completely.

OpenAI Codex, Then vs. Now
The 2021 Model That Started It All
When OpenAI released Codex in August 2021, it was described as a descendant of GPT-3 fine-tuned on code. The model was trained on billions of lines of public source code, primarily from GitHub, and could generate working Python, JavaScript, TypeScript, Ruby, and Go from plain English descriptions.
The results were genuinely impressive for 2021. You could type "write a function that sorts a list of dictionaries by a nested property" and get something that actually worked. Developers integrated it through the OpenAI API and built everything from code-completion tools to documentation generators to automated test writers.
The most famous thing built on Codex was GitHub Copilot, the inline code suggestion tool that GitHub (owned by Microsoft since 2018) previewed in June 2021 and made generally available in 2022. For a long time, Codex and Copilot were practically synonymous in the developer community.
Why OpenAI Shut It Down in 2023
In March 2023, OpenAI deprecated the original Codex API endpoints. The reason was straightforward: newer GPT models had overtaken Codex on virtually every benchmark that mattered for code generation. GPT-3.5 Turbo and the newly released GPT-4 were simply better at writing code than a model fine-tuned specifically for it.
This is a pattern that repeats across the AI industry: specialized fine-tuned models tend to get absorbed into larger general-purpose models over time. The Codex API endpoints were retired, and developers were redirected to the Chat Completions API with GPT-3.5 Turbo or GPT-4 instead.
For about two years, "Codex" as a product name went dormant.
What Codex Actually Is in 2025

A Coding Agent, Not a Model
In May 2025, OpenAI brought the Codex name back with a completely different product. The new Codex is not a model. It is a cloud-based coding agent that runs in its own isolated sandbox, can access your codebase, and takes multi-step actions to complete software engineering tasks.
The core architecture shift is significant. The original Codex was a model you prompted once and received a completion back. The new Codex is an agent that:
- Reads your actual repository files and project structure
- Breaks a task into smaller sub-tasks and works through them sequentially
- Writes code, runs it, checks output, and iterates on failures
- Submits pull requests with its completed work for your review
- Can handle multiple tasks running in parallel, asynchronously
This is not autocomplete. This is closer to a junior developer you assign tickets to and check in with at the end of the day.
How It Sits Inside ChatGPT
The new Codex lives as a feature inside ChatGPT. It rolled out first to Pro, Team, and Enterprise subscribers and was later extended to Plus. You interact with it through a side panel where you assign tasks in plain English. You do not need to be in the same editor or even the same application.
When you assign Codex a task, it spins up a sandboxed environment with a fresh copy of your repository, does its work in that environment, and then either proposes changes or submits them for your review. The sandbox runs in the cloud, which means it is not using your local compute and does not depend on your terminal being open.
💡 Worth noting: The sandbox resets between tasks. Codex does not have persistent memory of previous coding sessions beyond the repository state itself.
The model powering the new Codex agent is codex-1, a version of o3 (OpenAI's reasoning model) optimized for software engineering. That helps explain why it performs substantially better on multi-step programming tasks than a straight completion model would: the reasoning architecture lets it work through complex chains of logic before committing to an implementation.
What Codex Does in Practice

Writing Code From Plain English
The core use case remains what it always was: you describe what you want and Codex writes it. But the implementation is night-and-day different from 2021.
Instead of completing a single function, Codex can now:
- Implement an entire feature across multiple files simultaneously
- Follow existing code conventions by reading your codebase first
- Add appropriate imports, update type definitions, write accompanying tests
- Commit with a coherent commit message that describes the change accurately
- Flag ambiguities in your request before writing a single line
For example, you might tell Codex: "Add rate limiting to the /api/upload endpoint. Allow 10 requests per minute per user IP, return a 429 status with a Retry-After header when exceeded, and add a test for the rate limit behavior." Codex will read your existing endpoint code, figure out what middleware or library your project already uses, and implement a solution that fits your specific stack rather than a generic template.
This is qualitatively different from getting a code snippet in a chat window. Codex is reading your actual project context before it writes a single character.
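To make that example request concrete, here is a minimal, framework-agnostic sketch of the logic it describes. It uses a sliding-window limiter that allows 10 requests per 60 seconds per client IP and returns 429 with a Retry-After header when the limit is exceeded. It is not Codex output. `SlidingWindowRateLimiter` and `handle_upload` are hypothetical stand-ins. A real implementation would plug into whatever middleware your stack already uses, and that stack-specific context is exactly what Codex reads for.

```python
import math
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit=10, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)  # client IP -> timestamps of accepted requests

    def check(self, client_ip):
        """Return (allowed, retry_after_seconds) for one incoming request."""
        now = self.clock()
        hits = self._hits[client_ip]
        # Discard timestamps that have slid out of the current window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True, 0
        # The oldest accepted request leaves the window at hits[0] + window.
        retry_after = max(1, math.ceil(hits[0] + self.window - now))
        return False, retry_after


def handle_upload(limiter, client_ip):
    """Hypothetical stand-in for the /api/upload handler; returns (status, headers)."""
    allowed, retry_after = limiter.check(client_ip)
    if not allowed:
        return 429, {"Retry-After": str(retry_after)}
    return 200, {}  # a real handler would process the upload here
```

Keeping timestamps in process memory only works for a single server instance. A deployment with several instances would need shared storage for the counters.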
Debugging Without Being Asked
One of the more surprising behaviors is proactive error detection. When Codex implements something, it runs your test suite (if configured) and catches failures before surfacing the result to you.
If tests fail, Codex iterates. It does not hand you broken code with an explanation of what to fix. It reads the error output, adjusts its implementation, and runs tests again. This loop can repeat several times before you ever see a result. You see the finished work, not the trial and error.
This is what makes the agent framing accurate. A code completion tool gives you the code. A coding agent takes responsibility for whether the code actually works.
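The generate, test, and revise cycle described above can be sketched as a small loop. This illustrates the pattern only and is not OpenAI's implementation. `propose_patch` stands in for a model call and `run_tests` for running the repository's test command in the sandbox. Both are hypothetical callables you would supply.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CheckResult:
    passed: bool
    output: str  # test runner output, e.g. a failure traceback


def fix_until_green(
    propose_patch: Callable[[Optional[str]], str],
    run_tests: Callable[[str], CheckResult],
    max_attempts: int = 5,
) -> Tuple[str, int, bool]:
    """Propose, test, and revise a patch until tests pass or attempts run out.

    Returns (last_patch, attempts_used, passed).
    """
    feedback: Optional[str] = None  # no failure output before the first attempt
    patch = ""
    for attempt in range(1, max_attempts + 1):
        patch = propose_patch(feedback)  # in a real agent: a model call with repo context
        result = run_tests(patch)        # in a real agent: the test suite, run in the sandbox
        if result.passed:
            return patch, attempt, True
        feedback = result.output         # the failure output feeds the next attempt
    return patch, max_attempts, False
```

The attempt cap matters. Without it, an agent facing a test it cannot satisfy would loop forever instead of handing the problem back to you.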
Running Code in an Isolated Sandbox
The sandbox architecture matters for three distinct reasons:
- Safety: Codex cannot accidentally affect your local files. Everything happens in a contained environment that resets after each task.
- Repeatability: The environment is clean and consistent, so results do not vary based on your machine state.
- Concurrency: You can assign multiple tasks at the same time and Codex works on them in parallel, processing a queue of tickets while you focus elsewhere.
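You can approximate these isolation and concurrency properties locally by giving each task a throwaway copy of the repository. This sketch assumes tasks are plain Python callables, which are hypothetical stand-ins for agent work. It uses temporary directories rather than the cloud sandboxes Codex actually runs in. For each task it reports which files were added or modified, and the original checkout is never touched.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List, Tuple

Task = Callable[[Path], str]  # hypothetical: receives a sandbox path, returns a summary


def changed_files(original: Path, sandbox: Path) -> List[str]:
    """List files the task added or modified (deletions are ignored in this sketch)."""
    changed = []
    for path in sorted(sandbox.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(sandbox)
        before = original / rel
        if not before.exists() or before.read_bytes() != path.read_bytes():
            changed.append(rel.as_posix())
    return changed


def run_in_sandbox(repo: Path, task: Task) -> Tuple[str, List[str]]:
    """Run one task against a fresh copy of the repo; the copy is deleted afterwards."""
    # Leaving the `with` block removes the sandbox, so nothing carries over between tasks.
    with tempfile.TemporaryDirectory() as tmp:
        sandbox = Path(tmp) / "repo"
        shutil.copytree(repo, sandbox)  # fresh, clean copy for every task
        summary = task(sandbox)
        return summary, changed_files(repo, sandbox)


def run_tasks_in_parallel(repo: Path, tasks: List[Task]) -> List[Tuple[str, List[str]]]:
    """Run independent tasks concurrently, each in its own sandbox; results keep task order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda task: run_in_sandbox(repo, task), tasks))
```

Because each task writes only to its own copy, two tasks that edit the same file cannot interfere with each other. Reconciling their changes is left to review, just as with two pull requests.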
💡 Practical tip: Codex works best when your repository has a clear setup script (like a Makefile or package.json scripts) and working tests. The agent uses these to verify its own output. A project with no tests gives Codex nothing to validate against, so it has no objective way to tell whether its changes actually work. An AGENTS.md file in the repository can also tell Codex how to navigate the codebase and which commands to run.
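Here is that pass/fail signal at its smallest: a hypothetical `slugify` function, two unit tests, and a helper that runs them and reduces the result to one boolean. In a real project the agent would run your own test command, such as `pytest` or `npm test`, rather than an in-process runner like this one.

```python
import io
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test: lowercase and join words with hyphens."""
    return "-".join(title.lower().split())


class SlugifyTests(unittest.TestCase):
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_extra_whitespace_collapses(self):
        self.assertEqual(slugify("  Many   spaces "), "many-spaces")


def suite_passes() -> bool:
    """Run the tests quietly and reduce them to one pass/fail signal."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTests)
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return result.wasSuccessful()
```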
Codex vs. GitHub Copilot

The obvious question when OpenAI announced the new Codex was: how is this different from GitHub Copilot? The confusion is understandable. Copilot was built on the original Codex. Both are AI tools that write code. Both have close ties to OpenAI, since Microsoft, which owns GitHub, is a major OpenAI investor and partner. But they are genuinely different products at this point, serving different parts of the development workflow.
Where the Two Overlap
- Both can generate code from natural language descriptions
- Both work across multiple programming languages and frameworks
- Both can suggest tests, documentation, and type definitions
- Both have awareness of your existing code context
Where Each One Wins
| Capability | GitHub Copilot (inline suggestions) | OpenAI Codex (2025) |
|---|---|---|
| Real-time inline suggestions | Yes | No |
| IDE integration | Deep (VS Code, JetBrains, etc.) | Limited |
| Multi-file autonomous tasks | Limited | Yes |
| Runs and tests its own code | No | Yes |
| Works without your IDE open | No | Yes |
| Async and parallel tasks | No | Yes |
| Access model | GitHub Copilot subscription | ChatGPT Plus, Pro, Team, or Enterprise |
The simplest framing: Copilot helps you write code faster while you work. Codex takes a task and completes it while you do something else entirely.
Copilot is a power tool in your hands. Codex is an agent you delegate to.
The Developer Workflow With Codex

Tasks It Handles Without You
There is a specific category of software engineering work that is both genuinely tedious and well-defined: the kind of task where you know exactly what needs to happen, but actually doing it takes an hour of mechanical typing, finding the right files, updating tests, and creating a PR. Codex is well-suited to precisely this category.
Good Codex tasks include:
- Adding new API endpoints with standard CRUD operations where the pattern is already established
- Writing unit tests for existing functions that lack coverage
- Migrating a codebase from one library version to another, including updating deprecated imports
- Fixing a specific, well-described bug where the cause is understood and the fix is clear
- Adding logging or observability instrumentation to existing functions
- Generating TypeScript types from a JSON schema or a set of API response examples
- Updating documentation to reflect recent changes in code behavior
These tasks share a common trait: they are tedious for a human but well-scoped and verifiable with tests. Codex excels here precisely because the success criteria (passing tests, correct output) are clear, so it can iterate toward them systematically.
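Generating TypeScript types from example API responses, one of the tasks listed above, shows how mechanical this kind of work is. The sketch below is written in Python to match the other examples here, and its function names are hypothetical. It infers a TypeScript interface from a single JSON example. One sample cannot reveal optional fields or values that simply do not appear in it, and that gap is where human review still matters.

```python
import json


def ts_type(value, indent=0):
    """Map a decoded JSON value to a TypeScript type expression."""
    pad = "  " * indent
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        if not value:
            return "unknown[]"  # an empty sample says nothing about element types
        element_types = sorted({ts_type(v, indent) for v in value})
        inner = " | ".join(element_types)
        return f"({inner})[]" if len(element_types) > 1 else f"{inner}[]"
    if isinstance(value, dict):
        lines = ["{"]
        for key, v in value.items():
            name = key if key.isidentifier() else json.dumps(key)  # quote keys like "first-name"
            lines.append(f"{pad}  {name}: {ts_type(v, indent + 1)};")
        lines.append(f"{pad}}}")
        return "\n".join(lines)
    raise TypeError(f"not a JSON value: {value!r}")


def interface_from_example(name, example_json):
    """Build a TypeScript interface declaration from one example JSON object."""
    data = json.loads(example_json)
    if not isinstance(data, dict):
        raise ValueError("interfaces describe objects; pass a JSON object")
    return f"interface {name} {ts_type(data)}"
```

For example, `interface_from_example("User", '{"id": 1, "name": "Ada"}')` produces an interface with `id: number;` and `name: string;` members.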
When You Still Need to Drive
Codex is not a replacement for engineering judgment. There are categories of work where handing a task to an agent is the wrong call regardless of how capable that agent is.
Architectural decisions still belong to humans. Should this be a microservice or a monolith? Codex will do what you tell it, but it will not notice that your entire approach is wrong for the problem at hand.
Ambiguous requirements produce inconsistent results. If you cannot precisely describe what done looks like, Codex will produce something, but probably not the right something. The old computing principle applies: garbage in, garbage out, just faster now.
Security-sensitive changes need human review regardless of who wrote the first draft. Authentication flows, permission models, and cryptographic implementations warrant careful inspection even when the AI-written implementation looks correct.
Performance optimization driven by profiling data requires understanding which part of the system is actually slow, something Codex cannot determine without access to runtime instrumentation.
The mental model that works best: treat Codex like a skilled contractor who needs clear specs. The more precisely you define the task, the better the output.
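One way to apply that advice is to write the task down as structured data before handing it off. The `TaskSpec` class below is a hypothetical illustration, not a Codex API. It flags specs that lack a definition of done and renders the rest into a plain-English prompt you could paste into the side panel.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskSpec:
    goal: str
    acceptance_criteria: List[str] = field(default_factory=list)
    files_in_scope: List[str] = field(default_factory=list)
    out_of_scope: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return reasons this spec is likely too vague to delegate."""
        issues = []
        if not self.acceptance_criteria:
            issues.append("no acceptance criteria: 'done' is undefined")
        if not self.files_in_scope:
            issues.append("no files in scope: the agent must guess where to work")
        return issues

    def to_prompt(self) -> str:
        """Render the spec as a plain-English task description."""
        parts = [self.goal]
        if self.files_in_scope:
            parts.append("Relevant files: " + ", ".join(self.files_in_scope))
        if self.acceptance_criteria:
            parts.append("Done when:\n" + "\n".join(f"- {c}" for c in self.acceptance_criteria))
        if self.out_of_scope:
            parts.append("Do not:\n" + "\n".join(f"- {c}" for c in self.out_of_scope))
        return "\n\n".join(parts)
```

A spec like `TaskSpec(goal="Make uploads better")` comes back with two problems. That is the "garbage in" case, caught before any code gets written.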

Open-Source and Free Alternatives
The AI coding space is not a two-player game. Several strong alternatives exist, many accessible without a paid subscription to any single platform.
DeepSeek v3 has become a serious option for code generation tasks. The open-weight model benchmarks competitively with GPT-4-class models on coding benchmarks and can be run locally or accessed through platforms like Picasso IA. Its successor, DeepSeek v3.1, pushes performance further and shows particular strength on multi-language refactoring tasks where context across files matters.
Kimi K2 Instruct from Moonshot AI is another strong option, built with agentic coding as a core design priority. It handles long-context code analysis well, which matters when you are asking a model to reason across a large repository with many interdependent files. Kimi K2.6 continues this trajectory with extended agent tool-use capabilities.
IBM's Granite 8B Code Instruct 128K and Granite 20B Code Instruct 8K are purpose-built code models designed for enterprise codebases with permissive licensing suitable for commercial use.
Where GPT-5 and Claude Fit In
If you are already in the ChatGPT ecosystem, OpenAI's newest models handle code with genuine capability. GPT-5 handles complex multi-file reasoning and is the model most developers reach for when a task is too large or nuanced for a more constrained tool. GPT-4.1 remains a practical choice for everyday coding tasks at a lower cost point.
On the Anthropic side, Claude 4 Sonnet has earned a strong reputation among developers for producing clean, well-structured code. Many specifically prefer it for refactoring tasks because it tends to preserve original intent while improving readability and structure. Claude 4.5 Sonnet extends this with stronger agentic capabilities, making it a compelling alternative for teams that want to integrate through the API rather than use ChatGPT directly.
For reasoning-heavy tasks, o4-mini and DeepSeek R1 are the standouts. These models think through a problem before committing to an answer, which meaningfully reduces errors in complex algorithmic or debugging tasks where the right approach is not immediately obvious.
💡 For teams building AI-powered applications: all of these models are accessible through Picasso IA's LLM collection, which means you can test them in a single interface without setting up separate API accounts for each provider.
What This Means for Software Development

The Shift Already Happening
The original Codex model made individual developers faster. The new Codex is doing something structurally different: it is separating the act of writing code from the decision of what code to write.
When a developer assigns a task to Codex and reviews the output rather than writing every line, their role changes. They spend more time on specification, review, and architectural judgment. They spend less time on mechanical implementation. This is not a small change. It affects how teams scope work, how they estimate effort, and what skills create the most value.
The developers who benefit most from this shift are those who can operate as effective technical reviewers: asking the right questions about edge cases, catching subtle errors in logic, and knowing when to push back on a proposed implementation that technically works but creates long-term problems.
Speed and the New Bottleneck
One thing the new Codex changes decisively: the bottleneck in software development is shifting from writing to reviewing. When code generation is fast, the constraint becomes your ability to evaluate whether generated code is correct, maintainable, and secure.
This puts a premium on reading comprehension, testing discipline, and code review skills. It also creates new collaboration patterns. Instead of two developers writing code together, you might have one developer plus an agent writing code while the developer reviews and steers direction. Teams are still figuring out how to make this work well across different project types and team compositions.

Some organizations report that well-scoped tasks which previously took a full developer day are being completed in under an hour with agent-assisted workflows. Those gains are not uniformly distributed: they depend heavily on codebase quality, test coverage, and how precisely the task is specified. But the directional trend is consistent enough to take seriously.
What does not disappear in this model: the need for people who understand software at a systems level. Codex can write a function, but it cannot tell you that your data model is fundamentally wrong, or that the feature you just asked it to build will create a race condition at scale. Judgment at that level still lives with humans.
Start Creating With AI Right Now

Codex is one piece of a much broader picture. The same wave of AI capability that turned a code completion API into an autonomous coding agent has also reshaped image generation, video production, voice synthesis, and creative work of every kind.
If you want to experiment with the AI models defining 2025 without managing a dozen separate accounts, Picasso IA brings them together in one place. From large language models like GPT-5 and Claude 4 Sonnet to image generators with over 91 models to choose from, to text-to-video tools, super-resolution, AI music generation, and lipsync, the platform lets you move between capabilities without switching tools or contexts.
The developers, designers, and creators who are getting the most from this moment are the ones building fluency across multiple AI capabilities, not just those who know how to prompt a coding assistant. Pick a model. Generate something. See what these tools actually do when you push them past the obvious use cases.
Try it at Picasso IA and see what you build.