The bait, then the rug-pull.
For months, Claude Code was the only coding agent worth talking about. Then OpenAI shipped Codex -- and the comparison videos started. This one actually runs the tests.
Where the time goes.
01 · Hook and thesis
OpenAI comeback framing, promise of honest head-to-head across features, price, and three specific use cases.
02 · Claude Code overview
Task delegation, file editing, customization via hooks/skills/sub-agents. Desktop, terminal, web versions. Opus/Sonnet/Haiku models.
03 · Codex overview
GPT family models, gpt-codex-spark in preview. WorkTrees as the defining architectural choice. Included in every ChatGPT paid plan.
04 · Shared features
Both tools: local code editing, desktop app, VS Code extension, CLI, MCP, skills format, plugin marketplace, cloud delegation, hooks, sub-agents.
05 · Claude Code advantages
30 hook events vs 6. Auto-delegating sub-agents. /ultra-plan, /ultra-review, /loop. Channels integration. Agent SDK. Enterprise auth (Bedrock, Vertex, Foundry).
06 · Codex advantages
Native WorkTrees per thread. In-app browser. Computer-use QA. @codex GitHub PR integration. /goal. GPT image generation. OpenClaw/Hermes compatibility.
07 · Pricing and context windows
Claude: Pro $20, Max 5x $100, Max 20x $200. Codex: included in ChatGPT plans from Free through Pro ($200). Context window: 1M tokens (Claude) vs 256K (Codex).
08 · Live benchmark intro and results
Three identical prompts: research report PDF, landing page (Glaido), marketing analytics dashboard. Claude wins landing page and dashboard design; Codex wins PDF efficiency.
09 · Benchmark metrics deep-dive
Raw numbers from JSONL logs. Codex: 25:52, 6.19M tokens, $7.11. Claude: 14:51, 5.8M tokens, $11.05. Output tokens always higher for Claude. Efficiency scatter plot.
10 · Analysis and decision framework
Use Claude for front-end, deep planning, custom workflows, enterprise auth. Use Codex for research tasks, structured documents, /goal, GitHub PRs, image generation. Split workflow is valid.
11 · Portability and closing
Projects are files in folders -- not locked to either tool. CLAUDE.md becomes AGENTS.md. Closing thesis: which tool is best for this specific task.
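The section 09 figures can be turned into comparable rates with a little arithmetic. The sketch below assumes the quoted numbers are totals across all three prompts and that the times are in mm:ss; both are readings of the outline, not something the video states here. The names RUNS, to_seconds, and derived are illustrative only.

```python
# Figures quoted in section 09 of the outline.
# Assumption: these are totals across the three benchmark prompts,
# and the times are formatted as mm:ss.
RUNS = {
    "Codex": {"time": "25:52", "tokens_m": 6.19, "cost_usd": 7.11},
    "Claude": {"time": "14:51", "tokens_m": 5.8, "cost_usd": 11.05},
}


def to_seconds(mmss: str) -> int:
    """Convert an 'mm:ss' string into whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def derived(run: dict) -> dict:
    """Compute cost per million tokens and token throughput for one run."""
    secs = to_seconds(run["time"])
    return {
        "seconds": secs,
        "usd_per_m_tokens": round(run["cost_usd"] / run["tokens_m"], 2),
        "k_tokens_per_second": round(run["tokens_m"] * 1000 / secs, 2),
    }


SUMMARY = {name: derived(run) for name, run in RUNS.items()}

if __name__ == "__main__":
    for name, row in SUMMARY.items():
        print(name, row)
```

Under those assumptions, Claude comes out more expensive per million tokens while moving more tokens per second, which fits the outline's note that Claude's output-token share is higher.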
Visual structure at a glance.
Named ideas worth stealing.
Task-Fit Decision Matrix
- Claude Code: complex front-end, visual design, deep planning, auto-delegation, hooks/skills/channels, Agent SDK, enterprise auth
- Codex: research-heavy tasks, structured PDFs/reports, WorkTree-native shipping, /goal for long-running work, GitHub PR integration, image generation
A task-type decision rule rather than a blanket preference for one tool.
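One way to make the matrix concrete is a small tag lookup. This is a minimal sketch, not anything shown in the video: the tag strings are shortened from the two lists above, and MATRIX and pick_tool are hypothetical names.

```python
# Tags shortened from the two matrix rows above.
MATRIX = {
    "Claude Code": {
        "front-end", "visual design", "deep planning", "auto-delegation",
        "hooks", "skills", "channels", "agent sdk", "enterprise auth",
    },
    "Codex": {
        "research", "structured documents", "worktrees", "/goal",
        "github prs", "image generation",
    },
}


def pick_tool(task_tags: set[str]) -> str:
    """Return the tool whose row matches the most tags for this task.

    Returns "either" when nothing matches and "split" when both rows
    match equally, mirroring the idea that a split workflow is valid.
    """
    scores = {tool: len(tags & task_tags) for tool, tags in MATRIX.items()}
    best = max(scores.values())
    if best == 0:
        return "either"
    winners = [tool for tool, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else "split"


if __name__ == "__main__":
    print(pick_tool({"front-end", "visual design"}))
    print(pick_tool({"research", "structured documents"}))
```

Counting tag overlap is the simplest possible scoring; a real decision would also weigh price and session limits.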
Output Token Efficiency as Session-Longevity Proxy
Output tokens cost more and burn session limits faster. Codex writes 2-5x fewer output tokens than Claude per equivalent task. This explains why Claude users report hitting limits faster -- and it is measurable from JSONL logs.
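Measuring this from logs could look like the sketch below. The record shape is an assumption: it expects one JSON object per line with a top-level "usage" object holding "input_tokens" and "output_tokens". The real log schema of either tool may differ, so the field names would need adjusting. sum_usage and output_ratio are hypothetical helper names.

```python
import json
from typing import Iterable


def sum_usage(lines: Iterable[str]) -> dict:
    """Total input and output tokens across JSONL records.

    Assumed (hypothetical) record shape:
    {"usage": {"input_tokens": 123, "output_tokens": 45}}
    Blank lines, non-JSON lines, and records without usage are skipped.
    """
    totals = {"input_tokens": 0, "output_tokens": 0}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        usage = record.get("usage")
        if not isinstance(usage, dict):
            continue
        for key in totals:
            value = usage.get(key, 0)
            if isinstance(value, int):
                totals[key] += value
    return totals


def output_ratio(a: dict, b: dict) -> float:
    """How many times more output tokens run `a` wrote than run `b`."""
    if b["output_tokens"] == 0:
        raise ValueError("run b has no output tokens to compare against")
    return a["output_tokens"] / b["output_tokens"]
```

With two session logs on disk, something like output_ratio(sum_usage(open("claude.jsonl")), sum_usage(open("codex.jsonl"))) would give the multiplier the 2-5x claim refers to; both file names here are placeholders.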
Lines you could clip.
"It is not a matter of which tool is best, it is a matter of which tool is best for the specific use case in front of you."
"Claude Code right now has 30 different hook events. Codex right now has about six. If you want to fire automated behavior into every part of the workflow, Claude Code gives you about 5x the granularity."
"Claude has this way of planning the task tightly before it executes. And Codex tends to just grind through more iterations, which is why the input tokens stack up on its side."
How they asked for the click.
"I broke all of this down into a resource guide that you can access for completely free, and you can find that in my free Skool community."
Verbal mention only, no overlay shown. Low-friction -- no product pitch, just a free community link.
Word for word.
Which coding agent to reach for, and when.
The benchmark data splits cleanly: Claude Code wins on front-end quality and planning depth; Codex wins on token efficiency and research-heavy output -- and both tools are portable enough that you do not have to commit to just one.
- Output tokens are priced higher than input tokens, and Claude Code consistently writes 2-5x more output tokens per task than Codex -- which, rather than a platform throttle, is the likely reason Claude users hit session limits faster.
- Claude Code finished a marketing analytics dashboard in under 2 minutes using 283K tokens; Codex took 8 minutes and burned 1.64M tokens on the same prompt -- roughly a 4x speed gap and a 6x token gap for front-end work.
- Codex won the research report task, finishing slightly faster and using 1.9M fewer tokens than Claude, which suggests Codex is more efficient when the task is document generation rather than UI construction.
- Claude Code has 30 hook events for automated workflow triggers; Codex has about 6 -- if you need fine-grained automation that fires on specific agent behaviors, Claude Code is the only current option at that scale.
- Claude Code auto-spawns sub-agents when task complexity warrants it; Codex only does so when explicitly asked -- which means complex multi-step tasks route differently through each tool even on identical prompts.
- Projects built in either tool are portable: skills, hooks, and JSONL logs all transfer; the main swap is renaming CLAUDE.md to AGENTS.md when moving a project into Codex.
- A practical split workflow -- use Claude Code for planning and brainstorming, then hand the plan to Codex for execution -- is validated by how each tool's token behavior maps to planning-heavy vs execution-heavy phases.
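The portability point above, carrying CLAUDE.md over as AGENTS.md, could be scripted along these lines. This is a minimal sketch that only handles the instructions file; port_instructions is a hypothetical helper name. It copies by default so the project still opens in Claude Code.

```python
import shutil
from pathlib import Path


def port_instructions(project_dir: str, keep_original: bool = True) -> Path:
    """Create AGENTS.md from CLAUDE.md in the given project folder.

    Copies by default so both tools keep working on the same project;
    pass keep_original=False to rename instead. Refuses to overwrite
    an existing AGENTS.md.
    """
    root = Path(project_dir)
    source = root / "CLAUDE.md"
    target = root / "AGENTS.md"
    if not source.is_file():
        raise FileNotFoundError(f"no CLAUDE.md in {root}")
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    if keep_original:
        shutil.copy2(source, target)
    else:
        source.rename(target)
    return target
```

Keeping both files side by side suits the split workflow described above, at the cost of having to keep the two copies in sync by hand.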