Riley Brown · YouTube · 26:57

The Latest Codex Updates and The Truth about Opus 4.8

A 27-minute breakdown of why Opus 4.8 barely moved the needle and why Codex platform updates mattered far more.

Posted: May 31st 2026
Duration: 26:57
Format: Tutorial (educational)
Channel: Riley Brown
§ 01 · The Hook

The bait, then the rug-pull.

Anthropic called it the most advanced model in the world. The practitioners who actually tested it called it a camera bump. In the same week, OpenAI quietly shipped half a dozen Codex updates that changed how the host works every single hour, some of them without any public announcement. This breakdown sorts out which story mattered.

§ · Chapters

Where the time goes.

00:00 – 02:51

01 · Intro / Opus 4.8 overview

Anthropic announcement, model card benchmarks, 3-hour personal test. Host and cited practitioners cannot distinguish 4.8 from 4.7.

02:52 – 04:55

02 · GPT 5.5 vs Opus 4.8

DeepSWE data: GPT 5.5 scores higher at lower cost and fewer tokens. Trust for long agentic tasks goes to GPT; Opus wins on design.

04:56 – 05:52

03 · Model updates vs super-app updates

Framing shift: two categories for lab announcements. Super-app innovation is where the real delta is now.

05:53 – 07:39

04 · Codex: Windows computer use + mobile

@computer-use lands on Windows. QR code pairs ChatGPT on iPhone with desktop Codex session in real time.

07:40 – 10:37

05 · Codex browser upgrade

Persistent login across sessions, multi-tab via cmd+open. Demo: Twitter and Notion without re-auth. Host's most-used new feature.

10:38 – 12:40

06 · Codex spinning up sub-agents

One super prompt spawns 6 parallel chat sessions. AI auto-names and self-prompts each thread.

12:41 – 13:48

07 · Other Codex updates

Cmd+G full-text search across all agent chats. GitHub-style activity streak (43 days, 4B tokens).

13:49 – 18:15

08 · People leaving Replit and Lovable

Single Codex prompt with Neon + Vercel + AI Gateway replicates Replit's full value prop. BYOT/BYOA plugin prediction.

18:16 – 25:41

09 · Agent mini apps

Agents generate ephemeral UI panels that inherit plugin auth, handling the final 10% human decisions directly. Tinder-for-email demo. Teases chorus.com.

25:42 – 26:57

10 · Outro

Moved company from SF to NYC. Series rebrands to Agent Native. Producer vs. consumer manifesto.

§ · Storyboard

Visual structure at a glance.

open: Opus 4.8 announcement tweet
model vs super-app two-category framework
Codex browser: Twitter signed in
sub-agent spawning: 6 threads created
mini app concept: browser vs mini app panel
tinder-for-email whiteboard
chorus.com mini app reveal
§ · Frameworks

Named ideas worth stealing.

04:56 model

Two Categories of Lab Announcements

  1. Model updates
  2. Super-app updates

Host's lens for deciding how much attention to give any AI lab announcement — model increments vs. platform/UX changes that affect daily workflow.

Steal for Any content creator covering AI news who wants a consistent editorial frame
18:16 concept

Agent Mini App Architecture

Generative UI panels spawned by an agent inside its workspace, inheriting the user's plugin authentication, allowing the human to make final-10% decisions without leaving the agent environment.

Steal for Product designers or developers thinking about what AI-native applications should look like
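The Agent Mini App idea hinges on one mechanism the host says Codex lacks today: a generated panel reusing the plugin sign-ins the user already completed. The Python sketch that follows models that under stated assumptions. `PluginSession`, `ScopedHandle`, `MiniApp`, and `Workspace` are hypothetical names, not a Codex or OpenAI API, and the scope check is an assumed design detail the video does not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginSession:
    """Hypothetical: a plugin sign-in the user already completed in the agent workspace."""
    name: str          # e.g. "gmail"
    token: str         # opaque secret; never handed to a mini app directly
    scopes: frozenset  # what the user authorized, e.g. {"read", "send"}


class ScopedHandle:
    """Hypothetical: what a mini app receives instead of the raw credential."""

    def __init__(self, session: PluginSession, allowed: frozenset):
        self._session = session
        self.allowed = allowed

    def can(self, action: str) -> bool:
        return action in self.allowed


@dataclass
class MiniApp:
    """Hypothetical: an agent-generated UI panel that declares the plugins it needs."""
    title: str
    needs: dict  # plugin name -> set of scopes the panel wants


class Workspace:
    """Hypothetical: holds the user's plugin sessions and launches mini apps against them."""

    def __init__(self):
        self._sessions = {}

    def sign_in(self, session: PluginSession) -> None:
        self._sessions[session.name] = session

    def launch(self, app: MiniApp) -> dict:
        """Give the panel scoped handles that inherit existing plugin auth.

        Raises PermissionError if the user never signed in to a required
        plugin or did not grant a scope the panel asks for.
        """
        handles = {}
        for plugin, wanted in app.needs.items():
            session = self._sessions.get(plugin)
            if session is None:
                raise PermissionError(f"{app.title}: not signed in to {plugin}")
            missing = set(wanted) - session.scopes
            if missing:
                raise PermissionError(
                    f"{app.title}: {plugin} lacks scopes {sorted(missing)}"
                )
            handles[plugin] = ScopedHandle(session, frozenset(wanted))
        return handles


ws = Workspace()
ws.sign_in(PluginSession("gmail", "opaque-token", frozenset({"read", "send"})))
triage = MiniApp("Email triage", {"gmail": {"read", "send"}})
handles = ws.launch(triage)
print(handles["gmail"].can("send"))    # True
print(handles["gmail"].can("delete"))  # False
```

Handing the panel a scoped handle rather than the raw token is an assumed design choice, not something the video specifies; it keeps a generated UI from doing more than the user originally authorized.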
17:18 concept

BYOT / BYOA Product Model

Bring Your Own Tokens + Bring Your Own Agent: a SaaS pricing model where the platform charges only for interface/hosting, not AI compute, giving users model choice and reducing operational costs.

Steal for Founders considering a vibe-coding or AI tools product competing with Replit/Lovable
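The BYOT/BYOA claim is at heart an arithmetic one: if users pay for inference through their own Codex or Claude subscription, the platform's per-user variable cost shrinks to hosting. A minimal Python sketch of that gross-margin calculation; only the $10/month price comes from the video, and the hosting and token costs are invented placeholders, not measured figures.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUnitEconomics:
    """Per-user monthly figures in dollars (illustrative inputs only)."""
    price: float
    hosting_cost: float
    token_cost: float = 0.0  # 0 under BYOT/BYOA: the user's own subscription pays for inference

    def gross_margin(self) -> float:
        """Share of revenue left after the variable costs the platform carries."""
        if self.price <= 0:
            raise ValueError("price must be positive")
        return (self.price - self.hosting_cost - self.token_cost) / self.price


# Hypothetical numbers: only the $10/month price is taken from the video.
byot = MonthlyUnitEconomics(price=10.0, hosting_cost=3.0)
bundled = MonthlyUnitEconomics(price=25.0, hosting_cost=3.0, token_cost=15.0)
print(f"BYOT/BYOA margin: {byot.gross_margin():.0%}")     # 70%
print(f"Bundled margin:   {bundled.gross_margin():.0%}")  # 28%
```

The point of the sketch is structural: in the bundled case `token_cost` grows with how hard each user drives the agent, while under BYOT that variable sits on the user's side of the ledger.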
§ · Quotables

Lines you could clip.

01:34
"I literally couldn't tell the difference between the two models."
Punchy, standalone, directly contrasts the Anthropic marketing headline → TikTok hook
02:01
"We are entering the era where model releases start to feel like iPhone releases. Remember when every new iPhone had a genuine leap? Now it's a slightly better camera and you can't really tell the difference."
Vivid analogy, immediately understandable, high shareability → IG reel cold open
24:15
"Why would I want to use someone else's external platform if my AI agent can generate a UI for me right when I need it."
Clean one-liner thesis for the mini-app concept, no setup needed → newsletter pull-quote
26:02
"I think people need to become agent native or agents will just start to use you."
Tight manifesto line with a reversal, producer/consumer framing in one sentence → TikTok hook
§ · Resources Mentioned

Things they pointed at.

01:44 · channel · Greg Eisenberg (AI commentator)
02:44 · channel · Matt Wolf (AI commentator)
02:58 · tool · DeepSWE benchmarks
13:00 · tool · Vercel AI Gateway
19:10 · tool · Proof (agent-native document editor by Dan Shipper)
24:04 · product · chorus.com
§ · CTA Breakdown

How they asked for the click.

24:04 product
"you can actually already use our product. It's chorus.com, and you can create an AI agent and add like, an agent like Claude Code or Codex directly inside iMessage."

Soft product mention embedded naturally inside the conceptual section rather than a hard sell. The closing CTA is a low-key invitation to come back for next week's episode, with no explicit subscribe ask.

§ 04 · The Script

Word for word.

HOOK = opening / re-engagement · CTA = the pitch · metaphor = analogy
00:00HOOKThis week, Anthropic released Opus 4.8, which they say is the most advanced AI model in the world. However, others are saying we've entered the iPhone era of AI models where you can't even tell the difference between each model upgrade.
00:14HOOKWe're gonna discuss this today. We're also gonna talk about Codex. This week, OpenAI released some insane updates to their super app, Codex,
00:23HOOKand some of the updates they didn't even publicly announce. You're watching AI native where we cover the most important news and updates on the best AI agent platforms and models. My name is Riley Brown.
00:36HOOKLet's not waste any more time. Let's dive in. So here we are.
00:40HOOKThis was the Thursday announcement by Anthropic introducing Claude Opus 4.8. It builds on Opus 4.7 with sharper judgment, more honesty about its own progress,
00:51and the ability to work independently for longer than its predecessors. And here is the model card. So on the model card, Opus is apparently better at coding.
01:02This is agentic coding, SWE-bench Pro. It's not as good as GPT 5.5 at terminal coding, but it's better than all of the other models including Opus 4.7 and GPT 5.5 at reasoning,
01:16controlling your computer, doing knowledge work like docs, sheets, and presentations, and other finance tasks. And guys, it was genuinely my plan to make a full video on Opus 4.8,
01:28but I spent three hours comparing the difference between Opus 4.8 and Opus 4.7, their previous model that they released.
01:35And guess what? I literally couldn't tell the difference between the two models. And I'm not the only one who thinks this.
01:42Uh, Greg Eisenberg, friend of the show, he said, I didn't cover Claude Opus 4.8 on my pod because I don't think it's meaningfully better than GPT 5.5. And I'll add that it's not meaningfully better than 4.7 either.
01:56And he goes, we are entering the era where model releases start to feel like iPhone releases. Remember when every new iPhone had a genuine leap? Now it's a slightly better camera and you can't really tell the difference.
02:08That's where models are heading. 4.6 to 4.7 to 4.8.
02:13Each one is slightly different, but you can't really tell which one is best. In fact, I'll tell you from personal experience, I'm still running AI agents in iMessage, running an AI agent very similar to OpenClaw, and I'm using Opus 4.6. I think it is the best for general agent work at least based on how I use it.
02:30And I literally can't tell the difference between these three models. And it wasn't just Greg. Here's Matt Wolf agreeing with him.
02:36He said, so much this. I spent over one minute talking about Opus 4.8 in my recent news breakdown and there really wasn't much to say honestly. And when there's a big update, Matt will spend five, sometimes ten minutes talking about a huge update and he only talked about it for one minute.
02:52And now we're gonna compare GPT 5.5 to Opus 4.8. And DeepSWE,
02:58which is a company that measures frontier coding agents on original long-horizon software engineering tasks, posted some data that was really interesting. And so DeepSWE looks at three things.
03:11Right? They look at cost, time, and output tokens, and then additionally, they plot it against their score.
03:18So you can see here, these are the GPT models right here, and here are the Opus models right here. And so the higher up you are on this chart, the better your score.
03:27OpenAI got a better score. And you notice here that the cost goes this way. So the further this direction you are, the more expensive your model is.
03:38So GPT 5.5 medium, high, and extra high are scoring higher for less cost than Anthropic's
03:47Opus 4.8. OpenAI is getting a better score for a lower cost. Here we can see they're getting a better score
03:55with fewer tokens per task, which is better, and we're also seeing that the average cost per task is just lower. Right? You can see that this model is clearly the most efficient,
04:09it takes less time and it gets a higher score. And also as of late, I've also noticed a lot of people talking about trust and depth of tasks. This guy said I can trust GPT 5.5 with things I would never trust Opus 4.8 to handle.
04:23Yeah. Opus 4.8 feels good and can be quite addictive to use especially when vibing, but that's mostly surface level.
04:30I'll also add that the Opus models in general are better at design. They're better at presentations. You're gonna get a better slide deck, a better landing page.
04:38It looks more appealing. They put a lot of effort into Claude Design. However, when you wanna do really long agentic tasks, if you wanna do deep coding work or have it control your computer and even control your text messaging directly
04:51HOOKfrom the app, I highly recommend using GPT 5.5. And so now I divide these large lab announcements
04:59HOOKinto two categories. Right? There's model updates and then there's super app updates.
05:04HOOKAnd like nine months ago, I was way more excited for model updates because every single model update felt like a big step change and everything was done in the terminal. So there wasn't really that much innovation happening at the app level or you know the app where you use these AI agent tools.
05:21HOOKAnd if you've been watching my content for the last four months, I've been obsessed with the super app and the super app like Claude desktop or Codex are these apps where you can very easily talk to AI agents.
05:34HOOKRight? You can speak to AI agents where you have your tasks on the left panel. You have your agent and then whatever your agent is working on.
05:42HOOKAnd there's so much innovation that needs to be done to make this a very seamless process so that you can interact with agents for all of your work. And this is exactly what OpenAI did this week. They announced a bunch of different things for their Codex application,
05:58new updates to their platform. And so the first update that they announced is there is now Windows computer use. So if you go to the Codex app on Windows,
06:09you can now officially type at, uh, computer use and you can have GPT 5.5
06:16inside Codex. It can control your computer fully. You can say control Canva
06:22to do task and you can do this on Windows now.
06:26Another one for those Windows users out there, they now have Windows Codex remote. Inside Codex, if you go down to this phone icon right here, this will give you a QR code. If you have ChatGPT downloaded on your phone, you can now type prompts directly through ChatGPT and it will control Codex, which can control your computer.
06:48If you have an iPhone and a Windows computer, you can connect ChatGPT. Right? This is just the ChatGPT app and I'm going to the Codex section and now I can press chat
06:58and now I can message Codex and I can even use computer use inside the iPhone app and I can say please, uh, check my, uh, desktop and tell me what's there. And you can see here, right, it's showing up right here.
07:15You can do this on Mac and now you can even do this on Windows and these are perfectly synced. You can see here this is the same exact chat thread and it shows up on the desktop app and the phone. You can literally control Codex from your phone, Mac, or Windows.
07:31And since Codex can control your computer, you can basically control your computer through ChatGPT, which is a really, really underrated and cool feature.
07:40Okay. The second set of updates that OpenAI released for Codex, this is the one that I'm gonna use the most and I think it is just the most useful. And so when you're inside Codex,
07:54you can open up a browser. Now as of two days ago, these stay signed in.
08:01So I can go to twitter.com and you notice I'm already signed into my profile. I don't know why these tweets aren't loading.
08:08There we go. This is my Twitter feed and I'm automatically signed in. Can also say something like please get my, uh, latest
08:19video agent native to link on
08:24Notion. Summarize it, and give me a link here.
08:30So since Codex is set up to connect to Notion through the Notion plugin, it can find the exact video I'm talking about. It's gonna give me a link to that video then I can just open it directly inside the Codex browser.
08:44This is becoming a full browser and take a look at that. So it responded. It thought for two minutes it found the Notion document that I'm working on which is for this video.
08:55And all I need to do is right click on this and click open in browser and take a look at this. We are automatically signed in to Notion.
09:04Close the sidebar and here's the app open inside Codex and I'm signed into Notion so any document that it creates inside Notion for me I can just open it up.
09:15So now I'm using Codex. I can ask Codex to change anything inside Notion and it will edit the page and I can see it live.
09:23I can add things to it just like I'm using Notion except I don't need to leave the AI powered super app which is Codex. Now I do anticipate that Claude code or the Claude desktop app will have this feature.
09:38It just feels like they're really far behind and they're not prioritizing it. This right here is something that I've been using every single hour for the past seventy two hours since they released this feature. Now that you stay signed in, it's really useful because before you actually had to sign in every time you opened up a web browser.
09:56And another thing that I realized, you can open up many browser tabs. You can't hit plus and open a browser tab, but if you're in your browser and you press command open, look at this. It's opening all of these as new browser tabs.
10:10So I can go from the main tab to this tab to this tab to this tab.
10:16And so we're starting to see this become a full browser that you can use next to your AI agent. So that is two, which is browser tabs stay signed in when you're using the browser inside Codex.
10:29And we also have multiple browser tabs per task and we're starting to see it become as if you had Google Chrome inside Codex. And so this third one is a lot of people's favorites. So now when you use Codex,
10:44agents can spin up other agents. On top of this, you can ask Codex about
10:52any chat you have open. Let me show you how this works.
10:57So if we go to Codex, I can now type something like this directly inside Codex. And I call this a super prompt.
11:03I want you to spin up new chat sessions inside Codex. So like right now I'm about to fire off a chat session and this chat session will actually create six more chat sessions. So check this out.
11:15So I'm gonna run this. And so now you can see here it says I'll set this up as six separate Codex threads with concrete task prompts. So it's basically going to write prompts
11:27in new chat sessions and then they'll show up right here. Okay. So it's activating
11:32some memory. It's it's basically trying to figure out how it wants to prompt the agent and so it says I'm creating six background threads now each with narrow brief and completion criteria. And here it goes.
11:44It's created one, two, three, four, five,
11:50and six. We're gonna see AI rename them. Watch this.
11:53So triage, boom, boom, and boom.
11:58So AI created these new chats and you can see here the AI basically prompted this. It's sent by Codex from another thread.
12:07That's how you know Codex prompted it which is really cool. So you can ask Codex to create new threads.
12:14So you can start up 10 threads directly inside Codex and here they are all going to work. And so that's really cool and I haven't even fully discovered all of the use cases that I wanna use for this.
12:25Maybe I might do a full video on that specific feature about using one master agent to spin up sub agents and then you can create an automation which checks in on how those other agent chats went. I think there's a lot of exploration to do there, but that's out of the scope for this video. I do wanna cover a few other updates that they announced, which is there's now better search.
12:45So if we go to Codex, and now if you press command G,
12:53I believe, I can now search way better. Right?
12:56You can press command g and I can search for a key term like OpenAI and everywhere OpenAI is mentioned,
13:03I can now search not just through the titles but through all of the chats in general. Right? So it's much easier to search through all the chats.
13:11Let's see where I mentioned, uh, command g, where I mentioned Chorus. These are all of the scripts or all of the chat sessions where I mentioned Chorus.
13:20It makes it a lot easier to search through all of the agent chats that I create. Another small thing that they announced was this new GitHub activity page. So, again, if we go to Codex and you go to settings,
13:33profile, here we can see all of the days where I use Codex. I basically started using Codex forty three days ago.
13:40I've been using it every day since: a forty-three day streak. My longest task was three hours and seven minutes, and I've used 4,000,000,000 tokens. Pretty fun new update to the app.
13:49Okay. So now I wanna move to another trend that I've noticed. A lot of people have been DMing me about their vibe coding platform that they use, whether it's Lovable, Replit, Bolt, etcetera.
13:58Many people are moving from these dedicated vibe coding platforms to Codex or Claude Code because, you know, I think we're about one or two months away from these platforms being full vibe coding platforms. And many people who use Replit say that, like, it's just significantly easier to just vibe code an app, get it on the Internet, and use it for internal use or sell it as a SaaS.
14:19Many people love these vibe coding platforms because it makes everything easy. Because after all, Codex just generates the code and then it lets you see your app in the browser. Whereas something like Replit generates the code, it makes viewing the app visible while you're building.
14:34Right? Just like the in app browser inside codex. It also sets up authentication.
14:38It sets up a database and it also does one other thing, which is it has some security things, but mostly that's just an AI prompt, and then it also hosts the app on the internet. Well, what people are realizing now is that all of these are just like a single prompt inside Codex.
14:55Right? On Codex, you can run a prompt like this. You can say please build an internal tool for my company to track whatever it is that you wanna track.
15:01For this example, I'm just using video stats. And you could say make this web app. Use Neon Postgres which is a database service for database.
15:08As long as you have an account on Neon and you set up the plugin, this just works one shot. And then you can say use Google for sign in, um, and for auth. And then you could say use Vercel for hosting.
15:18Right? And this, uh, puts the app on the Internet. And then you could say use AI gateway for AI features.
15:24So this is another Vercel app where all you need is to sign in to Vercel, get one single API key and once you set that up you can use any AI model. You can also use something called Genmedia which is all of the image and video models and this is by FAL.
15:40And so I've already set this up and made this skill so I can build any app with any AI feature or AI video model directly inside the app and then I can just say like make sure to run many security checks. GPT 5.5 extra high is incredible for checking for vulnerabilities.
15:58So you can just fire off this whole entire prompt and this basically solves for the entire value prop of tools like Replit and Lovable.
16:09And soon, I believe there's going to be someone who builds a fully AI native version
16:17of Replit and Lovable. And this is a product that our team and I, we considered building this tool, um, but we just kind of we fell out of love with building static apps.
16:27Agents are just way more fun to work with. But someone could very easily build an AI native Replit and Lovable which acts as a plugin. And so you could create
16:37a skill which handles all of this stuff right here for the user and build it directly inside Codex. So that's one of my big predictions for the rest of 2026. Someone's going to build a Replit and Lovable that makes it as easy as it is to use Lovable but inside Codex.
16:54Because with Replit and Lovable, you use their tokens and you use their
17:01agent. And so the Replit agent is actually worse than just using Codex out of the box and it's more expensive because OpenAI heavily subsidizes
17:10users to use GPT 5.5 directly in the app. And so someone could build an AI native version of Replit and Lovable where it's just BYOT
17:20and BYOA, which is bring your own tokens and bring your own agent. So you can imagine a world
17:28where I go to Codex and I could say build an app and use, uh, @Lava Plit. And this is my fictional app that someone could build where it just handles all of that except it acts as a plug in and you use it directly inside Codex and maybe it only costs $10 a month because the company that creates it doesn't have to build an agent.
17:51They don't have to pay for tokens so it's a bigger margin and they just host the user's web app somewhere and maybe that could cost a little bit more money. But I genuinely believe that many people who love to vibe code are just gonna end up switching over to Codex and Claude desktop app over time as they become full
18:11platforms and vibe coding will just be a skill that any AI agent can do. To conclude today's video, I wanna talk about just my biggest obsession for the past two months and it has to do with something called an agent mini app and it stems
18:25kind of from the in app browser inside Codex and eventually all agent platforms. Okay. So in my previous video, I covered a topic
18:36called an agent native app and I used the example of Dan Shipper who created this app called Proof. And Proof is this document editor that's open source that he made to be an agent native app or an app that you use with your agent. So you could say, hi agent.
18:53I wanna create a document. And the agent can create the document and then you can edit the document yourself. You can have the agent edit the document, and he basically, he made the connection between the document and the agent incredibly easy.
19:05It's very seamless to create a document with this agentic application. And I've been fascinated by this because we're gonna have agents that will have browsers connected and so many people are gonna make a ton of money building apps that are just agent native.
19:20They're not meant for you to go to the app and type a document on their platform. It's made for you to ask your agent to create a document and it just uses this technology and renders it right here. So this is really really cool
19:33and really interesting and it's possible right now to create and use these agent native apps. In fact, Google Docs now because your agent can fully control Google Docs, it can fully control Notion. This is an example of an AI native app.
19:48Right? It is an app that's meant to be used by humans but they added like an agent native feature.
19:53Right? This is just an AI agent native feature
19:57of like an app that's meant to be used by going to the platform. So this is all possible, but there's one thing that's not possible.
20:04So on Codex, they have these things called the plugins. But within the plugins, right, you can actually sign in to all of your apps.
20:12And so I have like 30 different plugins like Gmail, like Slack, like, uh, TypeFully,
20:19which, uh, allows me to schedule Twitter posts for the future which I use a lot for our company account. Um, and you know, the list goes on. GitHub,
20:28uh, Vercel, etcetera. All of these different tools.
20:32What is not possible right now inside Codex that I wish was possible, you cannot create
20:40an AI native app that connects to these specific integrations. Right? When I go to plugins and I sign into my Gmail,
20:49I'm authenticating. Right? I'm authenticating
20:52to my email. What I can't do inside Codex is use this authentication
20:58to create an app that connects to Gmail. Let me explain what I mean by that. So if you think of the way we were describing vibe coding earlier where you have your different agent tasks, you're chatting with your agent and you can get it to create basically any app you want
21:13and I'm able to add Neon's, uh, database to it by at mentioning Neon. Right?
21:19This is just a database provider and then it can create an app that has a built in database created by Neon. But what if what if your agent
21:29could generate apps here on the side which I call a mini app which could actually integrate with all of your plugins.
21:39And so you could generate an email mini app or you wouldn't even need to consciously generate an email mini app. Your agent would generate it for you.
21:48So imagine you're using Codex and you say something like, I need to do
21:54my email help. And the agent one thing the agent could do is just send you a bunch of drafts to all of your emails.
22:02Right? It can go through and look through your email. It could come up with drafts to send, but it's really hard to like give you that information in a way where you could easily edit those drafts.
22:12What if it created a mini app and the mini app was like a Tinder for, uh, email?
22:21And so it had like it had like a nice input message which is like the person who sent you the message and then it had just like your response. So like it put your response below it and then you could either, um, archive, right, if you don't actually wanna send it send the email or you can just send it as is and since the agent has context over all of your different tools, it'll be really good at understanding your goals and everything.
22:46It'll actually be able to draft a really good email or there would be like an edit button. Let's say you just wanna edit like a few parts of it. You could very quickly edit it and within the app,
22:56you could just press send. So imagine it created an app that you could easily press send. And as you use these apps, right, as you use these mini apps, you would actually learn.
23:08Right? Because every time you press archive, this data would be stored somewhere. I'm not sure how this would technically work but this would be stored somewhere and over time the agent would actually not make suggestions for the types of emails that you would normally archive
23:21and it would learn from every single message that you send. It would learn from all the edits that you make so that every time it suggests an email, it's one that you will very likely send at a very high confidence. So these can be thought of as just like generative UIs that connect with your integrations because right now
23:40you could ask it do this but then you'd have to go back to your agent and say send the first one, don't send the second one, send the third one, make an edit to the fourth one, please say this. What if the agent could just send you the best possible interface that connect with the tools that allowed you to just make the final 10% edits and send it directly in this little mini app?
23:59CTAAnd users would actually be able to create their own interfaces. Right? And you could create your own mini apps and maybe even share them with your team because every person's unique,
24:09CTAevery company's unique, and maybe you want to create your own little mini apps that are integrated with all of the things that you've already signed in with. Why would I want to use someone else's external platform if my AI agent can generate a UI for me right when I need it.
24:26CTAAnd I think this is next, you know, and this is just something that like we've been playing around with and my company in New York, we are I moved my company to New York and we're actually trying to figure this out through iMessage. I'm not gonna go into detail because I'm gonna be doing like a big announcement soon,
24:42but you can actually already use our product. It's chorus.com, uh, and you can create an AI agent and add like, uh, an agent like Claude Code or Codex directly inside iMessage.
24:51And we're trying to figure out how the agent can send you a little link which turns into a mini app. And these mini apps will kind of act as like the operating system for the agent. I genuinely believe that all of the major platforms are gonna kind of circle around this idea, and this is what's gonna bring out Jarvis.
25:08Right? How can the AI agent give you the best possible interface for any given task that you can use and and the app actually connects to the integration? You can actually send an email.
25:18You can actually post the social media post. You can actually send the Slack message. Right?
25:23It can suggest things for you and you can properly edit them directly in the interface and I think Codex has a perfect browser for this. The problem is if you try to do this, you actually can't
25:35connect your plugins to the apps that you create. It's just not possible with the way that they built Codex. Anyway, that's it for the update today.
25:44Yes. So I'm here in my Airbnb in New York City. We just moved our company from SF to New York.
25:50It's great energy out here, but unfortunately, I don't have a studio. So we're gonna rebuild our office, rebuild our studio,
25:58and I'm going to be 10 x ing my content effort. My main goal is just to educate people so that you become agent native, uh, which is the new name of this series. I think people need to become agent native or agents will just start to use you.
26:12You could think of social media. Right? If you look at the social media trend over the last ten years, right, there's content creators,
26:20right, who kind of take advantage of social media. And then there's just like the content consumers who kind of get taken advantage of by the algorithm. It addicts you to the platform.
26:29It sells you ads. And so like there's kind of this like you're either a producer or a consumer. I would much rather be on the producer side of this AI revolution.
26:38CTAI think it's really important to learn the different concepts. Um, you should learn the surfaces that these AI agents will exist on, which is why I started this series.
26:47CTASo every week, I cover the most important agent news, and I'm I'm loving it right now. And I'll continue to do it every single week.
26:53CTASo thank you guys for watching. I'll see you here for the next video.
§ 05 · For Joe

Model hype vs. platform reality in the agent era.

WHAT TO LEARN

When practitioners who build with these tools daily cannot distinguish one model generation from the next, the benchmark press releases stop being the signal — the platform changes are.

  • Benchmark improvements on model cards do not automatically translate to detectable differences in real agentic workflows — test your specific use case before upgrading.
  • GPT 5.5 outperforms Opus 4.8 on long-horizon coding and deep agentic tasks by the metrics that matter to builders: score per dollar and score per token.
  • Anthropic models retain a real advantage in design-heavy outputs — presentations, landing pages, visual documents — where aesthetic judgment matters more than raw task completion.
  • Persistent authentication in an AI browser changes daily workflow more than a marginal benchmark improvement; the quality of the integration layer is becoming the differentiator.
  • A single well-crafted agent prompt with the right plugin stack (database, hosting, auth, AI gateway) can replicate the full value proposition of purpose-built vibe-coding platforms.
  • The economics of a BYOT/BYOA product are structurally stronger than a bundled AI platform: no agent compute costs, no token subsidies required, higher margin on the interface layer alone.
  • The unsolved problem at the frontier of agent UX is not conversation quality but authentication passthrough — getting generated apps to inherit the user's existing plugin credentials.
  • Generative UI (an agent that creates the right interface for the task at hand) is a more useful frame for the next wave of AI-native products than 'better chat' or 'more autonomous agents.'
  • Every human decision made inside an agent-generated interface is a labeled training signal; the apps that capture those micro-decisions will compound into personalization that static SaaS cannot match.
  • The producer/consumer split from social media is repeating in AI: the people who understand the surfaces agents live on will build leverage; the rest will be optimized against by systems they do not control.
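The micro-decision bullet above, and the host's Tinder-for-email idea (where he admits he is "not sure how this would technically work"), can be sketched as a simple feedback loop. This Python example assumes a hypothetical `DraftFeedback` class and uses plain per-sender counting with illustrative thresholds rather than any trained model; it only shows how archive taps could stop suggestions the user keeps rejecting.

```python
from collections import defaultdict

DECISIONS = ("send", "edit", "archive")


class DraftFeedback:
    """Hypothetical: turns send/edit/archive taps in a mini app into per-sender signals.

    A deliberately simple counting rule, not a trained model: stop drafting
    replies to a sender once the user archives most of the suggestions.
    """

    def __init__(self, min_observations: int = 3, archive_threshold: float = 0.8):
        self.min_observations = min_observations
        self.archive_threshold = archive_threshold
        # Each new sender starts with zero counts for every decision type.
        self._counts = defaultdict(lambda: dict.fromkeys(DECISIONS, 0))

    def record(self, sender: str, decision: str) -> None:
        if decision not in DECISIONS:
            raise ValueError(f"unknown decision: {decision!r}")
        self._counts[sender][decision] += 1

    def archive_rate(self, sender: str) -> float:
        counts = self._counts.get(sender)  # .get() does not create an entry
        if counts is None:
            return 0.0
        return counts["archive"] / sum(counts.values())

    def should_draft(self, sender: str) -> bool:
        counts = self._counts.get(sender)
        if counts is None or sum(counts.values()) < self.min_observations:
            return True  # too little signal yet; keep suggesting drafts
        return self.archive_rate(sender) < self.archive_threshold


fb = DraftFeedback()
for _ in range(4):
    fb.record("newsletter@example.com", "archive")
for decision in ("send", "edit", "send"):
    fb.record("cofounder@example.com", decision)

print(fb.should_draft("newsletter@example.com"))  # False
print(fb.should_draft("cofounder@example.com"))   # True
print(fb.should_draft("stranger@example.com"))    # True
```

The `edit` counts are recorded but unused here; learning the user's writing style from those edits, as the video imagines, would need a richer model than this counter.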