Nate Herk | AI Automation · YouTube · 13:44

Opus 4.8 Just Dropped. Here's How To Actually Use It.

A 13-minute same-day breakdown of Claude Opus 4.8 — what changed, why effort level is now the primary control surface, and five prompting adjustments direct from Anthropic docs.

Posted

May 28th 2026

Duration

13:44

Format

Tutorial

educational

Channel

Nate Herk | AI Automation

§ 01 · The Hook

The bait, then the rug-pull.

Benchmarks always look amazing on launch day. The real question is whether the model actually fixes the things that were breaking your workflow — and for Opus 4.8, the answer turns out to depend almost entirely on a slider you probably never touched.

§ · Chapters

Where the time goes.

00:00 – 00:35

01 · Intro

Promise: same-day breakdown of benchmarks, 4.7 pain points, and key takeaways.

00:35 – 01:07

02 · What's New in 4.8

Blog post walkthrough: effort control, dynamic workflows, same pricing as 4.7, API rate limit increases.

01:07 – 02:05

03 · Effort Levels and Workflows

Live demo of /effort slider in Claude Code CLI: low, medium, high, xhigh, max, ultracode. Ultracode = xhigh + dynamic workflows.

02:05 – 02:54

04 · Benchmarks Reality Check

Benchmarks always look great at launch. Codex with GPT-5.5 may outperform Opus on computer use despite worse paper numbers.

02:54 – 04:38

05 · The Honesty Upgrade

Opus 4.8 is ~4x less likely to falsely claim task completion. Alignment evaluation data shown. Mythos preview teased.

04:38 – 06:52

06 · 4.7 Pain Points

Community-reported 4.7 problems: lazy/early quitting, safety overreach, token explosion, attitude. Anthropic acknowledged and rolled out partial fixes, but core complaints persisted until 4.8.

06:52 – 10:33

07 · Key Takeaways

Five adjustments from Anthropic docs: effort is the primary lever, positive instructions, give the why, reasoning-before-tools default, self-calibrated response length.

10:33 – 12:12

08 · Community Reactions

Launch-hour social takes. Positive: one-shotted GPT-5.5, warm and collaborative. Cautious: early bugs, real-world data still thin.

12:12 – 13:44

09 · Final Thoughts

Evaluate 4.8 against your specific 4.7 frustrations, not the benchmarks. Watch for vibe upgrade, self-correction frequency, token efficiency. Plug for free token dashboard.

§ · Storyboard

Visual structure at a glance.

00:00 · hook · open
00:35 · promise · what's new
01:07 · value · effort slider
02:54 · value · honesty upgrade
04:38 · value · 4.7 problems
06:52 · value · key takeaways
10:33 · value · community
13:28 · cta · final thoughts

§ · Frameworks

Named ideas worth stealing.

01:07 list

Effort Tiers

low
medium
high
xhigh
max
ultracode

Six-tier effort parameter controlling Claude Code compute, reasoning depth, and token spend. Ultracode combines xhigh with dynamic workflows.

Steal for: any workflow doc or onboarding guide for Claude Code users
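One way to make the tier list actionable is a small lookup that maps a rough task size to a tier. This is a minimal sketch, not Anthropic guidance: the tier names come from the video, while `pick_effort`, its `steps` and `needs_workflows` parameters, and every threshold are hypothetical and should be tuned against your own tasks.

```python
# Tier names as listed in the video, from fastest/cheapest to most thorough.
TIERS = ("low", "medium", "high", "xhigh", "max", "ultracode")

# Hypothetical thresholds: the largest expected step count each tier should handle.
# These numbers are illustrative guesses, not documented limits.
STEP_LIMITS = (("low", 1), ("medium", 3), ("high", 10), ("xhigh", 30))


def pick_effort(steps: int, needs_workflows: bool = False) -> str:
    """Return a tier name for a task expected to take `steps` distinct steps."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if needs_workflows:
        # Per the video, ultracode is xhigh plus dynamic workflows.
        return "ultracode"
    for tier, limit in STEP_LIMITS:
        if steps <= limit:
            return tier
    return "max"


if __name__ == "__main__":
    examples = (("rename a variable", 1), ("add an endpoint", 8), ("port a service", 60))
    for label, steps in examples:
        print(f"{label}: {pick_effort(steps)}")
```

The point of writing it down is less the thresholds than the habit: deciding the tier before starting a task, rather than leaving everything on the default.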

04:38 model

4.7 Problem to 4.8 Fix Map

Laziness → Sustained autonomy
Safety overreach → Warmer vibe
Token burn → Token efficiency
Hallucinated completion → Honesty (4x)
Attitude → Collaborative

Direct mapping of the five most-cited 4.7 community complaints to the explicit 4.8 training improvements.

Steal for: model comparison slide, migration checklist
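The map above can be turned into a personal migration checklist. The sketch below is illustrative only: the complaint and fix labels are copied from the map, while the `checklist` helper and its wording are hypothetical.

```python
# Complaint -> claimed fix, copied from the map above.
FIX_MAP = {
    "Laziness": "Sustained autonomy",
    "Safety overreach": "Warmer vibe",
    "Token burn": "Token efficiency",
    "Hallucinated completion": "Honesty (4x)",
    "Attitude": "Collaborative",
}


def checklist(my_complaints):
    """Return one to-do line per complaint; complaints outside the map are flagged."""
    lines = []
    for complaint in my_complaints:
        fix = FIX_MAP.get(complaint)
        if fix is None:
            lines.append(f"[ ] {complaint}: not in the map, test manually")
        else:
            lines.append(f"[ ] {complaint}: re-run the failing task on 4.8, look for '{fix}'")
    return lines


if __name__ == "__main__":
    print("\n".join(checklist(["Token burn", "Laziness", "Slow startup"])))
```

Listing only the complaints that actually affected your workflow keeps the evaluation tied to your own pain points rather than the launch benchmarks.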

06:52 list

Five Prompting Adjustments for Opus 4.8

Match effort level to task complexity
Tell it what to do, not what not to do
Give the why behind every instruction
Account for reasoning-before-tools default
Let it self-calibrate response length

Five behavioral shifts derived from Anthropic's own prompting best practices doc, applied specifically to Opus 4.8.

Steal for: internal prompt engineering SOP, onboarding doc for teams switching from 4.7
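Three of these adjustments (positive instructions, giving the why, and controlling whether context is gathered before reasoning) can be sketched as a tiny prompt builder. This is an assumption-laden illustration, not a format from Anthropic's docs: `instruction`, `build_prompt`, the "Reason:" wording, and the `docs/style-guide.md` path are all hypothetical.

```python
from typing import Optional, Sequence


def instruction(what: str, why: str) -> str:
    """Pair a positive instruction with the reason behind it."""
    return f"{what.rstrip('.')}. Reason: {why.rstrip('.')}."


def build_prompt(task: str, instructions: Sequence[str],
                 context_first: Optional[Sequence[str]] = None) -> str:
    """Assemble a plain-text prompt.

    If `context_first` is given, the prompt opens by asking for that context
    to be read before any planning, to counter the reasoning-before-tools
    default described in the video.
    """
    parts = []
    if context_first:
        parts.append("Before planning, read these first: " + "; ".join(context_first) + ".")
    parts.append(f"Task: {task}")
    parts.extend(f"- {line}" for line in instructions)
    return "\n".join(parts)


if __name__ == "__main__":
    # The em dash example mirrors the one given in the video.
    style = instruction(
        "Match my writing style, which never uses em dashes",
        "I want this to read as if I wrote it myself",
    )
    # Hypothetical file path, for illustration only.
    print(build_prompt("Draft the launch email", [style], context_first=["docs/style-guide.md"]))
```

Keeping the reason next to each instruction makes it easy to spot bare "don't do X" rules that carry no explanation.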

§ · Quotables

Lines you could clip.

07:23

"The difference between Opus 4.8 on low and Opus 4.8 on extra high is a significant difference, like almost to the point where it feels like a different version."

Concrete, testable claim that reframes effort level as a version upgrade → TikTok hook

12:36

"Benchmarks look great, and they always will. Someone else's use case is not your use case."

Punchy two-sentence takedown of benchmark theater, no setup needed → IG reel cold open

05:38

"There is a big difference here between the model having problems and you using the model wrong. Sometimes it truly is a skills problem."

Breaks the pattern of pure model criticism; holds the audience accountable → newsletter pull-quote

§ · Resources Mentioned

Things they pointed at.

00:00 · link · Anthropic blog: Introducing Claude Opus 4.8

00:24 · link · Claude API Docs: Prompting best practices

13:28 · tool · Token tracker / dashboard (free GitHub repo)

§ · CTA Breakdown

How they asked for the click.

13:28 link

"I will leave that in my free school community linked in the description. Just give Claude Code the GitHub repo, tell it to set it up."

Soft free tool mention at the end. No price, no hard pitch. Feels like a utility recommendation.

§ 04 · The Script

Word for word.

HOOK: opening / re-engagement · CTA: the pitch · analogy · story

00:00 HOOK So Claude Opus 4.8 is finally here. And as always, the benchmarks look amazing. In a lot of the major categories, 4.8 is better than 4.7 and even better than GPT 5.5 as well.

00:10 HOOK But the question is, is it really a better model? So today I wanna talk about what is new to Claude Code because of Opus 4.8. I wanna talk about some of the issues that you guys have been having with 4.7 and some of the struggles and how 4.8 is supposedly going to address those issues.

00:24 HOOK I'm gonna go over some key takeaways because it seems like this model is going to behave a little bit differently than 4.7, and you're gonna have to change the way that you work with it a little bit. So let's not waste any time and just get straight into this one.

00:34 HOOK Okay. So it is 05/28/2026, and Opus 4.8 has dropped, and it is apparently built on top of Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.

00:47 And important to note, it is priced the exact same as Opus 4.7 on input and output tokens. But what's interesting right here is that they have increased rate limits in Claude Code to accommodate the higher token usage of effort levels. So that's rate limits.

00:59 That is not your, you know, five hour rolling window or your weekly session limits. Those remain untouched, but rate limits if you're using Claude Code via API has been increased. Alright.

01:07 So this is the blog post. I'll link this in the description, but I'm just gonna go over a few of the key findings here. Okay.

01:12 So Opus 4.8 launches alongside several new features. Users on claude.ai now have control over the amount of effort Claude puts into tasks. And in Claude Code, we have a new feature called dynamic workflows that allows it to tackle very large scale problems.

01:25 So I'm not gonna dive into Claude workflows today, but this is a new feature that I will be making a video about shortly. But now in Claude Code, obviously, you can see we have Opus 4.8 is here. It defaults to high effort.

01:35 You can also, of course, switch the effort. But in here, you can type in workflows, and that's how you could start using that dynamic workflow feature. But what I wanna show you guys in here, which is pretty cool, is in the terminal or the CLI version, if you do /effort, you can see we have the slider.

01:47 Like I said, it's going to default to using high, but you can also do low or medium, and you can come up here, xhigh, max, or ultracode, which is xhigh plus workflows. So it's very, very smart over here. But, of course, it's gonna cost more from the token perspective.

02:02 And then the more left you scroll down onto this slider, the faster your outputs will be. Of course, we can dig into the actual benchmarks, which I like to look at, but the thing about the benchmarks is every single time you see a new model, the benchmarks are amazing. Right?

02:15 It's always better than the other ones, and you've always got these other comparisons. So, obviously, that's what they have to do from a marketing perspective. So it's really important for you to understand

02:23 what model is actually the best for your use case. Like, maybe the case is, yeah, Opus 4.8 really is better at agentic coding than something like Codex with GPT 5.5. But maybe for your very specific use case, Codex is just performing way better even if the explicit benchmarks don't say that it should.

02:38 Like right here, for example, I think that Codex with GPT 5.5 is much, much better at agentic computer use than Opus 4.7 and Opus 4.8, even though these two Opus models apparently,

02:49 objectively, would be better at agentic computer use than Codex. So always take these benchmarks with a grain of salt. Anthropic took a whole section of this blog to call out that one of the most prominent improvements is Opus 4.8's honesty, which I think is interesting because that's definitely something that I noticed with Opus 4.7 as we're gonna dig into over here as far as, like, problems that people have reported with Opus 4.7.

03:09 But they took time to call out the honesty here. We train all our models to be honest, to avoid making claims they can't support, like saying, hey. This is gonna take me four hours, and then it takes twenty minutes.

03:18 Or saying, hey. I finished. I pushed all 50 into blah blah blah, but I only actually pushed 15.

03:22 So if you guys have ever felt that, you're not alone. And apparently, Opus 4.8 is much better at that. And so they actually have evaluations to test this sort of stuff, which is about misaligned behavior.

03:31 And you can see here that in this case, a lower score is actually better. So right here, we've got, like, Mythos Preview coming in pretty low. Opus 4.8 comes in at almost half of what Opus 4.7 and Sonnet 4.6

03:43 come in at. But take a look at this. Users will find Opus 4.8 to be a modest but tangible improvement on its predecessor 4.7.

03:50 Obviously, there's still more work to be done. But what they say here is that they plan to release a new class of model with even higher intelligence than Opus, which is Mythos.

03:59 You can see a small number of organizations are currently using it for cybersecurity work, but models of this capability require stronger cyber safeguards before they can be generally released to the public because we don't want some random kid in their basement hacking into your bank account. But, anyways, Opus 4.8 is available everywhere today.

04:16 So wherever you're using Claude Code, you should be able to access it. Open up a new terminal, open up a new extension tab, whatever it is, and you can see right here we have Opus 4.8. And you will notice right away that we still have our 1,000,000 context window with Opus.

04:28 I could come in here, and I could type in a, just did that twice in a row, a /model, and we can choose between default Sonnet Sonnet or our Opus 4.8, which is most capable for most work. But, anyways, Opus 4.7 was released April 16.

04:41 So basically about a month and a half ago, they're moving really, really quick here. And when Opus 4.7 came out, they added the xhigh effort level, which now has been dwarfed by max and the ultra one, max and ultracode. But what's interesting is a lot of people actually weren't happy about this model release because they felt like it was actually worse than Opus 4.6.

05:01 So some of those main problems were it felt lazy. It was just basically giving up on the goal, on the task too early. So, you know, Codex had /goal, and now a bunch of other different AI tools have /goal.

05:11 Claude Code has /goal. And that was kind of like a Band-Aid fix to put on top of the model to help it work a little bit longer towards some sort of specified goal. But now that is just a core fundamental piece of the model.

05:21 Not exactly the /goal, but just the idea that it's going to be less lazy and it's going to be better at working for longer time. It was also said to be overly rigid with safety overreach. There was a ton of community feedback

05:33 on the token burn and how much more expensive this model seemed to be. And the one that I think is the funniest is saying that it had an attitude, which honestly is true. If you've ever heard it sort of get a little sassy with you or push back on your own ideas, it's good to have that sort of, like, brainstorming thought partner.

05:47 But I have noticed that sometimes it did sort of come off, like, very short and almost, like, stubborn. So those were some of the main problems that I felt, but also that the community had felt with 4.7. Now there is a big difference here between the model having problems and also, like, you using the model wrong.

06:03 It's not always a model problem. Sometimes it truly is a skills problem, and the answer isn't just, oh, well, 4.7 can't do this. Let me wait for 4.8.

06:10 Sometimes it is a user error thing. So I just did wanna call that out as well. So, anyways, 4.8 obviously comes out today, and it was built to fix this stuff.

06:18 Right? It was built and said to have more honesty and self correction, more sustained autonomy on long running tasks, a warmer and a more collaborative vibe, and just efficiency and quality of life, meaning better tool calling, better reasoning, better question asking, better token efficiency,

06:33 stuff like that. And so what I did is I read through a lot of the stuff that community was was talking about. I tested out Opus 4.8 a little bit.

06:40 Obviously, it came out an hour ago, so I haven't been able to deep, deep dive into it yet. But I've tested it out. I also read through this documentation here about the prompting best practices, which is a pretty long article from the Claude API docs, which I will also link in the description if you wanna check it out.

06:53 But after reading through all this stuff, there's a few takeaways that I I wrote down and that I wanted to share with you guys. The first one is that effort is the number one lever now. And when I look back at one of these problems right here, like maybe the laziness or the safety overreach,

07:05 maybe that was an effort issue. Because, basically, if you were doing something that takes a lot of effort, but you've got the model set to low or, you know, medium or even just high, sometimes you just need more effort.

07:16 And on the other side of the spectrum, if you're doing something that's really simple and you have that on high or extra high, then maybe you're also, like, dedicating more resources than you need, and the model is gonna overreason and overengineer. And that's where you're like, okay. This is so easy.

07:28 Why can't it do it? It's simple. But maybe you just needed to turn down the effort.

07:32 And so it really is a balance here between, you know, Claude's intelligence and the token spend and the speed and all that kind of stuff you're looking at. But the point I'm trying to make here is if you are one of those people that open up Claude Code and you just start typing and doing your work and building and you never tweak the model,

07:46 start trying. Because the difference between Opus 4.8 on low and Opus 4.8 on extra high is a significant difference, like almost to the point where it feels like a different version, like an Opus 4.9. So it's definitely worth starting to pull on that lever a little bit if you never have.

08:01 So the next one I've got is tell it what to do, not what not to do. And really the way I got there is if you go through this documentation, it always shows you good example prompts, right, that you could copy for specific scenarios.

08:13 And when I looked through a lot of these example prompts, I realized that it wasn't really saying a lot of what not to do, or, I mean, that's horrible timing because right here it says do not do this. But it always tells more explicitly what to do, and what I thought was cool is it gives background. It gives context.

08:29 Almost as if the model is sort of, like, curious, and it's gonna say, hey. You told me not to do x, y, and z, but but why? And the more that you can contextualize that stuff, the better it's gonna be able to follow those instructions.

08:39 And that leads me to the next one right here, which is give the why behind an instruction. So rather than saying don't use em dashes, say something like, I want this to come off like I'm really writing it. And this is my writing style, and I never use em dashes, so make sure you're following my writing style.

08:54 And that is going to have a little bit better feeling of Opus actually following your instructions. If you guys have seen my comparisons between Opus and GPT 5.5,

09:04 one of the things that I've said is that I love how creative Opus is, but sometimes I want it to just do the thing and I want it to do exactly how I want it. And so maybe that's an issue of effort and also telling it too many negative prompts. You know what I mean?

09:15 So it's a mix of the model, but also take accountability on yourself a little bit and think, how can I actually use the model the way that the engineers of the model have actually told me to? Anyways, a few other ones.

09:27 It's going to default to reasoning before calling tools. So it's gonna try to figure out, you know, the questions to ask and the approach to take on its own with what it has right now before it looks to spawn a sub agent, for example, or to go read that database or to go do this. And sometimes that's really good.

09:42 Right? Sometimes you want that reasoning before it starts doing things, but sometimes you want that extra context to be pulled in before the reasoning starts.

09:51 So that's why it's really important to play with your prompting, obviously, to play with your your effort level, and to be especially when you're switching over all these workflows from 4.7 to 4.8, you don't just switch over and say go and blindly trust that everything's gonna, you know, stick the same. You kind of wanna watch it a little bit to to get a feel for how the model behaves.

10:08 And then when you look at things like response length and verbosity, you can see here that I said it calibrates its own length. And basically what I meant by that is that it's going to judge how complex what it should do and how it should respond based on the complexity of the task rather than defaulting to some sort of fixed verbosity.

10:24 So this usually means that shorter answers on simple lookups, and you'll get longer ones on more of an open ended analysis that takes more reasoning. So, anyways, those are some of my main takeaways. Obviously, like I said, I've only played around with this model for about half an hour.

10:36 I wanted to get this video out quick, but as I find more stuff out, I will continue to update you guys. So last two pieces here. What are people saying right now?

10:44 So, obviously, there's a lot of different feelings. Right? A lot of positive and excited stuff.

10:48 Right? People are saying, oh, this already one-shotted my GPT 5.5 right here. Strongest coding model yet.

10:54 I'm hooked. This is super warm, super collaborative, big jumps in, you know, benchmarks. But once again,

10:59 a lot of these people have the intention to do some stuff like that or say stuff like this because, I don't know, they want engagement or they're marketing something. And so, obviously, it's it's great to look at the full end of the spectrum, which is why I also pulled in some mixed and cautious reports, like some early reports of bugs already in Opus 4.8.

11:17 Maybe just because of the rollout, whatever it is, people are still testing it. So there's a lot of stuff to still be cautious about. But the overall vibe, which I think is pretty cool, is that it's almost like we have, like, these, you know, four or five main bullets of 4.7 problems, and most of these improvements that we're reading about from 4.8 are directly hitting those 4.7 problems and pitfalls.

11:35 So at least we're getting that sense of Claude using that data to make it better. And if you really think about it, think about the way that you use Claude Code.

11:44 Right? You ask something. Claude Code responds.

11:47 You correct it. And you have this back and forth of, like, I don't like that. Do this better.

11:51 Blah blah blah. I'm your master. And then what happens is because, obviously, Anthropic can read those logs and and, you know, train their models on that data, it's able to say, okay.

12:01 What are people not happy with Opus 4.7 about? What are they constantly saying? And let's just bake that into the model.

12:07 So it really it would concern me if a lot of these key problems weren't being addressed, like, head on. Anyways, but the key thing that I want you guys to always think about is that benchmarks look great, and they always will. And

12:20 someone else's use case is not your use case. So figure out right now in your workflow, in your Opus 4.7 workflows, what are your problems? What do you typically get frustrated by?

12:30 And maybe Opus 4.8 fixes those problems, but maybe it won't. Even though the model is better, that doesn't mean it's better for that specific problem. So just always be thinking about

12:39 how can you work in different models or different context strategies or different effort levels to directly address the actual constraints and pain points that you're having right now. So look for things like the vibe upgrade. Look for things like how often you're self correcting this thing and giving it the same instruction over and over.

12:55 Obviously, you should be working in, like, memory and different skill files and things like that to address that repetition, but still. And then, of course, the whole token and workflow efficiency feeling that. You typically get a sense of when you're getting near the end of your session limit and when you need to pull back a little bit.

13:09 Apparently, based on the documentation, this model is more efficient with tokens, but we don't actually know yet. And one great way to test that kind of stuff out is you can use my token tracker, my token dashboard tracker, which is completely free.

13:22 CTA It's an open source, just GitHub repo. I will leave that in my free school community linked in the description. Just give Claude Code the GitHub repo, tell it to set it up, and it will pull in all of your historical data with Claude Code, and you can see where your tokens are actually going.

13:33 CTA But, anyways, that is gonna do it for today. I hope you guys enjoyed this one or learned something new. And if you did, please give it a like.

13:38 CTA It helps me out a ton. And as always, I appreciate you guys made it to the end of the video, and I'll see you on the next one. Thanks, guys.

§ 05 · For Joe

Effort level is the dial most Claude Code users never touch.

WHAT TO LEARN

Opus 4.8 ships with six effort tiers that behave differently enough to feel like different models, and some of the frustration attributed to 4.7 may have been effort misconfiguration as much as model limitation.

Running max effort on a simple task invites overengineering; running low, medium, or even just high effort on a complex autonomous task can look like early quitting. The mismatch, not the model, is often the problem.
Positive framing outperforms negative constraints: telling the model what style to match lands better than listing what to avoid, because the model can reason about intent.
Giving the why behind an instruction is not optional polish — it is how the model calibrates compliance when an instruction conflicts with its defaults.
Opus 4.8 reasons before calling tools by default; if your workflow needs external context pulled in first, you have to prompt explicitly for that order of operations.
The honesty improvement is the headline claim: Anthropic reports the model is roughly four times less likely to falsely report task completion, which changes how much you can trust unsupervised long-running tasks.
Benchmark scores measure the benchmark — they don't measure your workflow. Test the model on the specific task that frustrated you in 4.7 before declaring it fixed or broken.
Token efficiency claims from Anthropic are unverified at launch; use a session-level token tracker to confirm whether your actual costs dropped before adjusting budgets.
A model that feels stubborn or sassy is showing a widely reported 4.7 trait, not random behavior, and 4.8 was reportedly trained to be warmer and more collaborative.
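The token-efficiency check in the list above comes down to simple arithmetic once you have per-session numbers. The sketch below assumes you can export sessions as records with `model`, `tokens`, and `tasks` fields; those field names, the model labels, and the sample numbers are all hypothetical and are not the format of any particular tracker.

```python
from statistics import mean


def tokens_per_task(sessions):
    """Average tokens per completed task, grouped by model.

    `sessions` is a list of dicts with hypothetical keys:
    'model' (str), 'tokens' (int), 'tasks' (int, completed tasks).
    Sessions with zero completed tasks are skipped to avoid dividing by zero.
    """
    by_model = {}
    for s in sessions:
        if s["tasks"] > 0:
            by_model.setdefault(s["model"], []).append(s["tokens"] / s["tasks"])
    return {model: mean(values) for model, values in by_model.items()}


def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after`; negative means fewer tokens."""
    return (after - before) / before


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        {"model": "opus-4.7", "tokens": 1000, "tasks": 2},
        {"model": "opus-4.7", "tokens": 3000, "tasks": 2},
        {"model": "opus-4.8", "tokens": 1600, "tasks": 2},
    ]
    averages = tokens_per_task(sample)
    change = relative_change(averages["opus-4.7"], averages["opus-4.8"])
    print(f"tokens per task changed by {change:+.0%}")
```

Comparing tokens per completed task, rather than raw totals, keeps the comparison fair when the two models were used for different amounts of work.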

§ 06 · Frame Gallery

Visual moments.

06:45