WEBVTT

00:00:00.000 --> 00:00:17.815
So Claude Opus 4.8 is here. And while most people are focusing on these trust-me-bro benchmarks, I think the real unlock is the major improvements that they made to their coding harness, Claude Code. So in this video, I'll give you a rundown of this UltraCode mode and how to use it, and also their new dynamic workflows feature, which is Anthropic's answer to long-running agents. Let's dive into it.

00:00:21.015 --> 00:00:51.015
So Opus 4.8 is now out. And as usual, they published the benchmarks in here, which you can just browse on your own time. But at least after having worked with these models quite a lot already, I think these benchmarks are good for showing incremental improvements and where these new models and releases are directionally strong. But in my experience, a lot of the power of these models now doesn't actually come directly from the models, but mostly comes from the updates to the harness itself. So if you scroll down to their update in here, they sort of buried the lede in here by putting this dynamic workflows feature

00:00:51.095 --> 00:01:18.755
in this "also launching today" section. And if you go through their document, dynamic workflows in Claude Code, basically, what it does is help Claude take on the most challenging tasks end to end. So what that means is, let's say you have a problem that is too big for one pass by a single agent and you actually want multiple concurrent agents or sub-agents to accomplish that task, you can use dynamic workflows to have one orchestrator agent, which is gonna be Opus 4.8, orchestrate this multiphase

00:01:18.755 --> 00:01:36.530
plan to accomplish that one big task that you assign to it. And apart from dynamic workflows, another thing that they released, where they again sort of buried the lede, is this new Claude Code-specific setting called UltraCode. And what it does is set the effort level to extra high while also letting Claude decide automatically

00:01:36.530 --> 00:01:49.355
when to use a dynamic workflow to handle your task. So what does that look like in practice, and how can we invoke these? Well, let's just go into a demo and try it out ourselves. So to use dynamic workflows and UltraCode, you can actually use the VS Code extension.

00:01:49.595 --> 00:02:16.835
So if I go to the effort toggle in here once you update your Claude Code, you can now see that there is this UltraCode option, which turns it purple. And, of course, you can also access it via the terminal view. And I'll just demo it here because, at least in the terminal view, it seems like the Anthropic team deemed this release so important that they even assigned a custom sort of rainbow color whenever you type "workflows" into your prompt in here. And then similarly, if you change the effort to UltraCode, that also has that nice animation

00:02:17.075 --> 00:02:19.955
that they even coded just for that release.

00:02:20.530 --> 00:02:34.770
For our test prompt, what I'm going to do is give it a sufficiently complex task where I'm asking it to use dynamic workflows to audit three ecommerce websites for a direct-to-consumer growth agency in Sydney. So it's going to be a brand audit of these three websites.

00:02:35.115 --> 00:02:40.795
And for each of these domains, we need a technical SEO scorecard, a content and keyword gap analysis,

00:02:41.035 --> 00:02:45.355
conversion and user experience flags, three quick wins, three mid-effort wins,

00:02:45.595 --> 00:03:40.165
and basically a brand audit that would have taken a mid-class agency a couple of days to put together in the past. So we'll fire that off using dynamic workflows, and we'll actually see what Claude Code is going to do with this complex task. And just to show you how many tokens this will consume, I'm currently on the Max plan for this account. And right now, my weekly rate limit usage is at the 2% mark. So we'll see by the end of this test what percentage that will consume. And by the way, if you're interested in going from just using AI to getting paid for it, then check out the Robo Nuggets community down in the description. We've got founders in there who landed their first client in weeks, live build sessions where we create this stuff together, and the actual templates behind what I just showed in this video. The community is also the reason these lessons get made, so see that below if that's for you. Okay. So now that it is running, you can see that what it did here is start to fan out nine audit agents in order to do this task for us. Now what's good about it is that it actually recognized that dynamic workflows are going to be token intensive.

00:03:40.325 --> 00:03:45.925
So if you are saving up on tokens, this is probably not something that you would want to just do randomly.

00:03:46.005 --> 00:03:53.610
But at least for the sake of this demo, let's just go ahead and run it. Now it's saying that the workflow is running in the background with 13 live fetch agents.

00:03:53.690 --> 00:04:05.585
And interestingly, you can see what it's doing here: it's acting as sort of the manager or the orchestrator of this whole task. So it's saying that it's using the wait productively and pre-building the report generator so that the moment the data lands,

00:04:05.745 --> 00:04:21.860
it can turn it into the deliverables fast. So that is what I'm talking about with regard to the harness. Yes, Opus 4.7 to Opus 4.8 is a nice jump, and they'll always show good benchmarks whenever these new models release. But this sort of user experience and the way these agents are architected

00:04:21.860 --> 00:04:37.515
really matter a lot more than the benchmarks that you usually see front-loaded in a lot of these YouTube videos. Alright. So this has been running for around five minutes now. One thing you can do, actually, because it's saying here that 12 out of 13 agents are done, is type in /workflows.

00:04:38.475 --> 00:05:29.085
And what that now shows is a proper plan that your orchestrator agent has drafted up around this audit. So you can see for phase one, which is the audit itself, those nine agents are already done, and it also shows the amount of tokens that they consumed if you're particular about that. There is a phase two around planning, which I assume is basically planning out its output. And then now we have this synthesis agent to put all of those learnings together. So if it's been running for a while and you need a view of how it's going, then this is one way for you to monitor or observe the progress of your long-running task. Alright. So now it's done, and it gave us a couple of deliverables. It gave us three brand reports, one for each of those websites. It gave us a comparison sheet as well as the summary document. So if you look at the executive summary, you can see it has the ranking of those different ecommerce websites.
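
NOTE
The three phases described in this cue (audit, planning, synthesis) can be sketched as a plain fan-out. This is only an illustration of the pattern, not how Claude Code implements dynamic workflows. The site names, the three checks, and every function name are hypothetical, and the split of nine agents into three sites times three checks is an assumption.
```python
import asyncio
# Hypothetical names throughout: this is not Claude Code's API, only the shape of a fan-out.
SITES = ["site-a.example", "site-b.example", "site-c.example"]
CHECKS = ["technical_seo", "content_gap", "conversion_ux"]
async def audit_agent(site: str, check: str) -> dict:
    """Stand-in for one sub-agent; a real one would fetch and analyse the site."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"site": site, "check": check, "notes": f"{check} findings for {site}"}
async def orchestrate() -> dict:
    # Phase 1: fan out one agent per (site, check) pair and run them concurrently.
    jobs = [audit_agent(s, c) for s in SITES for c in CHECKS]
    results = await asyncio.gather(*jobs)
    # Phase 2: plan the deliverables by grouping results per site.
    by_site = {s: [r for r in results if r["site"] == s] for s in SITES}
    # Phase 3: synthesise one report per site plus a short summary.
    reports = {s: "; ".join(r["notes"] for r in rs) for s, rs in by_site.items()}
    return {"agents": len(results), "reports": reports, "summary": f"{len(reports)} sites audited"}
outcome = asyncio.run(orchestrate())
print(outcome["agents"], outcome["summary"])
```
The orchestrator only awaits the gathered results, so it stays free to prepare the report step while the sub-agents run.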

00:05:29.245 --> 00:06:07.485
It has the top three takeaways. So it highlighted here, let's say, the biggest SEO upside, so I definitely agree that that should be top of mind for the headlines. And if you look at the individual PDF reports, you can see a more summarized version per brand around their technical SEO capabilities, content and keywords, and conversion and UX, with a lot more detail down the line. Now something that I think it should have done, or maybe I should have included in the prompt, is a proper design look for this report. Because right now, it's very vanilla white paper. No one's really gonna read this type of design. Right? So what you can do here is just ask Claude Code to iterate on this. And at least from my side, I have this robo group design system to update the executive summary and those PDFs

00:06:07.850 --> 00:06:45.760
so that it is a bit more beautifully designed versus this white paper report that is technically rich but is probably not as enticing to read. Once that's done, it's now been properly formatted and is just much nicer to read. And from here, you can just tweak all the details that it got, turn them into slides, or ask Claude Code to revise the wording of it as you would usually do. But at least a lot of the hard work and a lot of the research behind this report has already been done by your multiple agents, and it only took, like, five minutes. Okay. Now let's do another test. And for this one, let's try out the UltraCode effort. So if you just type in effort, you can change the effort level here to the UltraCode

00:06:45.760 --> 00:06:50.160
smarter level. And just to simplify this, in case you haven't used or tweaked effort before,

00:06:50.645 --> 00:07:09.420
effort basically pertains to how many times a model thinks about its output before serving it to you. And for UltraCode, you can see the subheadline here. That's essentially extra high, plus it gives the model the ability to decide if it wants to use dynamic workflows or not. So let's just select that. So now we're in UltraCode.

00:07:09.420 --> 00:07:31.145
And for this task, I'm giving it this prompt where I'm asking it to audit the Rubrik app. And in case you're new, the Rubrik app is my own personal command center, which I also share with my community. And, basically, whenever I create, like, micro apps that help me with our work, I just put them here to centralize all of those applications in one view. And so what we'll ask Opus 4.8 to do is audit that whole vibe-coded application

00:07:31.145 --> 00:07:38.060
and search thoroughly for bugs. So I'm going to ask for a ranked bug list, and I want a standalone report

00:07:38.140 --> 00:08:12.540
that will just provide me a nice view of what the bugs are so that we can fix them later. So I'll fire that off, and I just made it a bit more open-ended versus the other test that we did. Because I think with UltraCode, what should now happen is that if it deems this task to be large enough, which I think it is, it should go ahead and use dynamic workflows of its own accord and actually fan out those sub-agents to hit this task. Alright. So it's starting that task now. I'd just like to point this out. So you can see here that when you're on the UltraCode effort level, right now it is doing the work as a sole agent, but it has that intelligence

00:08:12.620 --> 00:08:16.860
to do some initial analysis first and an initial discovery before

00:08:17.125 --> 00:08:27.525
doing or orchestrating the deep audit. So you can see here it recognized that UltraCode is on. So it'll orchestrate a fan-out audit with adversarial per-finding verification.
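
NOTE
A fan-out audit with adversarial per-finding verification could look roughly like the sketch below: one auditor per area, then one verifier per candidate finding that tries to refute it. This is an illustration under assumptions, not Claude Code's actual mechanism. The areas, the fake findings, and the function names are all hypothetical.
```python
from concurrent.futures import ThreadPoolExecutor
# Hypothetical sketch: the areas, findings, and checks are invented for illustration.
AREAS = ["auth", "routing", "storage"]
def audit(area: str) -> list[dict]:
    """Stand-in auditor: returns candidate bugs for one area of the app."""
    return [{"area": area, "bug": f"{area}-bug-{i}", "reproducible": i % 2 == 0} for i in range(2)]
def verify(finding: dict) -> bool:
    """Stand-in adversarial verifier: keeps a finding only if it survives the check."""
    return finding["reproducible"]
with ThreadPoolExecutor() as pool:
    # Fan out one auditor per area, then one verifier per candidate finding.
    candidates = [f for found in pool.map(audit, AREAS) for f in found]
    verdicts = list(pool.map(verify, candidates))
confirmed = [f for f, ok in zip(candidates, verdicts) if ok]
agents_spawned = len(AREAS) + len(candidates)
print(agents_spawned, len(confirmed))
```
Because every candidate gets its own verifier, the agent count grows with the number of findings rather than with the number of auditors.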

00:08:27.525 --> 00:09:24.475
So lots of big words from Opus 4.8. But basically, what that means is: initially, I'll do a pre-assessment first before doing a proper fan-out audit. And if it seems like it's a big task, then I'll decide on my own if I need to spawn some sub-agents, some interns, to help me out with this task. And now here you go. It's now doing the audit across eight parallel auditors. So we can actually type in /workflows now, and what it's now showing us is the status of that task. So we have the Rubrik bug audit. We have these eight agents that are all doing the work, all Opus 4.8, and you can see here the status of each in terms of usage. So I just came back to this, and I just wanted to show this before I continue. When it did its verification step, what it did is spawn 88 parallel sub-agents in order to verify its findings in here. So I guess that claim out there that you can spawn hundreds of parallel sub-agents in order to do these long-running tasks is apparently true. So it can happen. And with just this bug report audit, it was able to spawn

00:09:24.850 --> 00:09:25.650
96

00:09:25.650 --> 00:10:11.685
total sub-agents in here. Alright. So the UltraCode run is now done, and it gave us this HTML page that gave us a view of some of the bugs across my personal dashboard, which is not surprising because at least for my version of this dashboard, it is just for personal use. You can see how powerful this is. Right? If you have an application or a dashboard that you are serving to clients, you can just use UltraCode and dynamic workflows to spawn several sub-agents, 96 sub-agents, at least in this case, to find critical, high, and medium bugs. And from here, you can either read through it or just have Claude Code address the top ones. Alright. And one last thing: if we check back on our account and usage, you can see our weekly rate limit usage jumped from 2% to 6%. So those two tasks, they're very heavy tasks. They cost us 4%

00:10:11.685 --> 00:11:11.435
of our weekly rate limit. So that tells you, number one, how token intensive these modes are, so be warned. And number two, it sort of tells you how token constrained Anthropic still is. In my view, this standard of presenting our rate limits as a percentage should actually be changed. Like, I would much rather they show an absolute number of tokens in here so that we can really measure whether they're increasing rate limits or not, so that it's much more transparent. Similar to how, when you're accessing the Internet, you know exactly how much data you're using as part of your mobile data plan, for example. But anyway, that's a topic for another video. But there you go. That is UltraCode and dynamic workflows. Opus 4.8 is great. It's a great incremental release. But I think for most use cases, the way we work is probably going to be dictated more by the updates that they do to their harness. And UltraCode and dynamic workflows are just two updates that I think are worth paying attention to. If that's useful, then consider subscribing because that helps us a lot to put out more educational content like this. As always, thanks for sticking around until the end, and I'll see you guys next time. Thank you.
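
NOTE
The rate-limit arithmetic from the usage check can be worked through in a few lines. The 2% and 6% readings come from the video. The projection of how many more runs fit in a week assumes future runs cost the same, which is a guess and not something the video measures.
```python
# Figures from the video: weekly usage read 2% before the two runs and 6% after.
before_pct, after_pct = 2, 6
cost_pct = after_pct - before_pct        # percentage points used by both runs together
remaining_pct = 100 - after_pct          # weekly budget still available
pairs_left = remaining_pct // cost_pct   # assumes every future pair of runs costs the same
print(cost_pct, remaining_pct, pairs_left)
```
A percentage alone says nothing about the absolute token count, which is the transparency point raised in this cue.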
