WEBVTT

00:00:00.000 --> 00:00:14.800
There are six patterns that separate the people who actually get how to use dynamic workflows to their fullest potential from those who just turn it on once and never use it again. The engineers at Anthropic who both designed and built this feature just released a full masterclass

00:00:14.800 --> 00:01:29.690
on how to get the most out of it. So in this video, I'm gonna break it down for you and give you the TLDR so you don't have to read it yourself and you can just take all the nuggets. I went through the whole thing and pulled the six core design patterns that everything else is built on. And on top of that, it has some Easter eggs on when and when not to use workflows as well as how to share them with a team. If you're ready to take your workflows to the next level, then let's dive in. Now most people think that a dynamic workflow is just a fancier way to spin up agents, but that's not the real unlock. The real unlock is understanding that this is a brand new way for Claude Code to design and create a harness on the fly, which is like a little machine that can be custom built for whatever task you hand it. And once you understand the six shapes that this machine can take, you can apply them to pretty much any scenario. But before I go through all of the patterns at length, it's important to understand the core mechanics underneath them because they are the very reason this feature even exists. So normally, when you give Claude Code a task and you're only dealing with one context window, that context window is a glorified short-term memory for that particular task. And for most things, this works just fine. But when you have longer running tasks or more complex tasks and you find yourself 600,000

00:01:29.690 --> 00:01:31.690
tokens into a conversation,

00:01:31.850 --> 00:01:46.275
this is when things start to unravel. First, you have what's called agent laziness, where you give it 15 tasks, it says it's going to do said tasks, but it only actually accomplishes seven. The next one is self-preference, and this is a huge one. So this is essentially

00:01:46.435 --> 00:02:03.070
having Claude Code audit itself and saying that it did a great job. You're basically asking a single session that's currently running how its code is doing or about the quality of its output. It's going to be biased, like a person who creates their own deliverable, to say that it's better than it actually is.

00:02:03.390 --> 00:02:10.485
And the last one is goal drift, where pretty much you have an overall goal at the beginning of a session, but after a long running conversation

00:02:10.565 --> 00:02:15.125
and tons of auto compactions and tool calls and summarizations,

00:02:15.365 --> 00:02:29.560
that detail that you had at the very beginning, which let it fully synthesize and understand what it's trying to accomplish and how it should do so, starts withering away. So a dynamic workflow fixes all of these problems by not focusing on one single Claude session,

00:02:29.720 --> 00:02:47.820
but instead spinning up a series of agents, many of which will be Sonnet 4.6 by default, each with an individual context window to solve each problem separately. So now we can get into the nitty gritty, and these are the six core patterns. One is called classify and act. The second is fan out and synthesize.

00:02:47.820 --> 00:02:49.900
Then we have adversarial verification,

00:02:50.220 --> 00:02:52.300
generate and filter, tournament,

00:02:52.300 --> 00:03:11.375
which is my favorite, and then loop until done. For each one of these, I'm gonna walk you through what it is, how it works, and much more importantly, how you can actually use it. We'll start at the simplest one and work our way from there. So for the first one, classify and act. This is pretty much having the equivalent of a receptionist at the front door classifying

00:03:11.375 --> 00:03:25.170
what a task should be grouped into or which agents are responsible for said task. So it's meant to use a very basic language model right here. We have the task. The model has some form of system prompt. Once the system prompt determines

00:03:25.330 --> 00:03:26.850
which agent is responsible,

00:03:26.930 --> 00:03:33.595
this becomes the critical path or the chosen path. So a common example of when you'd use this is something like inbox triaging.

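NOTE Editor's sketch of the classify-and-act shape described above, not taken from the guide. It assumes a keyword rule in place of a real classifier agent; the labels, classify, HANDLERS and route are all hypothetical names chosen for illustration.
```python
from typing import Callable
# Hypothetical keyword rules stand in for the lightweight classifier agent.
def classify(ticket: str) -> str:
    text = ticket.lower()
    if "refund" in text:
        return "refund"
    if "crash" in text or "error" in text:
        return "bug"
    if "upgrade" in text:
        return "upgrade"
    return "other"
# One handler per label; in a real workflow each would be its own trusted agent.
HANDLERS: dict[str, Callable[[str], str]] = {
    label: (lambda ticket, label=label: f"{label}-handler: {ticket}")
    for label in ("bug", "refund", "upgrade", "other")
}
def route(ticket: str) -> str:
    # Classify first, then act: only the chosen handler ever sees the ticket.
    return HANDLERS[classify(ticket)](ticket)
```
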
00:03:33.595 --> 00:03:48.850
We have an email come in and then the AI decides, is this a bug? Is this a request for a refund? Is this an upgrade request? Or something along those lines. Once it classifies it, it will route it to the right agent. Another way to think about this is that you're basically quarantining

00:03:48.930 --> 00:04:17.150
what should be done with that task or input before it goes to the next stage. So at the very beginning, you have a reader agent, then you have a ticket, and then it gets pushed on to the next part of the process where you have a trusted agent go and take action upon that input. The practical application prompt would be: build a workflow that triages my inbox in [name of folder] by spawning a classifier agent that reads each ticket and routes it to a bug, refund, lead, or spam handler,

00:04:17.230 --> 00:04:19.630
and deduping, basically removing duplicates

00:04:19.630 --> 00:04:29.175
against what is already tracked before any handler acts. And then we tell it how the quarantine process should be executed. The next pattern is called fan out and synthesize,

00:04:29.175 --> 00:04:32.375
and this is essentially the process of taking a task,

00:04:32.695 --> 00:04:36.135
breaking it into micro parts optimized for individual

00:04:36.295 --> 00:04:43.510
mutually exclusive agents to take care of them, and then you synthesize all of their individual results, their individual contributions,

00:04:43.590 --> 00:04:48.550
and then you bring them to one overall result. So a very practical and common application

00:04:48.710 --> 00:04:59.765
is deep research where you basically have one core thing you're trying to understand. So let's say you wanna research the best ways to use agent harnesses. It would go and see what are all the different lenses

00:04:59.845 --> 00:05:19.890
for this problem, and assign one agent, by default most likely a Sonnet 4.6 agent, to each angle of the research question. And once they all work in parallel, we'll retrieve all their answers at the very end, just like subagents, and then have one overall result. Other than research, you could also apply this to a due diligence scenario where you have financials,

00:05:19.890 --> 00:05:32.055
contracts, and legal, with one agent for each, and each one of those agents pulls out all the salient points for some form of red flag memo. But going to the terminal, this is a good prompt application of this concept.

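NOTE Editor's sketch of fan out and synthesize for the due diligence example, not taken from the guide. It assumes an in-memory DATA_ROOM dict in place of real folders and a keyword scan in place of a real subagent; DATA_ROOM, RED_FLAGS, review_folder, synthesize and run are hypothetical names.
```python
from concurrent.futures import ThreadPoolExecutor
# Hypothetical data room: folder name mapped to {file path: text}.
DATA_ROOM = {
    "financials": {"financials/q1.txt": "Revenue fell 12% in Q1."},
    "contracts": {"contracts/msa.txt": "Auto-renewal clause with penalty."},
    "legal": {"legal/claims.txt": "No pending litigation."},
}
RED_FLAGS = ("fell", "penalty", "breach")
def review_folder(folder: str) -> list[dict]:
    # Stand-in for one subagent with a clean context: it only sees its own folder.
    findings = []
    for path, text in DATA_ROOM[folder].items():
        if any(flag in text.lower() for flag in RED_FLAGS):
            findings.append({"source": path, "finding": text})
    return findings
def synthesize(per_folder: list[list[dict]]) -> str:
    # Merge step: every finding keeps the path of the file it came from.
    lines = [f"- {f['finding']} [{f['source']}]" for group in per_folder for f in group]
    return "\n".join(lines)
def run() -> str:
    with ThreadPoolExecutor() as pool:
        # list() blocks until every folder review has finished (the barrier).
        results = list(pool.map(review_folder, DATA_ROOM))
    return synthesize(results)
```
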
00:05:32.375 --> 00:05:38.550
Build a workflow that does due diligence on the data room in [folder] by fanning out, keyword here,

00:05:38.870 --> 00:05:40.710
one sub agent per folder,

00:05:41.110 --> 00:05:45.030
each in its own clean context so the files never cross contaminate,

00:05:45.190 --> 00:05:52.935
and have every agent return a structured summary with the exact source path for each finding. So we're creating basically citations within documents.

00:05:53.255 --> 00:05:59.895
Then run a barrier synthesize step that waits for all of them to finish and merges their outputs into one cited

00:05:59.895 --> 00:06:39.330
due diligence memo at [name of your desired memo] where every claim links back to the file it came from. So a little bit of extra emphasis here on the citations. By the way, if you're enjoying this content and you like the way I break down all these concepts, then you're going to love all of the exclusive content you'll find in my Claude Code living course. I add at least one brand new module every single week that you'll never see on YouTube. So if you want access to that, along with our team of coaches included in your membership and tons of other goodies and great builders you'll find in there, then check out the first link down below, and I'll see you inside. Alright. Back to the video. Now this next pattern is one that I really want you to retain. It's called adversarial verification

00:06:39.490 --> 00:06:49.475
and it's meant to plug the hole of self-preference. So instead of having Claude Code think it's awesome all the time, you very intentionally employ

00:06:49.475 --> 00:07:12.180
three different skeptics or devil's advocates to look at the output and then cross-reference it against some checklist or rubric. One extra step you can take here, before you even execute the workflow, is to create a proper rubric. Because once you have the rubric, it's basically a pseudo plan that your other agents can push against and play devil's advocate with, using it as their core source.

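NOTE Editor's sketch of adversarial verification against a rubric, not taken from the guide. It assumes three simple string checks in place of three real skeptic agents; RUBRIC, skeptic and verify are hypothetical names, and the rubric items are invented for illustration.
```python
from typing import Callable, Optional
# Hypothetical rubric: each check returns an objection, or None if satisfied.
Check = Callable[[str], Optional[str]]
RUBRIC: dict[str, Check] = {
    "has_source": lambda d: None if "source:" in d.lower() else "no source cited",
    "no_absolutes": lambda d: "absolute claim" if "always" in d.lower() else None,
    "short_enough": lambda d: None if len(d.split()) <= 50 else "too long",
}
def skeptic(name: str, draft: str) -> Optional[str]:
    # Stand-in for a fresh agent that did not write the draft and argues against it.
    return RUBRIC[name](draft)
def verify(draft: str) -> list[str]:
    # The draft passes only if no skeptic raises an objection.
    objections = [skeptic(name, draft) for name in RUBRIC]
    return [o for o in objections if o is not None]
```
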
00:07:13.095 --> 00:07:47.675
The natural application of this is fact checking. So if you produce anything with AI, like a blog or an article or anything where you're worried about all the facts being 100% coherent and correct, running this kind of devil's advocate at scale can help you find any issues or any controversial statements ahead of time. So if you wanna see a tactical example of this, imagine that you have a blog that you're drafting, and then from this blog, you have an extractor. And then from this extractor, you pull out individual claims. Once you have the claims, you need to decide if they're factual or non-factual, so you spin up a series of subagents.

00:07:47.915 --> 00:08:15.355
Each one checks one individual claim and goes down a rabbit hole. And once they come back with their false positives and false negatives, you can create a verified report on what needs to change in this blog. If you want to execute this, you could use a prompt like this where you say, use a workflow to go through my blog post and verify each factual and technical claim before I ship it. Have one agent extract each claim into its own item, then for every claim,

00:08:15.515 --> 00:08:26.075
spin off a separate agent that checks it against the real source. So we're making sure that the individual agents don't also have self-preference bias. So the rest of the prompt basically walks through the behavior

00:08:26.680 --> 00:08:34.680
of making sure that when it's done, it gives me back the list of claims that have failed and the exact reason why each failed so that I know what to fix. And ideally,

00:08:35.000 --> 00:08:48.765
where the bias of the language model that might have drafted or written that blog post might have come from. And the next pattern is called generate and filter. And the whole point of this is to spin up a series of agents to over generate a series of ideas,

00:08:49.005 --> 00:09:25.060
project names, or whatever you're trying to ideate about. And once you have those ideas, it's easier to go from a thousand ideas to three than to go from 10 to three. So it basically gives you more variety. So I could use it, for example, to say, what should I title this video so it communicates the concept without being clickbaity? Find me a balanced set of 500 titles that have performed really well on YouTube in the past six months. So you practically use this wherever taste is required. So if you need an opinion on a cold email opener or the name of a brand new product or where you should execute a certain

00:09:25.300 --> 00:09:26.340
pop up offer.

00:09:26.660 --> 00:09:29.860
This would be a helpful way to go and do the research individually.

00:09:29.860 --> 00:09:40.115
Maybe do some market research analysis. Maybe you hook up some skills for these agents to be able to use to survey, scrape, do whatever they need to come back with the richest information possible,

00:09:40.275 --> 00:10:46.055
and then you synthesize and digest that down to just a few ideas. And if you wanna get fancier with it, then you can always integrate a judge in the workflow. So you have a series of agents spin up all the ideas, then you have a series of judges that critique all the ideas, then you have some rubric, again, ideally one that you put together so you can quality control the quality controllers, and then you get the tested, synthesized picks at the very end. To apply this, you could send over a prompt like this where it says, use a workflow to brainstorm 40 video title and headline angle options for the topic in [name of topic] with one generator agent, then hand them all to a judge agent that scores every option against a series of criteria. Then at the very end here, we say the generator that brainstorms and the judge that scores must be different agents. So I might be spoon-feeding or over-telling it what to do, but it's never a bad idea to overexplain yourself to a language model. Now like I said before, pattern five is my favorite, and it's called the tournament pattern. And the way it works is this. Instead of dividing up the work amongst multiple agents like we saw in prior patterns,

00:10:46.295 --> 00:10:57.470
this one takes single ideas, sends them to fresh new agents, and asks each one some controversial question. Should we go with this decision or not? And then this agent will go through every single reason,

00:10:57.710 --> 00:11:10.125
which one of the two options, or it could be three or four options, is the best and why. And then all of the good ideas or the good proposals or good decisions move on to the next round until you get to the final bracket.

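NOTE Editor's sketch of the tournament shape, not taken from the guide. It assumes a non-empty list of candidates and a plain function in place of a fresh comparison agent; run_tournament, YEARS and by_experience are hypothetical names, and the candidates are made up.
```python
from typing import Callable
def run_tournament(items: list[str], judge: Callable[[str, str], str]) -> str:
    # The plain loop holds the bracket; each judge call stands in for one
    # fresh comparison agent that only ever sees two candidates.
    current = list(items)
    while len(current) > 1:
        winners = []
        for i in range(0, len(current) - 1, 2):
            winners.append(judge(current[i], current[i + 1]))
        if len(current) % 2 == 1:
            winners.append(current[-1])  # the odd one out gets a bye
        current = winners
    return current[0]
# Hypothetical judge: prefers the candidate with more years of experience.
YEARS = {"ana": 7, "ben": 3, "chi": 9, "dev": 5, "eli": 4}
def by_experience(a: str, b: str) -> str:
    return a if YEARS[a] >= YEARS[b] else b
```
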
00:11:10.205 --> 00:11:27.450
So this will keep going pairwise until we get to the very end where we have a final. And at each stage, we have an unbiased agent with a fresh context window. If you instead have Claude Code look at 500 different decisions in one session, its context window, its memory, its auto-compaction,

00:11:27.690 --> 00:11:44.115
all that will lead to less accurate decisions. So when you break apart all the possible decision space into all these separate agents, you have a different way to also trace how a decision is made. And you could probably imagine the example use case of where this would be helpful. Imagine you have 5,000

00:11:44.115 --> 00:12:09.815
resumes, and instead of shoving those 5,000 resumes through a typical applicant tracking system or one Claude Code session where it will inevitably run into bias, context window issues, and bloat, you basically break apart all those resumes into specific stacks. Do we pick this person or this person, and explain why? Applying this, you could have a thousand different items, and then you have different rounds. Round one could be assessing

00:12:09.895 --> 00:12:12.135
all items based on x criteria.

00:12:12.375 --> 00:12:26.260
Then the next round would be on y criteria, and then we would keep going until we get to the final round. So the one key caveat here is that each round could technically have its own rubric. So it doesn't have to be seven,

00:12:26.260 --> 00:12:36.025
ten, or 50 rounds with the exact same criteria. And a prompt for this could say, use a workflow to rank every resume for the backend engineer role by running a tournament

00:12:36.265 --> 00:12:37.785
of pairwise comparisons

00:12:37.865 --> 00:13:40.450
against a rubric instead of scoring each one cold, where each head-to-head match is its own comparison agent and the deterministic loop holds the bracket so only the running order stays in context. So if you wanna steal a lot of my wording here, you're gonna be able to access all these prompts in the second link down below, so don't worry about it for now. But we're basically just setting the tone for this whole bracket to transpire. Now the last pattern is called loop until done, and this is very similar to something like slash goal where instead of telling the agent, go and do x 10 times, you just say, don't stop until you reach this specific outcome. So if you have brand new findings for a particular matter, it will keep going and spin up brand new agents to go through, double check, and triple check until you reach the specific desired result. So maybe you have a bug that occurs in a platform you've put together. Maybe you vibe coded it, and maybe this bug happens once every 30 times, but you can't usually spot it. You have to keep refreshing. You have to keep trying different combinations to reproduce the bug. You could theoretically say,

00:13:40.770 --> 00:13:42.130
go through and

00:13:42.370 --> 00:14:07.455
run this until you reproduce the bug, and then once you get the bug, try and trace what's happening and how we can resolve it. If we need another application, imagine you send this loop until done on a wild goose chase to go through all of your conversations, your JSONL files that represent your Claude Code sessions, and you say, keep looking through every single one of these conversations for every pattern I've made until you have a comprehensive

00:14:07.690 --> 00:14:22.605
non-duplicative list of every single thing that I could improve to be that much better at Claude Code. So you could say something like, build a workflow that hunts down a flaky test that fails maybe one in 50 runs, keep forming theories about the cause and adversarially

00:14:22.605 --> 00:14:28.365
testing each one in its own isolated worktree. And when it comes to the prompt, it could look something like this.

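NOTE Editor's sketch of loop until done for the flaky test example, not taken from the guide. It assumes a simulated test that fails on every 50th call so the demo is deterministic, plus a budget cap as a safety limit; make_flaky_test, loop_until_failure and budget are hypothetical names.
```python
import itertools
from typing import Callable, Optional
def make_flaky_test(fail_every: int) -> Callable[[], bool]:
    # Hypothetical test that fails on every `fail_every`-th call.
    counter = itertools.count(1)
    return lambda: next(counter) % fail_every != 0
def loop_until_failure(test: Callable[[], bool], budget: int) -> Optional[int]:
    # Stop on the outcome (a failure is observed), not after a fixed pass count.
    # `budget` is only a safety cap so the loop cannot run forever.
    for attempt in range(1, budget + 1):
        if not test():
            return attempt
    return None
```
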
00:14:28.685 --> 00:14:35.565
Build a workflow that hunts down what's called a flaky test in the test folder that fails maybe one in 50 runs.

00:14:36.100 --> 00:14:39.220
Keep forming theories about the cause and adversarially

00:14:39.220 --> 00:14:45.620
testing each one in its own isolated worktree. This is basically a separate session for an agent.

00:14:45.860 --> 00:15:18.635
Looping and spawning new attempts with no fixed pass count. So we're basically telling it, we are not telling you to go check this 10 times or 20 times. Keep checking it until this specific result is achieved. Now those were the six core patterns, but this is how you go from just knowing about them to actually becoming dangerous with them, which is stacking them on top of each other. So imagine you have a realistic scenario where you vibe coded some CRM, and this CRM has an onboarding flow for clients. And you wanna find ways to improve that onboarding flow, make it very thoughtful, and think of different order consequences.

00:15:18.715 --> 00:15:25.915
Maybe you go and say fan out a series of agents to look at the code base and pull out all the insights on what should change and why.

00:15:26.235 --> 00:15:31.730
And then once that is pulled out, then you move on to pattern number three, which was the adversarial

00:15:31.730 --> 00:15:37.970
verify step. So now it has a series of devil's advocates that go against all the findings that were found.

00:15:38.450 --> 00:16:12.280
Then from that step, maybe you add on a loop until done. So maybe you combine a slash goal in the scenario, and then you apply that to this specific circumstance. So maybe you have a process that keeps going until it can't find any more optimizations that it could possibly make given the specific avatar who's using the platform. The best part about this is that you don't have to design it by hand. It's really a matter of just using the keywords in the right way to get the confirmed result or the ideal result. If you wanted to put all of this into one single prompt, you could say something like, build a workflow that audits every file under [codebase],

00:16:12.440 --> 00:16:23.285
fans out one agent, keyword right here, fans out, per file, has a separate agent try to refute each of the findings against the code, and this is our adversarial

00:16:23.285 --> 00:16:24.085
verify,

00:16:24.325 --> 00:16:35.060
and loops until a clean pass turns up with no new issues. Return only the confirmed issues each with the file and the exact line, then for good measure, you could add slash goal.

00:16:35.220 --> 00:16:51.745
Do not stop until a full clean pass finds no new issues. Now do you need this last sentence? Maybe not. It would get the idea, but if you wanna add it for an additional layer, then you can absolutely do so. And like I said before, you can also share workflows very similar to how you share skills. And pretty much every

00:16:52.145 --> 00:17:13.845
workflow is purely a JavaScript file. And when you combine this JavaScript file with your skill file and maybe any additional markdown file that it depends on, all you'll get is one overall folder with the SKILL.md, the JavaScript file, and anything else that's involved. So it will look something like this. If we go into this folder, you'll have this rubric,

00:17:14.005 --> 00:17:27.860
you'll have the SKILL.md, and then you'll have this verify claims workflow JavaScript file. If you're not familiar, if you ever wanna save a workflow, all you'd have to do is, let's say we run this. So we do slash

00:17:28.180 --> 00:17:29.060
workflows.

00:17:29.700 --> 00:17:49.265
This would show you any workflows running at the moment. And then while this is running, you could always save that specific workflow. Once it saves, it'll allow you to store this as that JavaScript file that I referred to before. The one additional thing that they write in the guide is that you can basically always tell Claude code what its budget is when it comes to token usage because

00:17:49.770 --> 00:17:52.410
this is a very token consuming

00:17:52.410 --> 00:18:05.745
feature which should be used sparingly, for very large use cases or use cases that have different layers of complexity. So one example of when not to use a workflow is basic tasks. So if you have a platform

00:18:05.905 --> 00:18:29.160
and you have a series of buttons that you wanna change the color of or you want them to pulsate when you click on them, you wouldn't spin up an agent team just to do that. You could do that individually just using basic prompts. So as we get infinitely better models and we go from 4.8 to 4.9 to five, you'll be able to do a lot more with a lot less. So you won't need agents for all tasks, but when it's time to use the power of agents,

00:18:29.320 --> 00:18:35.835
they will be there and you'll be able to use these swarms for much more complex matters. So hopefully, this breaks down and demystifies

00:18:35.835 --> 00:19:23.510
all the power and all the potential that you might be leaving on the table by not using these patterns in your day to day. Like I said, you'll find all the prompts I showed you along with some additional goodies like an example of that skill folder with the workflow inside of it in the second link down below so you can use it, get started, and start leveling up your workflows like I promised. If you wanna take your agentic workflows and agentic OS systems to the next level and you wanna master Claude Code, then make sure you check out the first link down below for my early adopters community. This week's upcoming live module will be a deeper dive on dynamic workflows and how I've used them myself for things like travel and personal stuff outside of core business use cases. And for the rest of you, if you found this helpful and novel, I'd super appreciate a like on the video and a comment if you so choose, and I'll see you in the next one.