Mark Kashef · YouTube · 19:24

Every Claude Code Dynamic Workflow (& When to Use Each)

Six composable agent patterns from Anthropic's own internal masterclass, with live prompts and honest advice on when to skip workflows entirely.

Posted
June 3rd 2026
Duration
19:24
Format
Tutorial
educational
Channel
Mark Kashef
§ 01 · The Hook

The bait, then the rug-pull.

Anthropic's own engineers built a masterclass on dynamic workflows and released it quietly. This breakdown does the reading for you: six core patterns, the three failure modes they fix, and one honest warning that most workflow tutorials skip entirely.

§ · Chapters

Where the time goes.

00:00 – 00:33

01 · Intro

Six patterns from Anthropic's internal masterclass; TLDR promise.

00:33 – 01:11

02 · The Real Unlock

Claude Code builds a custom harness on the fly rather than just spawning agents.

01:11 – 02:23

03 · Three Failure Modes of One Context Window

Agentic laziness, self-preference bias, and goal drift all stem from a single overloaded session.

02:23 – 03:01

04 · How Dynamic Workflows Fix It

Separate Sonnet 4.6 agents each get a clean context window; problems are isolated.

03:01 – 04:22

05 · Pattern 1: Classify and Act

A receptionist classifier routes incoming tasks to the correct specialist; inbox triage prompt shown live.
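
A minimal Python sketch of the classify-and-act shape follows. It assumes plain functions stand in for agents: `classify` plays the read-only receptionist, and the `HANDLERS` table plays the trusted specialists. Duplicates are dropped before any handler acts. Every name here is hypothetical. Claude Code's own workflows are JavaScript files, and this is not their format or API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Ticket:
    ticket_id: str
    text: str


def classify(ticket: Ticket) -> str:
    """Receptionist stand-in: read-only, it only picks a category.

    In a real workflow this would be a separate classifier agent with its
    own system prompt; keyword rules keep the sketch deterministic.
    """
    text = ticket.text.lower()
    if "refund" in text:
        return "refund"
    if "error" in text or "crash" in text:
        return "bug"
    if "pricing" in text or "demo" in text:
        return "lead"
    return "spam"


# Hypothetical specialist handlers: each only ever sees tickets routed to it.
HANDLERS: dict[str, Callable[[Ticket], str]] = {
    "bug": lambda t: f"bug report filed for {t.ticket_id}",
    "refund": lambda t: f"refund reviewed for {t.ticket_id}",
    "lead": lambda t: f"lead logged for {t.ticket_id}",
    "spam": lambda t: f"{t.ticket_id} archived",
}


def triage(inbox: list[Ticket], already_tracked: set[str]) -> list[str]:
    seen = set(already_tracked)  # copy so the caller's set is not mutated
    actions = []
    for ticket in inbox:
        # Dedupe before any handler acts (against tracked and earlier tickets).
        if ticket.ticket_id in seen:
            continue
        seen.add(ticket.ticket_id)
        category = classify(ticket)                 # quarantine: classify only
        actions.append(HANDLERS[category](ticket))  # trusted agent takes action
    return actions
```

The split matters more than the keyword rules. The step that reads untrusted input never acts, and the step that acts only receives an already-classified ticket.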

04:22 – 06:33

06 · Pattern 2: Fan Out and Synthesize

Parallel agents on independent sub-tasks, merged with citations; deep research and due diligence prompts.
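
The fan-out-and-synthesize shape can be sketched in Python with `concurrent.futures`. The sketch assumes each hypothetical `review_folder` call stands in for one subagent that sees only its own folder. Collecting every future's result acts as the barrier. Each finding keeps its source path, so the merged memo stays cited. The "red flag" keyword check is a placeholder for real analysis.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    claim: str
    source_path: str  # every finding carries its own citation


def review_folder(folder: str, files: dict[str, str]) -> list[Finding]:
    # One subagent stand-in: it receives ONLY its own folder's files, so
    # nothing from another folder can leak into its "context".
    return [
        Finding(text.strip(), f"{folder}/{name}")
        for name, text in sorted(files.items())
        if "red flag" in text.lower()  # placeholder for real analysis
    ]


def due_diligence(data_room: dict[str, dict[str, str]]) -> str:
    # Fan out: one worker per folder, all running in parallel.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(review_folder, folder, files)
                   for folder, files in sorted(data_room.items())]
        # Barrier: .result() blocks until that worker has finished.
        results = [future.result() for future in futures]
    # Synthesize: merge all outputs into one memo where every line is cited.
    lines = [f"- {f.claim} [{f.source_path}]" for batch in results for f in batch]
    return "\n".join(["Due diligence memo"] + (lines or ["- No red flags found"]))
```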

06:33 – 08:40

07 · Pattern 3: Adversarial Verification

Three skeptic agents cross-reference output against a rubric; fact-check blog post prompt live.
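
Here is a sketch of adversarial verification under stated assumptions. A hypothetical extractor splits a draft into one claim per sentence. Three independent skeptic functions then each apply one rubric check. They stand in for skeptic agents, and none of them wrote the draft. `SOURCE_FACTS` is a made-up stand-in for the real source. Each failed claim is reported with every reason it failed.

```python
import re
from typing import Callable, Optional

# Made-up stand-in for "the real source" that claims are checked against.
SOURCE_FACTS = {
    "water boils at 100 c at sea level",
    "the earth orbits the sun",
}

# A skeptic returns a failure reason, or None if the claim survives it.
Skeptic = Callable[[str], Optional[str]]


def extract_claims(draft: str) -> list[str]:
    # Extractor stand-in: treat each sentence as one claim.
    return [s.strip() for s in re.split(r"[.!?]", draft) if s.strip()]


def source_skeptic(claim: str) -> Optional[str]:
    return None if claim.lower() in SOURCE_FACTS else "not found in source"


def absolutes_skeptic(claim: str) -> Optional[str]:
    absolutes = {"always", "never", "every"}
    return "unhedged absolute" if absolutes & set(claim.lower().split()) else None


def vagueness_skeptic(claim: str) -> Optional[str]:
    return "too vague to verify" if len(claim.split()) < 3 else None


def verify(draft: str, skeptics: list[Skeptic]) -> list[tuple[str, list[str]]]:
    """Return each failed claim with every reason a skeptic gave."""
    failed = []
    for claim in extract_claims(draft):
        # Each skeptic judges the claim on its own, independent of the drafter.
        reasons = [r for skeptic in skeptics if (r := skeptic(claim)) is not None]
        if reasons:
            failed.append((claim, reasons))
    return failed
```

The drafting step never appears inside `verify`. Keeping producer and checker separate is what addresses self-preference.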

08:40 – 10:30

08 · Pattern 4: Generate and Filter

Over-generate ideas with parallel agents, apply judge plus rubric to winnow; video title prompt.
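
Generate and filter can be sketched with a hypothetical generator that over-produces title candidates and a separate judge that scores them against a toy rubric. The rubric criteria and weights are invented for illustration. The point is the shape: many candidates go in, a few ranked survivors come out, and generation and judging stay in different functions.

```python
import itertools
from typing import Callable


def generate_titles(topic: str) -> list[str]:
    # Generator stand-in: over-generates by combining hooks and angles.
    hooks = ["How to use", "Why you need", "Stop ignoring", "The truth about"]
    angles = ["", " in 10 minutes", " (before it's too late)", " for beginners"]
    return [f"{hook} {topic}{angle}" for hook, angle in itertools.product(hooks, angles)]


# Toy rubric: each criterion maps a title to a score (invented for illustration).
RUBRIC: dict[str, Callable[[str], float]] = {
    "short": lambda t: 1.0 if len(t) <= 45 else 0.0,
    "not_clickbait": lambda t: 0.0 if "too late" in t or "truth" in t.lower() else 1.0,
    "concrete": lambda t: 1.0 if any(ch.isdigit() for ch in t) else 0.5,
}


def judge(title: str) -> float:
    # Judge stand-in: kept separate from the generator, scores every option.
    return sum(criterion(title) for criterion in RUBRIC.values())


def generate_and_filter(topic: str, keep: int = 3) -> list[str]:
    candidates = generate_titles(topic)
    # sorted() is stable even with reverse=True, so ties keep generation order.
    return sorted(candidates, key=judge, reverse=True)[:keep]
```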

10:30 – 13:00

09 · Pattern 5: Tournament

Pairwise head-to-head in bracket rounds, each match a fresh agent; resume ranking example.
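
A single-elimination tournament can be written in Python, assuming a deterministic `compare` function stands in for a fresh comparison agent. That agent sees only two candidates and the current round's rubric. The loop itself holds the bracket, and odd counts get a bye. Each round can use its own rubric, and every match is logged so the decision trail is auditable. Names, fields, and scores are hypothetical.

```python
from typing import Callable

Item = dict[str, float]
Rubric = Callable[[Item], float]


def compare(a: str, b: str, items: dict[str, Item], rubric: Rubric) -> str:
    # One head-to-head match: the "agent" sees only these two candidates and
    # this round's rubric. Ties go to the first-listed candidate.
    return a if rubric(items[a]) >= rubric(items[b]) else b


def tournament(items: dict[str, Item], rubrics: list[Rubric]) -> tuple[str, list[str]]:
    if not items or not rubrics:
        raise ValueError("need at least one item and one rubric")
    contenders = list(items)  # the bracket lives here, not in any agent's context
    log: list[str] = []
    round_no = 0
    while len(contenders) > 1:
        # Each round may use its own rubric; reuse the last one if we run out.
        rubric = rubrics[min(round_no, len(rubrics) - 1)]
        winners = []
        for i in range(0, len(contenders) - 1, 2):
            a, b = contenders[i], contenders[i + 1]
            winner = compare(a, b, items, rubric)
            log.append(f"round {round_no + 1}: {a} vs {b} -> {winner}")
            winners.append(winner)
        if len(contenders) % 2 == 1:
            winners.append(contenders[-1])  # odd one out gets a bye
        contenders = winners
        round_no += 1
    return contenders[0], log
```

For example, five made-up resumes can be judged on backend skill in round one and on backend plus communication afterwards. The returned log then shows exactly which match eliminated whom.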

13:00 – 14:12

10 · Pattern 6: Loop Until Done

Spawn new agents until outcome condition is met, no fixed pass count; flaky test hunting.
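
Loop until done can be sketched for the flaky-test example. Instead of running the test a fixed number of times, the loop keeps spawning fresh, isolated attempts until the failure reproduces. Each attempt gets its own seeded `random.Random`, a stand-in for a separate worktree. `max_attempts` is an explicit budget cap so the loop cannot run forever. The flaky test is simulated, and the theory-forming step described in the video is not modeled.

```python
import random
from typing import Callable, Optional


def flaky_test(rng: random.Random) -> bool:
    """Simulated flaky test: passes about 49 runs in 50 (returns False on failure)."""
    return rng.random() >= 0.02


def loop_until_reproduced(
    test: Callable[[random.Random], bool], max_attempts: int = 10_000
) -> Optional[int]:
    """Spawn fresh attempts until the failure reproduces; no fixed pass count.

    Exits on the outcome (a failing run) and returns the seed that reproduces
    it, or None if the explicit budget cap runs out first.
    """
    for seed in range(max_attempts):
        # Each attempt is isolated: its own RNG, nothing shared between runs.
        if not test(random.Random(seed)):
            return seed
    return None
```

Returning the seed rather than a bare "found it" matters. Rerunning `flaky_test(random.Random(seed))` fails again every time, which gives a reproducible case to trace.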

14:12 – 16:46

11 · Stacking Patterns

Fan out plus adversarial verify plus loop until done chained in one prompt; CRM onboarding audit.
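
Here is a hedged Python sketch that stacks the three patterns from the CRM audit prompt. It fans out one hypothetical auditor per file and drops any finding a separate refuter can knock down. It then repeats full passes until one turns up nothing new. The string-match checks and the `# reviewed` marker are placeholders. Because these stand-ins are deterministic, the loop always ends on its second pass. Real agents are not deterministic, which is why the clean-pass exit matters.

```python
from concurrent.futures import ThreadPoolExecutor

Finding = tuple[str, int, str]  # (file path, 1-based line number, matched check)


def audit_file(path: str, source: str, checks: list[str]) -> list[Finding]:
    # Fan-out worker stand-in: sees only its own file.
    return [(path, n, check)
            for n, line in enumerate(source.splitlines(), start=1)
            for check in checks if check in line]


def refuted(finding: Finding, codebase: dict[str, str]) -> bool:
    # Adversarial-verify stand-in: a separate check that tries to knock the
    # finding down against the code itself.
    path, n, _ = finding
    return "# reviewed" in codebase[path].splitlines()[n - 1]


def stacked_audit(codebase: dict[str, str], checks: list[str],
                  max_passes: int = 5) -> tuple[list[Finding], int]:
    confirmed: set[Finding] = set()
    passes = 0
    while passes < max_passes:  # explicit cap on the loop
        passes += 1
        # 1) Fan out: one worker per file, in parallel.
        with ThreadPoolExecutor() as pool:
            batches = list(pool.map(audit_file, codebase.keys(),
                                    codebase.values(), [checks] * len(codebase)))
        # 2) Adversarial verify: keep only findings that survive refutation.
        survivors = {f for batch in batches for f in batch if not refuted(f, codebase)}
        # 3) Loop until done: stop once a full pass finds nothing new.
        new = survivors - confirmed
        if not new:
            break
        confirmed |= new
    return sorted(confirmed), passes
```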

16:46 – 17:36

12 · Sharing as a Skill

Every workflow is a JS file; package with SKILL.md and rubric into a shareable folder.

17:36 – 18:00

13 · Token Budget and Saving

Use /workflows to inspect running workflows and save them; tell Claude Code its token budget explicitly; /goal adds a hard stop condition.
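
One way to make a token cap concrete is a small budget guard that every agent spawn must charge before it runs. This is a sketch, not Claude Code's budgeting mechanism. `TokenBudget`, `run_agents`, and the per-agent cost estimates are all hypothetical. The guard simply stops spawning once the next agent would exceed the cap.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push usage past the cap."""


@dataclass
class TokenBudget:
    cap: int
    used: int = 0

    def charge(self, tokens: int) -> None:
        # Check before spending, so the cap is never crossed.
        if self.used + tokens > self.cap:
            raise BudgetExceeded(
                f"{self.used} + {tokens} tokens would exceed the cap of {self.cap}")
        self.used += tokens


def run_agents(estimated_costs: list[int], budget: TokenBudget) -> int:
    """Spawn one hypothetical agent per cost estimate; return how many ran."""
    ran = 0
    for cost in estimated_costs:
        try:
            budget.charge(cost)
        except BudgetExceeded:
            break  # hard stop instead of silently overspending
        ran += 1  # in a real workflow, the agent would run here
    return ran
```

Take a 100,000-token cap and five agents estimated at 30,000 tokens each. Only three run, using 90,000 tokens, because the fourth would cross the cap.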

18:00 – 18:41

14 · When NOT to Use Workflows

Workflows burn tokens; only reach for them when a single agent would genuinely fail.

18:41 – 19:24

15 · Recap

Free prompt pack linked; live community module on deeper workflow use cases.

§ · Storyboard

Visual structure at a glance.

open
real unlock
failure modes
solution
pattern 1
pattern 3
pattern 5
stacking
when not to use
recap and CTA
§ · Frameworks

Named ideas worth stealing.

02:29 list

Six Claude Code Dynamic Workflow Patterns

  1. Classify and Act
  2. Fan Out and Synthesize
  3. Adversarial Verification
  4. Generate and Filter
  5. Tournament
  6. Loop Until Done

Six composable agent orchestration patterns that solve the three failure modes of single-context Claude sessions.

Steal for: any complex multi-step agentic task where a single session would degrade
01:11 list

Three Failure Modes of One Context Window

  1. Agentic laziness
  2. Self-preference
  3. Goal drift

The three ways a single long Claude Code session breaks down on complex tasks.

Steal for: diagnosing why your solo agent prompt is not delivering reliable output
§ · Quotables

Lines you could clip.

01:40
"You're basically asking a single session that's currently running how its code is doing. It's going to be biased like a person who creates their own deliverable."
Self-contained analogy that lands without any setup → TikTok hook
09:05
"It's easy to go from a thousand ideas to three versus to go from ten to three."
Tight memorable principle, quotable without context → IG reel cold open
18:10
"Otherwise you are just lighting money on fire to feel fancy."
Punchy one-liner warning against over-engineering → newsletter pull-quote
§ · Resources Mentioned

Things they pointed at.

00:00 · link · Anthropic Dynamic Workflows Masterclass
§ · CTA Breakdown

How they asked for the click.

06:33 product
"check out the first thing down below for my Claude code living course"

Mid-video at ~6:30; briefly stepped out of the pattern breakdown to plug the paid community, then returned cleanly. Low-friction ask with clear value prop.

§ 04 · The Script

Word for word.

Legend: HOOK (opening / re-engagement) · CTA (the pitch) · metaphor · analogy
00:00 · HOOK · There are six patterns that separate the people who actually get how to use dynamic workflows to their fullest potential versus those who just turn it on once and never use it again. The engineers at Anthropic that both designed and built this feature just released a full masterclass on how to get the most out of it.
00:16 · HOOK · So in this video, I'm gonna break it down for you and give you the TLDR so you don't have to read it yourself and you can just take all the nuggets. I went through the whole thing and pulled the six core design patterns that everything else is built on. And on top of that, it has some Easter eggs on when and when not to use workflows as well as how to share them with a team.
00:33 · If you're ready to take your workflows to the next level, then let's dive in. Now most people think that a dynamic workflow is just a fancier way to spin up agents, but that's not the real unlock. The real unlock is an understanding that this is a brand new way for Claude Code
00:47 · to design and create a harness on the fly, which is like a little machine that can be custom built for whatever task that you hand it. And once you understand these six shapes that this machine can take, then you can apply them to pretty much any scenario. But before I can even go through all of the patterns at length, it's important to understand the core mechanics underneath them because they are the very reason why this feature even exists.
01:11 · So normally, when you give Claude Code a task and you're only dealing with one context window, that context window is a glorified short-term memory for that particular task. And for most things, this works just fine. But when you have longer running tasks or more complex tasks and you find yourself half a 600,000
01:29 · tokens into a conversation, this is when things start to unwind. And then you have things that are called agent laziness, where you give it 15 tasks, it says it's going to do said tasks, but it only actually accomplishes seven.
01:42 · And the next one is self-preference, and this is a huge one. So this is essentially having Claude Code audit itself and saying that it did a great job.
01:50 · You're basically asking a single session that's currently running how its code is doing or the quality of its output. It's going to be biased, like a person who creates their own deliverable, to say that it's better than it actually is. And the last one is goal drift, where pretty much you have an overall goal at the beginning of a session, but after a long running conversation
02:10 · and tons of auto compactions and tool calls and summarizations, that detail that you had at the very beginning that made it fully synthesize and understand what it's trying to accomplish and how it should do so starts withering away. So a dynamic workflow fixes all of these problems by not focusing on one single Claude session,
02:29 · but instead spins up a series of agents, many of which will be Sonnet 4.6 by default, to go and have individual context windows to solve each problem separately. So now we can get into the nitty gritty, and these are the six core patterns. One is called classify and act.
02:45 · The second is fan out and synthesize. Then we have adversarial verification, generate and filter, tournament,
02:52 · which is my favorite, and then loop until done. For each one of these, I'm gonna walk you through what it is, how it works, and much more importantly, how you can actually use it. We'll start at the simplest one and work our way from there.
03:03 · So for the first one, classify and act. This is pretty much having the equivalent of a receptionist at the front door classifying what a task should be grouped into or which agents are responsible for said task.
03:16 · So it's meant to just have a very basic language model right here. We have the task. This has some form of system prompt.
03:23 · Once the system prompt filters which is the responsible agent, this becomes the critical path or the chosen path.
03:29 · So a common example of when you'd use this is something like inbox triaging. We have an email come in and then the AI decides: is this a bug? Is this a request for a refund?
03:39 · Is this an upgrade request? Or something along those lines. Once it classifies it, it will route it to the right agent.
03:45 · Another way to think about this is that you're basically quarantining what should be done with that task or input before it goes to the next stage. So at the very beginning, you have a reader agent, then you have a ticket, and then it gets pushed on to the next part of the process where you have a trusted agent go and take action upon that input.
04:03 · The practical application prompt would be: build a workflow that triages my inbox in [name of folder] by spawning a classifier agent that reads each ticket, routes it to a bug, refund, lead, or spam handler, and dedupes it (basically removing duplicates against what is already tracked) before any handler acts.
04:22 · And then we tell it how the quarantine process should be executed. The next process is called fanning out and synthesizing, and this is essentially the process of taking a task,
04:32 · breaking it into micro parts optimized for individual, mutually exclusive agents to take care of them, and then you synthesize all of their individual results, their individual contributions, and then you bring them to one overall result.
04:45 · So a very practical and common application is deep research, where you basically have one core thing you're trying to understand. So let's say you wanna research the best ways to use agent harnesses.
04:56 · It would go and see what are all the different lenses for this problem, and let's assign one agent, by default most likely a Sonnet 4.6 agent, for this angle of this research question. And once they all work in parallel, we'll retrieve all their answers at the very end just like subagents and then have one overall result.
05:15 · Other than research, you could also apply this to a due diligence scenario where you have financials, contracts, legal, then you have some agent, and each one of those agents pulls out all the salient points for some form of red flag memo. But going to the terminal, this is a good prompt application of this concept.
05:32 · Build a workflow that does due diligence on the data room in [folder] by fanning out (keyword here) one subagent per folder, each in its own clean context so the files never cross-contaminate,
05:45 · and have every agent return a structured summary with the exact source path for each finding. So we're creating basically citations within documents. Then run a barrier synthesis step that waits for all of them to finish and merges their outputs into one cited
05:59 · due diligence memo at [name of your desired memo], where every claim links back to the file it came from. So a little bit of extra emphasis here on the citations. By the way, if you're enjoying this content and you like the way I break down all these concepts, then you're going to love all of the exclusive content you'll find in my Claude Code living course.
06:18 · CTA · I add at least one brand new module every single week that you'll never see on YouTube. So if you want access to that, our team of coaches included in your membership, along with tons of other goodies and great builders you'll find in there, then check out the first thing down below, and I'll see you inside. Alright.
06:33 · CTA · Back to the video. Now this next example is one that I really want you to retain. It's called adversarial verification
06:39 · CTA · and it's meant to plug a hole in the issue of self-preference. So instead of having Claude Code think it's awesome all the time, you very intentionally employ three different skeptics or devil's advocates to look at the output and then cross-reference it against some checklist or rubric.
06:57 · CTA · So one extra step that you can take here is, before you even execute the workflow, you create the proper rubric. Because once you do the rubric, it's basically your pseudo plan that your other agents can push against and play devil's advocate using that as its core source. The natural application of this is fact-checking.
07:15 · So if you produce anything with AI, like a blog or an article or anything where you're worried about all the facts being 100% coherent and correct, running this kind of devil's advocate at scale can help you find any issues or any controversial statements prematurely. So if you wanna see a tactical example of this, imagine that you have a blog that you're drafting, and then from this blog, you have an extractor.
07:38 · And then from this extractor, you pull out individual claims. Once you have the claims, you need to decide if they're factual or non-factual, then you spin up a series of subagents, each one checking one individual claim and going down a rabbit hole.
07:52 · And once they come back with their false positives and false negatives, then you can create a verified report on what needs to change in this blog. If you want to execute this, you could use a prompt like this where you say: use a workflow to go through my blog post and verify each factual and technical claim before I ship it.
08:10 · Have one agent extract each claim into its own item, then for every claim, spin off a separate agent that checks it against the real source. So we're making sure that we don't have individual agents also having self-preference bias.
08:23 · So the rest of the prompt basically walks through the behavior of making sure that when it's done, give me back the list of claims that have failed and the exact reason why each failed so that I know what to fix. And ideally,
08:35 · where the bias of the language model that might have drafted or written that blog post might have come from. And the next pattern is called generate and filter. And the whole point of this is to spin up a series of agents to over-generate a series of ideas,
08:49 · project names. It could be whatever you're trying to ideate about. And once you have those ideas, it's easy to go from a thousand ideas to three versus to go from 10 to three.
08:59 · So it just basically gives you more variety. So I could use it, for example, to say: what should I title this video so it communicates the concept without being clickbaity? Find me a balanced 500 titles that have performed really well on YouTube in the past six months.
09:14 · So you practically use this wherever taste is required. So if you need an opinion on a cold email opener or the name of a brand new product or where you should execute a certain pop-up offer.
09:26 · This would be a helpful way to go and do the research individually. Maybe do some market research analysis. Maybe you hook up some skills for these agents to be able to use to survey, scrape, do whatever they need to come back with the richest information possible,
09:40 · and then you synthesize and digest that down to just a few ideas. And if you wanna get fancier with it, then you can always integrate the judge in the workflow. So you have a series of agents that spin up all the ideas, then you have a series of judges that critique all the ideas, then you have some rubric, again, ideally one that you put together, so you can quality control the quality controllers, and then you get the tested, synthesized picks at the very end.
10:04 · To apply this, you could send over a prompt like this where it says: use a workflow to brainstorm 40 video title and headline angle options for the topic in [name of topic] with one generator agent, then hand them all to a judge agent that scores every option against a series of criteria. Then at the very end here, we say the generator that brainstorms and the judge that scores must be different agents.
10:27 · So I might be spoon-feeding or overturning it what to do, but it's never a bad idea to overexplain yourself to a language model. Now like I said before, pattern five is my favorite, and it's called the tournament pattern. And the way it works is this.
10:41 · Instead of dividing up the work amongst multiple agents like we saw in prior patterns, this one takes single ideas, sends them to fresh new agents, and asks it some controversial question: should we go with this decision or not?
10:54 · And then this agent will go through every single reason, which one of the two options (it could be three options, four) is the best and why. And then all of the good ideas or the good proposals or good decisions move on to the next round until you get to the final bracket.
11:10 · So this will keep going pairwise until we get to the very end where we have a final. Then at each stage, we have an unbiased agent with a fresh context window. Instead of having Claude Code look at 500 different decisions, where its context window, its memory, its auto compaction,
11:27 · all that will lead to less accurate decisions. So when you break apart all the possible decision space into all these separate agents, you also have a different way to trace how a decision is made. And you could probably imagine the example use case of where this would be helpful.
11:42 · Imagine you have 5,000 resumes, and instead of shoving those 5,000 resumes through a typical applicant tracking system or one Claude Code session where it will inevitably find bias, context window issues, bloating, you basically break apart all those resumes into specific stacks.
12:00 · Do we pick this person or this person, and explain why? Applying this, you could have a thousand different items, and then you have different rounds. Round one could be assessing
12:09 · all items based on X criteria. Then the next round would be on Y criteria, and then we would keep going until we get to the final round. So the one key caveat here is that each round could technically have its own rubric.
12:24 · So it doesn't have to be seven, ten, 50 rounds with the exact same criteria. And a prompt for this could say: use a workflow to rank every resume for the backend engineer role by running a tournament
12:36 · of pairwise comparisons against a rubric instead of scoring each one cold, where each head-to-head match is its own comparison agent and the deterministic loop holds the brackets so only the running order stays in context. So if you wanna steal a lot of my wording here, you're gonna be able to access all these prompts in the second link down below, so don't worry about it for now.
12:55 · But we're basically just setting the tone for this whole bracket to transpire. Now the last pattern is called loop until done, and this is very similar to something like /goal, where all you do is, instead of telling the agent "go and do X 10 times," you just say "don't stop until you reach this specific outcome." So if you have brand new findings for a particular matter, it will keep going and spin up brand new agents to go through, double check, triple check until you reach the specific desired result.
13:25 · So maybe you have a bug that occurs in a platform you've put together. Maybe you vibe coded it, and maybe this bug happens once every 30 times, but you can't usually spot it. You have to keep refreshing.
13:35 · You have to keep trying different combinations to reproduce the bug. You could theoretically say, go through and
13:42 · run this until you receive the bug, and then once you get the bug, try and trace what's happening and how we can resolve it. If we need another application, imagine you send this loop until done on a wild goose chase to go through all of your conversations, your JSONL files that represent your Claude Code sessions, and you say, keep looking through every single one of these conversations for every pattern I've made until you have a comprehensive
14:07 · non-duplicative list of every single thing that I could improve to be that much better at Claude Code. So you could say something like: build a workflow that hunts down a flaky test that fails maybe one in 50 runs, keep forming theories about the cause and adversarially testing each one in its own isolated worktree.
14:25 · And when it comes to the prompt, it could look something like this. Build a workflow that hunts down what's called a flaky test in the test folder that fails maybe one in 50 runs. Keep forming theories about the cause and adversarially
14:39 · testing each one in its own isolated worktree. This is basically a separate session for an agent. Looping and spawning new attempts with no fixed pass counts.
14:49 · So basically telling it, we are not telling you to go check this 10 times or 20 times. Keep checking it until this specific result is achieved. Now those were the six core patterns, but this is how you go from just knowing about them to actually becoming dangerous with them, which is stacking them on top of each other.
15:05 · So imagine you have a realistic scenario where you vibe coded some CRM, and this CRM has an onboarding flow for clients. And you wanna find ways to improve that onboarding flow, make it very thoughtful, and think of different order consequences. Maybe you go and say fan out a series of agents to look at the codebase and pull out all the insights on what should change and why.
15:26 · And then once that is pulled out, then you move on to pattern number three, which was the adversarial verify step. So now it has a series of devil's advocates that go against all the findings that were found.
15:38 · Then from that step, maybe you add on a loop until done. So maybe you combine a /goal in the scenario, and then you apply that to this specific circumstance. So maybe you have a process that keeps going until it can't find any more optimizations that it could possibly make given the specific avatar who's using the platform.
15:55 · The best part about this is that you don't have to design it by hand. It's really a matter of just using the keywords in the right way to get the confirmed result or the ideal result. If you wanted to put all of this into one single prompt, you could say something like: build a workflow that audits every file under [codebase],
16:12 · fans out one agent per file (keyword right here: fans out), has a separate agent try to refute each of the findings against the code (and this is our adversarial verify), and loops until a clean pass turns up with no new issues.
16:28 · Return only the confirmed issues, each with the file and the exact line, then for good measure, you could add /goal: do not stop until a full clean pass finds no new issues. Now do you need this last sentence?
16:40 · Maybe not. It would get the idea, but if you wanna add it for an additional layer, then you can absolutely do so. And like I said before, you can also share workflows very similar to how you share skills.
16:50 · And pretty much every workflow is purely a JavaScript file. And when you combine this JavaScript file with your skill file and maybe any additional markdown file that it depends on, all you'll get is one overall folder with the SKILL.md, the JavaScript file, and anything else that's involved.
17:08 · So it will look something like this. If we go into this folder, you'll have this rubric, you'll have the SKILL.md, and then you'll have this verify claims workflow JavaScript file.
17:19 · If you're not familiar, if you ever wanna save a workflow, all you'd have to do is go into, let's say we run this. So we do /workflows.
17:29 · This would show you any workflows running at the moment. And then while this is running, you could always save that specific workflow. Once it saves, it'll allow you to store this as that JavaScript file that I referred to before.
17:41 · The one additional thing that they write in the guide is that you can basically always tell Claude Code what its budget is when it comes to token usage, because this is a very token-consuming feature which should be used sparingly, for very large use cases or use cases that have different layers of complexity.
17:59 · So one of the examples of when not to use a workflow is basic tasks. So if you have a platform and you have a series of buttons that you wanna change the color of or you want them to pulsate when you click on them, you wouldn't spin up an agent team just to do that.
18:14 · You could do that individually just using basic prompts. So as we get infinitely better models and we go from 4.8 to 4.9 to five, you'll be able to do a lot more with a lot less. So you won't need agents for all tasks, but when it's time to use the power of agents,
18:29 · CTA · they will be there and you'll be able to use these swarms for much more complex matters. So hopefully, this breaks down and demystifies all the power and all the potential that you might be leaving on the table by not using these patterns in your day to day.
18:41 · CTA · Like I said, you'll find all the prompts I showed you along with some additional goodies, like an example of that skill folder with the workflow inside of it, in the second link down below so you can use it, get started, and start leveling up your workflows like I promised. If you wanna take your agentic workflows and agentic OS systems to the next level and you wanna master Claude Code, then make sure you check out the first link down below for my early adopters community.
19:05 · CTA · This week's upcoming live module will be a deeper dive on dynamic workflows and how I've used it myself for things like travel and personal stuff outside of core business use cases. And for the rest of you, if you found this helpful and novel, I'd super appreciate a like on the video and a comment if you so choose, and I'll see you in the next one.
— full transcript
§ 05 · For Joe

Six shapes every complex agent task can take.

WHAT TO LEARN

Before reaching for a multi-agent workflow, name which failure mode you are solving: that single diagnosis tells you which of the six patterns to use.

  • Single-session Claude degrades on long tasks in three predictable ways: it stops early, grades its own work too charitably, and forgets the original brief after many compactions.
  • Self-preference is the subtlest failure mode: any output-producing session will evaluate its own work more favorably than an independent reviewer would, regardless of quality.
  • Classify and act is the right pattern when incoming tasks are heterogeneous and need to be routed to different specialists before any action is taken.
  • Fan out and synthesize works when sub-problems are truly independent: each agent gets a clean context and returns a structured result with source citations, then a merge step reconciles them.
  • Adversarial verification requires the verifying agent to be a different agent from the one that produced the output, or you have replicated the self-preference problem in a new wrapper.
  • Generate and filter is most useful wherever taste or judgment is required: the more options you generate, the better the top few you can extract.
  • The tournament pattern is the cleanest way to rank a large set of items because each head-to-head comparison is isolated and auditable, reducing context-window bias.
  • Loop until done should always be paired with an outcome-based exit condition and an explicit token budget cap; without both, it can run indefinitely.
  • Stacking patterns is where compound reliability comes from: fan out to gather, adversarially verify to prune, loop until done to confirm, all in a single prompt.
  • Workflows are expensive by design; the honest filter is whether a single capable agent would genuinely fail the task, not whether a workflow would feel more thorough.