Mansel Scheffel · YouTube · 16:16

Claude Code Just Dropped Workflows (An Actual Game Changer)

A sixteen-minute walkthrough of Claude Code's Dynamic Workflows, where a deterministic script takes over orchestration that the chat window used to carry.

Posted: May 29th 2026 (5 days ago)
Duration: 16:16
Format: Tutorial, educational
Channel: Mansel Scheffel
§ 01 · The Hook

The bait, then the rug-pull.

The same day Opus 4.8 took the spotlight, Claude Code quietly shipped the thing this video argues actually matters: Dynamic Workflows. The pitch is simple and the payoff is real, but so is the bill, and the host shows both without flinching.

§ · Chapters

Where the time goes.

00:00 – 00:17

01 · Intro

Workflows, not Opus, were the most valuable announcement; what the video will cover.

00:17 – 04:22

02 · How Workflows Actually Work

Subagent recap, why Claude-as-orchestrator breaks at scale, the shift to a workflow.js manager, runtime, journal, and hard limits (16 concurrent, 1000 total, no shell from the script).

04:22 – 09:00

03 · Live Demo: deep-research + Startup Forge

Runs /deep-research on vitamin C through its five phases, and a Claude-invented Startup Forge workflow that ideates, judges, stress-tests, and pitches.

09:00 – 14:59

04 · Inside The Script + How To Control It

Model-per-phase shown in the .js, editing the script, the deep-research run finishing at 105 agents and 3 million tokens, and where most agents went.

14:59 – 16:16

05 · When To Actually Use This (And When Not To)

Three control levels, four ways to start and three to turn off, default-on/off by plan, and Anthropic's verbatim criteria for when a workflow beats a skill.

§ · Storyboard

Visual structure at a glance.

open
context ceiling
Claude as orchestrator
the shift
runtime + journal
deep-research demo
startup forge
inside the script
105 agents, 3M tokens
three control levels
triggers + off switches
when to reach for it
§ · Frameworks

Named ideas worth stealing.

05:15 list

The five-phase deep-research pipeline

  1. Scope: break the question into search angles
  2. Search: parallel web searches, one per angle
  3. Fetch: dedupe URLs, pull top sources, extract falsifiable claims
  4. Verify: adversarial three-vote fact-checking, two refutes out of three kill a claim
  5. Synthesize: merge duplicates, rank by confidence, write a cited report

Anthropic's pre-built /deep-research workflow runs these five stages; verification consumes the most agents and tokens.

Steal for any research or due-diligence task that needs claims fact-checked rather than just summarized
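
The verify phase's vote rule and its agent count can be sketched in a few lines. This is a minimal illustration, not Anthropic's implementation: the function names and the "support" / "refute" labels are invented here, and only the three-votes, two-refutes rule and the 25-claim figure come from the video.

```javascript
// Hypothetical sketch: none of these names come from Claude Code itself.
const VOTES_PER_CLAIM = 3; // three independent verifiers per claim (from the video)
const REFUTES_TO_KILL = 2; // two refutes discard the claim (from the video)

// votes: array of "support" | "refute" strings, one per verifier.
function claimSurvives(votes) {
  const refutes = votes.filter((v) => v === "refute").length;
  return refutes < REFUTES_TO_KILL;
}

// Number of verifier agents the verify phase needs for a given claim count.
function verifyAgentCount(claimCount) {
  return claimCount * VOTES_PER_CLAIM;
}

console.log(claimSurvives(["support", "refute", "support"])); // true
console.log(claimSurvives(["refute", "refute", "support"])); // false
console.log(verifyAgentCount(25)); // 75
```

The last line matches the breakdown given in the run: 25 top claims times three verifiers is roughly the 75 agents spent on verification, which is why this phase dominates the bill.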
13:42 model

Three levels of control

  1. Level 1: Prompt, steer the design in plain English before the script is written
  2. Level 2: Inspect, open the generated .js with Ctrl+G and verify agents, models, budgets, and permissions before running
  3. Level 3: Edit the file, change prompts, models, parallelism, branching, and guards directly

Maps how much you intervene to how much determinism you get, from a one-line prompt to hand-editing the orchestration script.

Steal for explaining any AI feature that scales from natural-language control to full code control
15:05 model

Use a workflow when (vs. use a skill or just chat)

  1. Use a workflow when the task fans out across many similar items
  2. Use a workflow when you want deterministic loops and conditionals
  3. Use a workflow when you want resumability mid-run
  4. Use a workflow when the orchestration itself is repeatable
  5. Otherwise use a skill or chat: turn-by-turn judgment is the value, a single conversation handles it, you want repeatable instructions, or it is a one-off

Anthropic's verbatim decision criteria for reaching for a workflow instead of a skill or plain chat.

Steal for deciding whether any automation should be a rigid pipeline or a flexible agent
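
The criteria above can be read as a simple any-of check. The sketch below is only an illustration of that reading: the function and every field name are hypothetical, and treating the four signals as "any one is enough" is an assumption based on the host's "or" phrasing.

```javascript
// Hypothetical helper; the task field names are invented for illustration.
function chooseTool(task) {
  const workflowSignals = [
    task.fansOutAcrossSimilarItems, // many similar items to process
    task.needsDeterministicLoops,   // loops and conditionals must be exact
    task.needsResumability,         // pause and resume mid-run
    task.orchestrationIsRepeatable, // the same orchestration will run again
  ];
  // Assumption: any single signal is enough to justify a workflow.
  return workflowSignals.some(Boolean) ? "workflow" : "skill-or-chat";
}

console.log(chooseTool({ fansOutAcrossSimilarItems: true })); // "workflow"
console.log(chooseTool({})); // "skill-or-chat"
```

A stricter version could require two signals before paying the token cost; the video does not say which threshold Anthropic intends.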
11:29 concept

Model-per-phase assignment

  1. Haiku for generate / brainstorm
  2. Sonnet for critique / scoring
  3. Opus for synthesis

Each workflow phase can run on a different model, set in the prompt up front or by editing model: in the script, so cheap models do the high-volume work and expensive ones do the final reasoning.

Steal for controlling cost on any multi-step AI pipeline by matching model tier to task difficulty
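
As a rough picture of model-per-phase assignment, the phases can be held in a small table that the script reads from. The object layout, the lowercase model labels, and the agent counts for the critique and synthesize phases are assumptions; the video only shows six Haiku agents for generate and a model: field per phase.

```javascript
// Illustrative only: the real generated script's shape is not shown in full,
// so this layout and the critique/synthesize agent counts are assumptions.
const phases = [
  { name: "generate", model: "haiku", agents: 6 },  // six brainstormers (from the video)
  { name: "critique", model: "sonnet", agents: 6 }, // assumed: one scorer per candidate
  { name: "synthesize", model: "opus", agents: 1 }, // assumed: a single final writer
];

// Look up which model a phase runs on.
function modelFor(phaseName) {
  const phase = phases.find((p) => p.name === phaseName);
  if (!phase) throw new Error(`unknown phase: ${phaseName}`);
  return phase.model;
}

// Count how many agents each model tier is asked to run.
function agentsByModel() {
  const totals = {};
  for (const { model, agents } of phases) {
    totals[model] = (totals[model] ?? 0) + agents;
  }
  return totals;
}

console.log(modelFor("critique")); // "sonnet"
console.log(agentsByModel().opus); // 1
```

Keeping the assignment in one table makes the cost trade-off visible at a glance: the expensive tier appears once, and the high-volume phases sit on the cheaper tiers.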
§ · Quotables

Lines you could clip.

00:11
"While Opus certainly took center stage, I think this is the most valuable part of yesterday's announcements."
contrarian framing that the overlooked feature beats the headline model → TikTok hook
03:19
"The new solution here is to move the manager over to a script, so we no longer have this overburdened main context window."
one-sentence statement of the whole feature → IG reel cold open
07:50
"Just because you can doesn't mean that you should, and you've already seen that the tokens rack up really, really fast."
the honest cost warning that cuts through the hype → newsletter pull-quote
12:38
"We should now be vitamin C experts after a hundred and five agents and three million tokens, and it didn't actually give us as much information as one would think."
candid anti-hype payoff with a concrete, shocking number → TikTok hook
15:20
"You want to use a workflow when a task fans out across many similar items, or if you want those deterministic loops."
the practical decision rule in one line → newsletter pull-quote
§ · Resources Mentioned

Things they pointed at.

05:05 tool /deep-research skill (Anthropic pre-built workflow)
09:00 product Claude Code Dynamic Workflows
00:00 product Opus 4.8
§ · CTA Breakdown

How they asked for the click.

16:00 next-video
"I hope this video was helpful. If you have any questions, leave them down below. Otherwise, check out the videos on the screen now."

soft end-screen pointer to related videos plus a comment prompt; no hard product pitch in-video, the offers live in the description

§ 04 · The Script

Word for word.

HOOK opening / re-engagement · CTA the pitch · metaphor · story
00:00HOOKClaude Code just officially dropped its best feature yesterday, workflows. While Opus certainly took center stage, I think this is the most valuable part of yesterday's announcements. In this video, we're gonna explore Claude workflows, what they are, how to use them, and whether or not your family will be without food this month so that you can pay for token usage.
00:16HOOKLet's get into it. So we need to start this video with just a little bit of context. First, around what a sub agent actually is because that forms the biggest part of what workflows are.
00:25So when you're using Claude normally in your conversation, you might have a main window. That main window, most people just keep chatting until it's honestly exhausted and has no more context that it could possibly work with. That is part of the problem that we try and solve with sub agents because that context window gets so full.
00:40We're running tools, MCP, we have long reasoning, we have really long conversations that carry on forever and ever, and all of that just fills up our context window unnecessarily. And despite having a 1,000,000 context ceiling, when you have all of this junk and unnecessary information in there, it's really just bloat.
00:56So if we could get rid of that bloat, it would be really helpful for the conversation that we're having in this main window. And that's one of the reasons that we would actually have sub agents. So again, if we look at our main Claude code session over here, what we can do is we can get it to spawn a sub agent, which is essentially just a fresh Claude code session with its own isolated context window to perform a specific task.
01:16So instead of bloating our main Claude code session with all of those tools and all of those requests that we asked, we have different sub agents to go and perform those requests for us and only return the information that we need. So instead of our conversation being 6,000 tokens, the 6,000 tokens or 60,000 tokens get done in our sub agent workflow window, and it only pushes back the answer that we need, which is roughly about 500 tokens, into our main session.
01:42So we don't have any of that bloat. We don't have any of those problems that we had before. And there's no need for Claude to compact its main session,
01:48and none of the context that we have gets filled with things that we don't really need. So we've always been able to do this, and we can do it in many different ways, and there's skill chaining that can form a part of this amongst other things. I have a separate video on that, so I'm not gonna cover those concepts here.
01:59But understanding this is important because workflows directly affect our sub agent orchestration layer. Currently, Claude is our main orchestrator window. So we have that main chat window, and let's just say we wanted to go and call six sub agents.
02:11That's perfectly fine. Most of the time, it's accurate. But at scale, that can start to become problematic because Claude
02:16is still orchestrating all of these agents. It's gotta keep track of what they're doing. It has to manage requests, who runs next, what are they gonna be running, any reasoning involved with that before it gets sent down to the sub agent layer.
02:26It also has to manage the results. So it can get quite a lot to manage, especially when we start to look at scale. Essentially, we have our manager over here who loses track of everything because of the things that I just mentioned, especially the fact that it has to hold intermediate state within its context.
02:40But the new solution here is to move the manager over to a script, so we no longer have this overburdened main context window. We have a workflow.js script that holds the state inside variables. It has deterministic loops, and only the final answers return back to our main context window where our thought chat was.
02:57One of the great things about this is the fact that it can scale massively. We'll get into that in just a little bit. But because we're no longer burdened by this manager taking on everything and it's all done programmatically or deterministically over here inside our script, it makes it a lot easier to manage this at scale.
03:11In terms of what happens at runtime, we have our little script that I just spoke about, the workflow.js, and that runs as a separate process at runtime. So you can see over here, we've got our process. We have agent one, agent two, agent three, and all it's doing is loading our JavaScript file, but then executes whatever is inside that JavaScript file, and then it spawns our little sub agents over here to go and do the work outlined in that JavaScript file.
03:32It then also has something called a journal, and that essentially manages the state of whatever's happening here. So you're actually able to resume any work. You can pause this and then come back to it later because of this journal that sits in between it.
03:43Obviously, depending on how you're running your workflow, that will affect whether you actually get to pause and resume this. For this to functionally work, you either need to be using Claude Code in the desktop app or in the IDE itself via the terminal. Currently, you can't do this in VS Code with the extension.
03:56Three quick things to note on this as well. There is no direct file system or shell access from the script. The agents can do that, obviously.
04:02Currently, 16 concurrent agents is the max amount that you can have. There are various other technical factors that go into this that I'm not gonna put in this video. I will have a separate deep dive video where I dive into all of this stuff at a very low level.
04:13HOOKYou can then have a thousand agents total per run, so you can still have a massive swarm doing the work. It's mostly just the concurrent agents max that you're limited to right now. Okay.
04:22HOOKBut enough talking. Let's get into some practical stuff. So I have run two previous searches that we'll take a look into in just a bit.
04:28HOOKWhat I wanna do here is to show you how this works, and we'll dive into some of the output once it's done. In terms of invoking a workflow, every time you use the word workflow when you're having a chat with Claude now, it actually pops up as a command that we can then use and it will turn whatever you're talking about into a workflow.
04:42HOOKSo just keep that in mind. But for this, I want to focus on deep research, which is a skill that they just brought out using this workflow's functionality to do exactly what it sounds like, some very deep research. Can you do some deep research into the benefits of vitamin c for the human body?
04:57So it now should invoke this skill because it picks up that we asked for deep research, and we should get a little box that pops up warning us about the cost of running a workflow. But not for me because I YOLO everything and I probably clicked always allow the last time I ran this thing. But you can see over here that we have our deep research running as a background task,
05:15and it runs through five stages. So the first one is the scope. It breaks down the vitamin C question into five angles that it needs to search.
05:22We have five parallel web searches, one per angle. Then it fetches, so it dedupes any URLs. It pulls the top 15 sources and extracts any falsifiable claims.
05:31It then verifies it. And this is the important part. This uses a ton of agents because it's adversarial three vote fact checking on each claim.
05:39It takes two out of three refutes to kill a claim before it's going to synthesize any information, which is the last part over here. So you can see how this cost can rack up. Currently, we have 22 agents and climbing.
05:49We've already burned through over 550,000 tokens, and it's only been running for one minute.
05:54Something to note here, yes, this is still gonna count against your usage. It's not like just because we've isolated this to separate sub agents that we're magically gonna get all this perfect usage. We are still using this much, and I imagine this will go well over a million tokens.
06:07You can check the phases in here. So we have our scope. We can see the agent that ran 31,000 tokens for that, and it used one tool and took seventeen seconds.
06:15We can see in the fetch phase all of the agents that are currently running, how many tokens they're using, what tools they're using, and how long they've been running for. While that thing is cooking, I'll come back over to a workflow that I ran earlier. This is where I asked it to just come up with its own choice of showing off its capabilities
06:30for this new workflow thing. So for those of you who are thinking that this is only for software developers, not really because apart from the research that we just ran, you can also have it do various other things.
06:38For this example that it created by itself, it created a startup forge, which is a self contained demo workflow running in the background now, and it's showcasing that four agents each invent a startup from a different angle, the consumer, B2B, the climate, and an AI native business. Each idea is scored by a VC judge the moment it's ready.
06:57Idea number one gets judged while idea number four is still being written. Then it has a judge, and judges are forced to return validated novelty, market, feasibility, and total information so we can understand whether our idea is actually any good.
07:09We then stress test it. So the top ranked idea is attacked by three skeptics in parallel, each with a distinct lens, which is really important when you're doing this and it's actually just part of persona prompting, which is something I mentioned last week in another video, because each person here from their own lens is trying to find a fatal flaw in this idea that we're about to put forward.
07:28And then finally, in its synthesize phase, it's going to write an honest investor pitch that must confront every objection head on. So its use cases go well beyond just doing software development like PR requests, codebase trawling, code reviews, things like that. You can use this for so many ideas or business ideas that you might have, and an audit might be another one.
07:44There are several audit types where this type of thing would be really valuable having all these agents go out there. But like I say, just because you can doesn't mean that you should, and realistically, you've already seen that the tokens rack up really, really fast.
07:55So using this for everything like an idea is probably not gonna be the best approach unless you're really looking for that million dollar idea that you wanna make sure you actually progress with, then I definitely think this could be worthwhile. Then if we scroll down a little bit, you'll see that it saved that script that I spoke about earlier, that workflow.js.
08:10So in this case, it created startup forge with a bunch of numbers .js, and it's offering to tweak it for us. So like I said, this file might be static, but you can edit it.
08:19If you're technically inclined, you can go in there and add whatever you want to this thing. You don't have to be technically inclined because Claude will obviously do it for you based on your natural language. And it's blatantly asking me here, do I want to tweak it in any way?
08:30Bump up the idea count, add a loop until no fatal objections refinement loop, or swap the domain that we're in. We could make very specific angles here if we wanted to. You can then click on the script and take a look at it.
08:41Most of it is in pretty clear English over here. So we have the name, startup forge, the description of what it's doing, when to use. In this case, it's using it specifically for a demo of fan-out, pipeline, judge panel, and adversarial verification patterns.
08:53So if I ever wanted to rerun this test, it would be saved and I would be able to rerun it based on the conditions that we put inside the description over here. And for those of you wondering, yes, you can use a different model in here. You wouldn't have to use Opus or Sonnet for any of this stuff.
09:06You could tell it to use Haiku as a part of your initial request when you're starting to talk to it here or again, you could come into the script and edit it yourself if you wanted to. But why do that? Just have your requirements ready upfront.
09:17Make sure you come to it with a very specific request here. I think that's how you're gonna get the most out of using this. The more clarity that we provide the system upfront, the better the output is that we're going to have down the line.
09:26So I've just asked Claude to do that exactly right now because you can use different phases inside the script. So some of them could run in Haiku, maybe for discovery if that's what you wanted to use them for, and then Sonnet could do a creative writing part, whatever. There are different aspects that we can take in here.
09:40So what I've asked it to do now is just showcase more of its functionality inside JavaScript so that we can have a look at it. And this is what it's come up with. So the first one for generate, six haiku agents brainstorm name and tagline candidates.
09:51Then we have critique where Sonnet scores each candidate for pipeline, no barrier, and then we synthesize the information with Opus. It writes the final brand brief from the winner. Of course, you might not want to use those things in that specific order.
10:03It might be better to use the generate phase with Sonnet or something like that. That's obviously up to your workflow. This is just for demo purposes here.
10:09I'm gonna allow that and then we'll see how the JavaScript file changes. While that's cooking, we're gonna flip on back to our vitamin c benefits over here, and it's been running for twelve minutes and used 3,100,000 tokens.
10:20So you can see how this can become problematic for you. Because it's using that deterministic loop, it is going to run until it achieves its goal. There is a boundary that is set in, you can obviously change this boundary to make sure that it doesn't just get stuck in this death spiral and burn through all of your tokens in a single day.
10:34Right now, we've got 105 agents with 3,000,000 tokens. Currently synthesizing, so this should be the last leg of the work that it's doing. I also think it's clear at this point that you wouldn't wanna use this for every bit of research that you're doing.
10:45That would be ridiculous. This is for when you are trying to figure out a very, very specific problem. For instance, if you're keeping track of your health, you know, on the whole vitamin c thing here, let's say you got some blood work back and you were trying to understand discrepancies between things and the doctor gave you some bullshit answer as many of them tend to do, you could chuck your results in here and get it to fan out and do a ridiculous amount of research from every different angle out there to figure out why these discrepancies are, what they could mean, things like that.
11:09That's kind of where I would put this deep research workflow, maybe into competitor research if I was doing something that really needed it. Again, a very specific use case. Then coming back to our other workflow, if we have a look at the JavaScript file for that, we can see where it's configured things.
11:23So for our phases over here, we have the generate phase, six Haiku agents brainstorm name and tagline candidates, model Haiku. Then we have the critique and we have model Sonnet. And for the synthesis, we have model Opus.
11:35That top part over there was like it said just for display purposes. If we scroll down into our script over here, we can see where it's invoking the agent that it's using model Haiku.
11:44And if we do control f, we could probably see the same thing with Opus. Over here on line 75 model Opus. So it has chunked everything down to specific agents that we needed to use for the specific stages.
11:54And you can configure this over and over again until you get to a working point that you actually want. You can see how I've just been talking to Claude. It knows its own capabilities on this.
12:01You could obviously get it to do research on the docs as well and online to see what other people are doing to see what you could build for the workflows that you might wanna use this for. You can also blatantly ask it if the thing that you're trying to use it for is a good use case.
12:14One of my use cases is for lead gen. So instead of having to go through 500 agents using skill chaining, I can now just use this. Obviously, I'm still hit by the 16 concurrent runs, but it's much faster than my old skill chaining methodology.
12:25Still in testing phase though, so I'll get those results back in another video to see if that is a genuine use case. But we could pop back over to here and see that this thing is now done. We should now be vitamin C experts after 105 agents and 3,000,000 tokens
12:38and fifteen minutes. It didn't actually give us as much information as one would think after that amount of time and that many agents. So why don't we ask it?
12:44Why did you use so many agents in that run? What were the most agents used on? I'm guessing the answer is going to be from verifying the actual claims, but we'll have a breakdown of that now.
12:54You can obviously see that on the right hand side here as well. Scope used one, then we went up, and then slowly for fetch and verify, that's where we use most of them. Yep.
13:01Around 75 for verify, the big one. 25 top claims times three independent verify agents each. Like I said, this is probably gonna be your biggest one most of the time that you're running workflows.
13:10So flipping back to our slides, we now know with a level one prompt, we can actually design this quite well from the start and that's why I said do some research upfront. Make sure that you come in there being very specific. It's just like you should be using AI every single day.
13:22The more specific you are, the better output that you're gonna get. Once we do that, we do have some control after that. We can inspect the file before we run it whether you're in Claude code or in the terminal.
13:30It will always pop up and say, hey, we're gonna go and do this thing. Do you wanna do that? And then you can go and look inside the file before it actually runs, where you can change the agent calls, how many agents run in parallel versus the pipeline,
13:41each agent's prompt, the model per stage, any budget guards that you wanna put in there. By default, this thing is gonna be running with edit accept permissions. So just note it will obviously be able to do what it needs to do at a high level, but it's not gonna bypass permissions unless you tell it to.
13:54Level three here, you can edit the file like you saw. You can literally go into the JS file and edit it however you want to. We've already spoken a little bit about triggers during our practical workflow.
14:03There is effort ultra code in which case it will go into an auto workflow for every substantive task. I wouldn't necessarily take this approach though. You then also have your saved workflows.
14:12Those will be saved inside Claude Code, but also if you're in VS Code, workflow still works perfectly fine. Like I can come in here, I've had discussions with this thing historically, and then if I want to I can save it. I can't just do s or automatically have it saved, but I can ask it to save whatever it did and we can then store it inside here and you'll see that I have the exact same JavaScript file as we were looking at in Claude Code.
14:32For Max and Team users, this thing is automatically switched on. For those of you on Pro, it's not automatically switched on for obvious reasons. It will probably nuke your entire budget, and you won't be able to feed your family for a week because of extra usage credits.
14:43Turning it on and off, you can obviously do that directly using /config. You can edit your settings directly. The usual suspects for getting this to work.
14:49For any organizations, this is off by default. Again, for obvious reasons, you can turn it on by requesting it from your admin. Deep research is also not gonna work if you switch this thing off, so keep that in mind.
14:59Then just to wrap up here, when should you actually reach for this kind of thing? I've touched on that a little bit during our prep, but just to bring it home over here, I would mainly say I'm gonna be using skills for the majority of my workflows. Again, in a business, you just want that determinism and that reliability.
15:11You wanna make sure that Claude is doing the same thing every day, getting leads, replying to people, whatever it is your business needs. There are specific use cases for running a workflow. It's not something you're going to be running every single day because you've seen how many tokens they use.
15:24But you wanna use a workflow when a task fans out across many similar items, or if you want those deterministic loops. So you can think about it like relentlessly trying to achieve a goal that can be quite complex. You saw that thing was just gonna keep working until it's finished whatever was in that script that's managing all of those agents.
15:40It's also pretty good if you want resumability mid run, so you can pause this and come back to it later because it is controlling that state to some degree. But realistically, the biggest benefits obviously come down to developers, people building actual products because you can do massive bug sweeps, and all things related to software development at scale with this type of thing, as well as research and whatever else goes along with that.
15:59CTABut again, we are just gonna be constrained by the sheer amount of tokens that this thing is currently using. But other than that, this thing is still in research preview, so keep that in mind as well. I hope this video was helpful.
16:08CTAIf you have any questions, leave them down below, I'll get back to you as soon as possible. Otherwise, check out the videos on the screen now. They'll definitely help you on your journey.
16:13CTAThanks very much for watching. See you guys in the next one.
— full transcript
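
The "move the manager over to a script" idea from the transcript can be sketched as a plain function that holds state in variables and never runs more than 16 agents at once. This is a minimal sketch under stated assumptions: runAgent and orchestrate are hypothetical names, runAgent is a stand-in for whatever the real runtime uses to spawn a subagent, and only the two limits (16 concurrent, 1,000 per run) come from the video.

```javascript
// Hypothetical sketch; runAgent is NOT a Claude Code API, just a placeholder.
const MAX_CONCURRENT = 16; // concurrency cap quoted in the video
const MAX_TOTAL = 1000;    // per-run agent cap quoted in the video

async function runAgent(task) {
  // A real agent would spend tokens in its own isolated context
  // and hand back only a short answer.
  return `summary of ${task}`;
}

async function orchestrate(tasks, worker = runAgent, limit = MAX_CONCURRENT) {
  if (tasks.length > MAX_TOTAL) {
    throw new Error(`run exceeds ${MAX_TOTAL} agents`);
  }
  const results = new Array(tasks.length); // state lives in plain variables
  let next = 0;

  // Each lane pulls the next unclaimed task until none remain.
  async function lane() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await worker(tasks[index]);
    }
  }

  // Start at most `limit` lanes, so at most `limit` agents run at once.
  const lanes = Array.from({ length: Math.min(limit, tasks.length) }, lane);
  await Promise.all(lanes);
  return results; // only final answers go back to the caller
}

orchestrate(["angle 1", "angle 2", "angle 3"]).then((answers) => {
  console.log(answers.length); // 3
});
```

Because the loop is ordinary code, "who runs next" is decided deterministically by the script rather than reasoned about inside the chat session's context, which is the shift the host describes.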
§ 05 · For Joe

When agent fan-out earns its token bill.

WHAT TO LEARN

Workflows are worth reaching for only when a task fans out across many similar items or needs deterministic, resumable orchestration, because everything else costs less as a skill or plain chat.

  • A subagent runs its expensive work in an isolated context and hands back only the small answer, so the orchestrating session never sees the bloat.
  • The chat window fails as an orchestrator at scale because it must hold every agent's intermediate state, routing, and results in its own limited context.
  • Moving orchestration into a script keeps state in variables and returns only final answers, which is the actual mechanism that lets fan-out scale.
  • A run journal records completed agents, so a workflow can pause and resume with finished work returned from cache rather than rerun from scratch.
  • Concurrency caps at 16 agents while total agents per run reach 1,000, so large swarms still finish but throttle rather than run all at once.
  • Isolating work into agents does not lower the bill: the deep-research demo still burned three million tokens across 105 agents in fifteen minutes.
  • Verification is the cost sink because each top claim gets three independent fact-checkers and is only discarded when two of them refute it.
  • Assign a cheaper model to high-volume phases and an expensive one only to final synthesis to keep multi-agent runs affordable.
  • Reach for a workflow when the work fans out, loops deterministically, needs resumability, or repeats; otherwise a skill or chat is cheaper and steadier.
  • Treat agent count and token spend as the real constraint, not capability, since the limiting factor is cost long before the feature runs out of power.
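
The journal point in the list above can be pictured as a cache keyed by step. This is a hypothetical sketch only: the video does not describe the real journal's format, so the Map, the createRunner name, and the step keys are all invented for illustration.

```javascript
// Hypothetical journal: a Map from a step key to its finished result.
function createRunner(journal = new Map()) {
  let executed = 0;

  function step(key, work) {
    // Finished in an earlier run: replay the recorded result instead of rerunning.
    if (journal.has(key)) return journal.get(key);
    const result = work();
    executed++;
    journal.set(key, result);
    return result;
  }

  return { step, journal, executedCount: () => executed };
}

// First run completes the scope step, then is paused.
const firstRun = createRunner();
firstRun.step("scope", () => ["angle-a", "angle-b"]);

// The resumed run reuses the journal, so "scope" is not executed again.
const resumed = createRunner(firstRun.journal);
resumed.step("scope", () => { throw new Error("should not rerun"); });
resumed.step("search", () => "fresh results");
console.log(resumed.executedCount()); // 1
```

Only the search step does new work after the resume, which is the saving the journal is meant to provide: finished agents are not paid for twice.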
§ 06 · Frame Gallery

Visual moments.