WEBVTT

00:00:00.000 --> 00:00:24.955
Claude Code just officially dropped its best feature yesterday: workflows. While Opus certainly took center stage, I think this is the most valuable part of yesterday's announcements. In this video, we're gonna explore Claude workflows: what they are, how to use them, and whether or not your family will be without food this month so that you can pay for token usage. Let's get into it. So we need to start this video with a little bit of context, first around what a subagent actually is, because that forms the biggest part of what workflows are.

00:00:25.275 --> 00:00:41.850
So when you're using Claude normally, you have a main conversation window. Most people just keep chatting in that main window until its context is honestly exhausted and there's no more room left to work with. That is part of the problem we try to solve with subagents, because that context window gets so full. We're running tools and MCP servers,

00:00:42.010 --> 00:00:49.745
we have long reasoning, we have really long conversations that carry on forever and ever, and all of that just fills up our context window unnecessarily.

00:00:49.745 --> 00:01:33.730
And despite having a 1,000,000-token context ceiling, when you have all of this junk and unnecessary information in there, it's really just bloat. Getting rid of that bloat would be really helpful for the conversation we're having in this main window, and that's one of the reasons we use subagents. So again, if we look at our main Claude Code session over here, we can get it to spawn a subagent, which is essentially a fresh Claude Code session with its own isolated context window, to perform a specific task. Rather than bloating our main session with all of those tool calls and requests, we have different subagents perform those requests for us and return only the information we need. So instead of those 6,000 or even 60,000

00:01:33.730 --> 00:01:48.175
tokens landing in our main conversation, the work gets done in the subagent's own window, and it only pushes back the answer we need, roughly 500 tokens, into our main session. So we don't have any of that bloat or any of the problems we had before. And there's no need for Claude to compact its main session,

00:01:48.335 --> 00:02:16.635
and none of the context we have gets filled with things we don't really need. We've always been able to do this, in many different ways, and skill chaining can form a part of this, amongst other things. I have a separate video on that, so I'm not gonna cover those concepts here. But understanding this is important, because workflows directly affect our subagent orchestration layer. Currently, Claude is our main orchestrator window. So we have that main chat window, and let's say we wanted to call six subagents. That's perfectly fine, and most of the time it's accurate. But at scale, that can start to become problematic, because Claude

00:02:16.635 --> 00:03:18.940
is still orchestrating all of these agents. It has to keep track of what they're doing. It has to manage requests: who runs next, what they're gonna be running, and any reasoning involved before work gets sent down to the subagent layer. It also has to manage the results. So there's quite a lot to manage, especially at scale. Essentially, we have a manager over here that loses track of everything for the reasons I just mentioned, especially because it has to hold intermediate state within its own context. The new solution is to move the manager into a script, so we no longer have this overburdened main context window. We have a workflow.js script that holds the state inside variables and runs deterministic loops, and only the final answers return to our main context window where our Claude chat was. One of the great things about this is that it can scale massively; we'll get into that in a bit. Because we're no longer burdened by a manager taking on everything, and it's all done programmatically and deterministically inside our script, it's a lot easier to manage at scale. In terms of what happens at runtime, the workflow.js script I just spoke about runs as a separate process.
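
NOTE
A minimal JavaScript sketch of the idea that the manager moves out of the chat and into a script. It is not the real workflow.js API: runAgent is a hypothetical stand-in for a subagent call, and string length stands in for tokens. Intermediate results live in a script variable, the loop runs the same way every time, and only the short summaries are returned to the main conversation.
```javascript
// Hypothetical sketch: the "manager" lives in a script, not in the chat context.
// runAgent is a stand-in for a subagent call; it is NOT the real workflow API.
function runAgent(task) {
  // Pretend each subagent burns a large private context and returns a short answer.
  const privateTranscript = "x".repeat(60000); // intermediate work, never returned
  return { task, answer: `summary of ${task}`, privateChars: privateTranscript.length };
}
function orchestrate(tasks) {
  const results = []; // intermediate state held in a script variable
  for (const task of tasks) {
    // Deterministic loop: same order on every run.
    results.push(runAgent(task));
  }
  // Only the compact answers go back to the main conversation.
  const finalAnswer = results.map((r) => r.answer).join("\n");
  const hiddenChars = results.reduce((sum, r) => sum + r.privateChars, 0);
  return { finalAnswer, hiddenChars };
}
const { finalAnswer, hiddenChars } = orchestrate(["angle A", "angle B", "angle C"]);
console.log(finalAnswer.length, hiddenChars);
```
Running it prints 56 180000: roughly 56 characters reach the main chat, while the 180,000 characters of simulated intermediate work never leave the subagents.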

00:03:19.100 --> 00:04:52.175
So you can see over here, we've got our process with agent one, agent two, and agent three. The process loads our JavaScript file, executes whatever is inside it, and spawns our subagents over here to do the work outlined in that file. It also has something called a journal, which essentially records the state of whatever's happening here. So you're actually able to resume work: you can pause this and come back to it later, because of this journal that sits in between. Obviously, how you're running your workflow affects whether you actually get to pause and resume it. For this to work, you need to be using Claude Code either in the desktop app or through the terminal, including the terminal inside your IDE. Currently, you can't do this in VS Code with the extension. Three quick things to note as well. There is no direct file system or shell access from the script; the agents can do that, obviously. Currently, 16 concurrent agents is the maximum. There are various other technical factors that go into this that I'm not gonna cover in this video; I'll have a separate deep-dive video on all of this at a very low level. You can then have a thousand agents total per run, so you can still have a massive swarm doing the work. It's mostly just the concurrent-agent max that you're limited by right now. Okay, but enough talking. Let's get into some practical stuff. I have run two previous searches that we'll take a look at in just a bit. What I wanna do here is show you how this works, and we'll dive into some of the output once it's done. In terms of invoking a workflow, every time you use the word "workflow" when you're chatting with Claude now, it pops up as a command that we can use, and it will turn whatever you're talking about into a workflow. So just keep that in mind.
But for this, I want to focus on deep research, a skill they just released that uses this workflow functionality to do exactly what it sounds like: some very deep research.
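
NOTE
A hedged sketch of the two run limits mentioned above, 16 concurrent agents and 1,000 agents per run, written as a plain worker pool. The constants come from the video; fakeAgent and runPool are hypothetical names, and this only illustrates how a script could enforce such caps, not how Claude Code implements them.
```javascript
// Limits quoted in the video: 16 agents at once, 1,000 per run.
const MAX_CONCURRENT = 16;
const MAX_TOTAL = 1000;
// fakeAgent stands in for a real subagent; it just resolves after a short delay.
function fakeAgent(id) {
  return new Promise((resolve) => setTimeout(() => resolve(`result ${id}`), 1));
}
async function runPool(taskCount) {
  if (taskCount > MAX_TOTAL) throw new Error(`run asks for ${taskCount} agents; cap is ${MAX_TOTAL}`);
  const results = new Array(taskCount);
  let next = 0; // index of the next task to hand out
  let active = 0; // agents currently in flight
  let peak = 0; // highest number of agents seen running at once
  async function worker() {
    while (next < taskCount) {
      const id = next++; // safe: no other worker runs between these two lines
      active++;
      peak = Math.max(peak, active);
      results[id] = await fakeAgent(id);
      active--;
    }
  }
  // Start at most MAX_CONCURRENT workers; each keeps pulling tasks until none are left.
  const workers = Array.from({ length: Math.min(MAX_CONCURRENT, taskCount) }, worker);
  await Promise.all(workers);
  return { results, peak };
}
runPool(105).then(({ results, peak }) => console.log(results.length, peak));
```
With 105 tasks, the pool never has more than 16 agents in flight and prints 105 16; asking for 1,001 agents rejects before anything is spawned.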

00:04:52.735 --> 00:04:56.815
Can you do some deep research into the benefits of vitamin C for the human body?

00:04:57.455 --> 00:05:15.340
It should now invoke this skill, because it picks up that we asked for deep research, and we should get a little box warning us about the cost of running a workflow. But not for me, because I YOLO everything and probably clicked "always allow" the last time I ran this. You can see over here that our deep research is running as a background task,

00:05:15.905 --> 00:05:22.465
and it runs through five stages. The first one is scope: it breaks the vitamin C question down into five angles that it needs to search.

00:05:22.705 --> 00:05:31.800
Then search: five parallel web searches, one per angle. Then fetch: it dedupes the URLs, pulls the top 15 sources, and extracts any falsifiable claims.
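
NOTE
The fetch step (merge the search hits, dedupe URLs, keep the top 15) can be sketched in JavaScript as below. This is an illustration under assumptions: each hit is assumed to carry a numeric score, and the normalisation rule (ignore #fragments and trailing slashes) is one reasonable choice, not necessarily what the deep research skill does.
```javascript
// Hypothetical sketch of the fetch step; the score field and URL rules are assumptions.
function normaliseUrl(raw) {
  const u = new URL(raw); // the URL parser already lower-cases the host
  const path = u.pathname.replace(/\/+$/, ""); // treat "/a/" and "/a" as the same page
  // Rebuild without the #fragment, since a fragment points into the same page.
  return `${u.protocol}//${u.host}${path}${u.search}`;
}
function topSources(hitsPerAngle, limit = 15) {
  const best = new Map(); // normalised URL mapped to the best-scoring hit seen so far
  for (const hits of hitsPerAngle) {
    for (const hit of hits) {
      const key = normaliseUrl(hit.url);
      const seen = best.get(key);
      if (!seen || hit.score > seen.score) best.set(key, { ...hit, url: key });
    }
  }
  // Highest score first, then keep only the top `limit` sources.
  return [...best.values()].sort((a, b) => b.score - a.score).slice(0, limit);
}
const hits = [
  [{ url: "https://example.com/a/", score: 0.9 }, { url: "https://example.com/b", score: 0.4 }],
  [{ url: "https://EXAMPLE.com/a#intro", score: 0.7 }, { url: "https://example.org/c", score: 0.8 }],
];
console.log(topSources(hits, 2).map((h) => h.url));
```
The two example.com/a hits collapse into one, so the call prints the example.com/a and example.org/c URLs.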

00:05:31.960 --> 00:05:52.095
Then it verifies them, and this is the important part. This uses a ton of agents because it's adversarial three-vote fact checking on each claim: a claim is killed if two out of three verifiers refute it, and only the surviving claims get synthesized, which is the last stage over here. So you can see how this cost can rack up. Currently, we have 22 agents and climbing, and we've already burned through over 550,000

00:05:52.095 --> 00:05:52.895
tokens,

00:05:53.055 --> 00:06:14.995
and it's only been running for one minute. Something to note here: yes, this still counts against your usage. Just because we've isolated the work into separate subagents doesn't mean we magically get free usage. We are still using this much, and I imagine this will go well over a million tokens. You can check the phases in here. We have our scope, and we can see the agent that ran it used 31,000 tokens and one tool and took seventeen seconds.
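
NOTE
The adversarial verify stage described a moment ago (three independent votes per claim, two refutes kill the claim) reduces to a small voting loop. In this hypothetical sketch, getVerdict stands in for a verifier agent and the verdicts are made up; only the three-vote, two-to-kill rule comes from the video.
```javascript
// Rule from the video: three votes per claim, two refutes kill it.
const VOTERS_PER_CLAIM = 3;
const REFUTES_TO_KILL = 2;
function verifyClaims(claims, getVerdict) {
  const kept = [];
  let agentsUsed = 0;
  for (const claim of claims) {
    let refutes = 0;
    for (let voter = 0; voter < VOTERS_PER_CLAIM; voter++) {
      agentsUsed++; // one independent verifier agent per vote
      if (getVerdict(claim, voter) === "refute") refutes++;
    }
    if (refutes < REFUTES_TO_KILL) kept.push(claim); // survives unless 2 of 3 refute
  }
  return { kept, agentsUsed };
}
// Made-up verdicts for three placeholder claims.
const verdicts = {
  "claim A": ["support", "support", "support"],
  "claim B": ["refute", "refute", "support"],
  "claim C": ["support", "refute", "support"],
};
const { kept, agentsUsed } = verifyClaims(Object.keys(verdicts), (claim, i) => verdicts[claim][i]);
console.log(kept, agentsUsed);
```
It keeps claims A and C and reports 9 agents. At 25 claims the same loop spawns 75 verifier agents, which lines up with the breakdown Claude gives at the end of this run.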

00:06:15.235 --> 00:06:29.780
We can see in the fetch phase all of the agents that are currently running, how many tokens they're using, what tools they're using, and how long they've been running. While that thing is cooking, I'll come back over to a workflow that I ran earlier, where I asked it to come up with its own way of showing off its capabilities

00:06:30.100 --> 00:06:33.780
for this new workflow feature. For those of you who think this is only for software developers,

00:06:34.195 --> 00:07:06.595
not really, because apart from the research we just ran, you can also have it do various other things. For this example, which it came up with by itself, it built Startup Forge, a self-contained demo workflow running in the background, and it showcases four agents each inventing a startup from a different angle: consumer, B2B, climate, and an AI-native business. Each idea is scored by a VC judge the moment it's ready, so idea number one gets judged while idea number four is still being written. The judges are forced to return validated novelty, market, feasibility, and total scores

00:07:06.755 --> 00:07:14.195
so we can understand whether our idea is actually any good. Then we stress-test it: the top-ranked idea is attacked by three skeptics in parallel,

00:07:14.435 --> 00:07:52.830
each with a distinct lens. That's really important when you're doing this, and it's really just persona prompting, which I mentioned last week in another video: each skeptic, from their own lens, is trying to find a fatal flaw in the idea we're about to put forward. And finally, in its synthesize phase, it writes an honest investor pitch that must confront every objection head-on. So its use cases go well beyond software development tasks like pull requests, codebase trawling, code reviews, and things like that. You can use this for all sorts of business ideas you might have, and audits might be another one; there are several audit types where having all these agents go out there would be really valuable. But like I say, just because you can doesn't mean you should, and realistically,

00:07:52.830 --> 00:08:10.485
you've already seen that the tokens rack up really, really fast. So using this for every idea is probably not the best approach, unless you're really looking for that million-dollar idea and want to make sure it's worth pursuing; then I definitely think this could be worthwhile. If we scroll down a little bit, you'll see that it saved the script I spoke about earlier, that workflow.js.
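
NOTE
Startup Forge's "judge each idea the moment it's ready" behaviour is a pipeline without a barrier. A minimal sketch of that pattern, assuming hypothetical writeIdea and judgeIdea helpers with artificial delays in place of real agents: each judge is chained onto its own writer, and Promise.all is only used at the very end to rank.
```javascript
// Hypothetical sketch: writeIdea and judgeIdea are stand-ins with fake delays, not agent calls.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const events = []; // records the order in which things happen
async function writeIdea(angle, ms) {
  await sleep(ms); // simulate a writer agent taking ms milliseconds
  events.push(`wrote ${angle}`);
  return `${angle} startup`;
}
async function judgeIdea(idea) {
  events.push(`judged ${idea}`);
  return { idea, score: idea.length }; // placeholder score, not a real rubric
}
async function forge(angles) {
  // Attach a judge to each writer separately: no barrier between writing and judging.
  const chains = angles.map(([angle, ms]) => writeIdea(angle, ms).then(judgeIdea));
  // Promise.all only waits at the end, to rank the already-judged ideas.
  const scored = await Promise.all(chains);
  return scored.sort((a, b) => b.score - a.score);
}
const done = forge([["consumer", 10], ["b2b", 20], ["climate", 30], ["ai-native", 120]]);
done.then((ranked) => console.log(events, ranked[0].idea));
```
The events log shows the consumer idea being judged long before the AI-native writer finishes. Awaiting Promise.all on the writers first and only then judging would reintroduce the barrier.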

00:08:10.645 --> 00:08:18.520
In this case, it created a Startup Forge file with a bunch of numbers in the name, ending in .js, and it's offering to tweak it for us. Like I said, this file might be static,

00:08:18.680 --> 00:09:51.560
but you can edit it. If you're technically inclined, you can go in there and add whatever you want. You don't have to be, though, because Claude will do it for you based on your natural language. It's literally asking me here whether I want to tweak it: bump up the idea count, add a "loop until no fatal objections" refinement loop, or swap the domain we're in. We could make very specific angles here if we wanted to. You can then click on the script and take a look at it. Most of it is in pretty clear English. We have the name, Startup Forge, a description of what it does, and when to use it. In this case, it's specifically a demo of the fan-out, pipeline, judge panel, and adversarial verification patterns. So if I ever wanted to rerun this, it's saved, and I could rerun it based on the conditions we put inside the description. And for those of you wondering, yes, you can use a different model in here. You don't have to use Opus or Sonnet for any of this. You could tell it to use Haiku as part of your initial request, or again, you could go into the script and edit it yourself. But why do that? Just have your requirements ready upfront and come to it with a very specific request; I think that's how you'll get the most out of this. The more clarity we provide the system upfront, the better the output we'll get down the line. I've just asked Claude to do exactly that, because you can use different models for different phases inside the script. Some could run in Haiku, maybe for discovery if that's what you wanted, and then Sonnet could do a creative writing part, whatever. There are different aspects we can play with here.
So what I've asked it to do now is showcase more of its functionality inside the JavaScript so we can have a look at it, and this is what it's come up with. The first phase, generate: six Haiku agents brainstorm name and tagline candidates.
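
NOTE
A per-phase model choice like the one in this demo (Haiku to generate, Sonnet to critique, Opus to synthesize) amounts to a small lookup table. The field names below are hypothetical and are not the schema Claude Code writes into workflow.js; the critique and synthesis agent counts are assumptions.
```javascript
// Hypothetical phase table; field names are illustrative, not the real workflow.js schema.
const phases = [
  { name: "generate", model: "haiku", agents: 6 }, // six cheap brainstormers, as in the demo
  { name: "critique", model: "sonnet", agents: 1 }, // agent count assumed
  { name: "synthesize", model: "opus", agents: 1 }, // agent count assumed
];
function modelFor(phaseName) {
  const phase = phases.find((p) => p.name === phaseName);
  if (!phase) throw new Error(`unknown phase: ${phaseName}`);
  return phase.model;
}
// Cheap model for the wide fan-out, stronger models where judgement matters.
const plan = phases.map((p) => `${p.name}: ${p.agents} x ${p.model}`);
console.log(plan.join("\n"));
```
Swapping the generate phase to Sonnet, as suggested, would be a one-field change in this table.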

00:09:51.945 --> 00:10:19.330
Then we have critique, where Sonnet scores each candidate (pipelined, no barrier), and then synthesis with Opus, which writes the final brand brief from the winner. Of course, you might not want to use those models in that specific order; it might be better to run the generate phase with Sonnet or something like that. That's up to your workflow; this is just for demo purposes. I'm gonna allow that, and then we'll see how the JavaScript file changes. While that's cooking, let's flip back to our vitamin C benefits run over here, which has been running for twelve minutes and used 3,100,000

00:10:19.330 --> 00:10:20.050
tokens.

00:10:20.210 --> 00:10:37.930
So you can see how this can become problematic for you. Because it's using that deterministic loop, it's going to run until it achieves its goal. There is a boundary set in there, and you can change that boundary to make sure it doesn't get stuck in a death spiral and burn through all of your tokens in a single day. Right now, we've got 105 agents and 3,000,000 tokens.
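
NOTE
The "run until it achieves its goal, but with a boundary" behaviour can be sketched as a loop with a hard round limit. runUntil, goalMet, and attempt are hypothetical names, and each round here just adds 10 verified claims as a stand-in for a batch of agent work.
```javascript
// Hypothetical sketch: a deterministic "keep going until done" loop with a hard bound.
function runUntil(goalMet, attempt, maxRounds = 5) {
  let state = null;
  for (let round = 1; round <= maxRounds; round++) {
    state = attempt(state, round); // one round of (simulated) agent work
    if (goalMet(state)) return { state, rounds: round, stoppedBy: "goal" };
  }
  // The boundary: give up after maxRounds instead of spiralling forever.
  return { state, rounds: maxRounds, stoppedBy: "limit" };
}
// Example: each round "verifies" 10 more claims; the goal is 25 verified claims.
const result = runUntil((s) => s >= 25, (s) => (s ?? 0) + 10);
console.log(result);
```
This call stops on its goal after 3 rounds with 30 claims; a goal that can never be met would still stop at maxRounds and report stoppedBy: "limit".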

00:10:38.010 --> 00:11:29.295
It's currently synthesizing, so this should be the last leg of the work. I also think it's clear at this point that you wouldn't want to use this for every bit of research you do. That would be ridiculous. This is for when you're trying to figure out a very, very specific problem. For instance, if you're keeping track of your health, on the whole vitamin C thing here, let's say you got some blood work back, you were trying to understand discrepancies, and the doctor gave you some bullshit answer, as many of them tend to do. You could chuck your results in here and get it to fan out and do a ridiculous amount of research from every angle to figure out what these discrepancies are, what they could mean, things like that. That's where I would put this deep research workflow, or maybe competitor research if I was doing something that really needed it. Again, a very specific use case. Coming back to our other workflow, if we look at its JavaScript file, we can see where it's configured things. For our phases over here, we have the generate phase, six Haiku agents brainstorming name and tagline candidates,

00:11:29.375 --> 00:11:42.540
model Haiku. Then we have critique with model Sonnet, and synthesis with model Opus. That top part, like it said, was just for display purposes. If we scroll down in our script, we can see where it's invoking the agent

00:11:42.625 --> 00:12:03.290
with model Haiku. And if we do Ctrl+F, we can see the same thing with Opus: over here on line 75, model Opus. So it has split everything into the specific agents we need for each stage. And you can keep iterating on this until you get to a working point you actually want. You can see I've just been talking to Claude; it knows its own capabilities here. You could also get it to do research

00:12:03.450 --> 00:12:38.000
on the docs and online to see what other people are building, to work out what you could build for your own workflows. You can also just ask it whether the thing you're trying to use it for is a good use case. One of my use cases is lead gen. Instead of having to go through 500 agents using skill chaining, I can now just use this. I'm still hit by the 16-concurrent-agent limit, but it's much faster than my old skill chaining methodology. It's still in the testing phase, though, so I'll report those results in another video to see whether that's a genuine use case. Popping back over here, we can see this thing is now done. We should now be vitamin C experts after 105 agents, 3,000,000 tokens,

00:12:38.000 --> 00:13:41.125
and fifteen minutes. It didn't actually give us as much information as you'd expect after that much time and that many agents. So why don't we ask it: why did you use so many agents in that run? What were most of the agents used on? I'm guessing the answer is verifying the claims, but we'll get a breakdown now. You can also see it on the right-hand side here: scope used one, then it went up, and fetch and verify are where most of them went. Yep, around 75 for verify, the big one: 25 top claims times three independent verify agents each. Like I said, verification is probably gonna be your biggest cost most of the time you're running workflows. So flipping back to our slides: at level one, the prompt, we can design this quite well from the start, and that's why I said to do some research upfront and come in very specific. It's just like using AI every single day: the more specific you are, the better the output. After that, we still have some control. We can inspect the file before it runs, whether you're in Claude Code or in the terminal. It will always pop up and say, hey, we're gonna go and do this thing, do you wanna do that? Then you can look inside the file before it actually runs and change the agent calls, how many agents run in parallel versus in a pipeline,

00:13:41.285 --> 00:13:49.285
each agent's prompt, the model per stage, and any budget caps you wanna put in there. By default, this thing runs with accept-edits permissions.
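
NOTE
A budget cap of the kind listed above could look like the guard below: before starting another agent, check whether its estimated spend would cross the cap. TokenBudget is a hypothetical class, and the 30,000-tokens-per-agent estimate is an assumption loosely based on the 31,000-token scope agent seen earlier.
```javascript
// Hypothetical budget guard; token figures are supplied by the caller, not read from any API.
class TokenBudget {
  constructor(cap) {
    this.cap = cap; // maximum tokens this run may spend
    this.spent = 0;
  }
  canSpawn(estimate) {
    // Allow the next agent only if its estimated cost still fits under the cap.
    return this.spent + estimate <= this.cap;
  }
  record(tokens) {
    this.spent += tokens;
  }
}
const budget = new TokenBudget(1_000_000);
const PER_AGENT_ESTIMATE = 30_000; // assumed cost per agent
let spawned = 0;
while (budget.canSpawn(PER_AGENT_ESTIMATE)) {
  budget.record(PER_AGENT_ESTIMATE); // a real run would record actual usage instead
  spawned++;
}
console.log(spawned, budget.spent);
```
With a 1,000,000-token cap it allows 33 agents (990,000 tokens) and refuses the 34th.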

00:13:49.285 --> 00:14:12.205
So it will be able to do what it needs to at a high level, but it's not gonna bypass permissions unless you tell it to. Level three: you can edit the file, like you saw. You can literally go into the JS file and edit it however you want. We've already spoken a little about triggers during our practical workflow. There's also an ultracode effort setting, in which case it goes into an automatic workflow for every substantive task. I wouldn't necessarily take that approach, though. You also have your saved workflows.

00:14:12.285 --> 00:15:17.380
Those are saved inside Claude Code, but workflows also work perfectly fine if you're in VS Code. I can come in here, where I've had discussions with this thing historically, and if I want to, I can save one. I can't just type s or have it saved automatically, but I can ask it to save whatever it did, and then we can store it inside here, and you'll see I have the exact same JavaScript file we were looking at in Claude Code. For Max and Team users, this is switched on automatically. For those of you on Pro, it's not switched on automatically, for obvious reasons: it would probably nuke your entire budget, and you wouldn't be able to feed your family for a week because of extra usage credits. To turn it on and off, you can go directly through /config, or you can edit your settings directly; the usual suspects. For organizations, this is off by default, again for obvious reasons; you can turn it on by requesting it from your admin. Deep research also won't work if you switch this off, so keep that in mind. Then, to wrap up: when should you actually reach for this kind of thing? I've touched on that a little earlier in the video, but to bring it home, I'm mainly gonna be using skills for the majority of my workflows. Again, in a business, you want that determinism and reliability. You wanna make sure Claude does the same thing every day, getting leads, replying to people, whatever it is your business needs.

00:15:17.620 --> 00:15:50.300
There are specific use cases for running a workflow. It's not something you're going to run every single day, because you've seen how many tokens it uses. But you wanna use a workflow when a task fans out across many similar items, or when you want those deterministic loops. You can think of it as relentlessly trying to achieve a goal that can be quite complex; you saw that thing was just gonna keep working until it finished whatever was in the script managing all of those agents. It's also pretty good if you want resumability mid-run, so you can pause and come back later, because the script is controlling that state to some degree. But realistically, the biggest benefits come down to developers,

00:15:50.460 --> 00:16:15.844
people building actual products, because you can do massive bug sweeps and all things related to software development at scale with this, as well as research and whatever else goes along with that. But again, we are constrained by the sheer amount of tokens this thing currently uses. Other than that, this thing is still in research preview, so keep that in mind as well. I hope this video was helpful. If you have any questions, leave them down below and I'll get back to you as soon as possible. Otherwise, check out the videos on the screen now; they'll definitely help you on your journey. Thanks very much for watching. See you guys in the next one.