WEBVTT

00:00:00.000 --> 00:00:09.600
Hello, legends. In this video, I'm gonna show you how to use the new Claude dynamic workflows feature, which lets you generate up to a thousand agents to work on really complicated tasks.

00:00:09.760 --> 00:00:34.910
And instead of us using the paid API from Anthropic or even needing an Anthropic account, I'm gonna show you how to do this by using local AI models that are running completely on your computer. And this is possible because we're using the gateway version of Claude. The gateway version is still an official Claude product. It's literally the Claude desktop app, which gives us access to Claude Code and Claude Cowork. But by using the gateway, we're able to plug into any LLM.

00:00:34.910 --> 00:00:46.775
So we can either use something like LM Studio, which we're gonna be doing in this video, to download and use local models directly with Claude Cowork and Claude Code, or we can connect up to something like OpenRouter,

00:00:46.855 --> 00:00:55.415
which has got access to hundreds of cloud-based models. Some are free. Some are paid. But even with the paid ones, you will save, like, just over 98%

00:00:55.415 --> 00:01:09.500
to get really, really good models that you can use. To get this working, what we need to do is just read the documentation on a thing called Cowork on 3P. Once again, that's just the version of the Claude desktop app that lets you plug into a gateway. So we're just gonna go across to this documentation.

00:01:09.580 --> 00:01:48.845
So over here, we can see run Cowork against your own cloud inference provider or, in our case, our own local inference provider. And I'm just gonna go into the next steps to figure out how to install and set this up. So our first step is to download the Claude desktop app. If you don't already have this, just click this button, and then download the desktop app for yourself. It works on Mac and Windows, so just download and install. Once you're done with that, step two is, uh, explicitly stated: do not sign in and do not create an Anthropic account because, once again, you don't need to have an account or to be using the Claude API to make this work. And once your app is open on your screen, you just go into the top left hand corner if you're on macOS and click on help, drop down to troubleshooting,

00:01:48.845 --> 00:01:56.205
and then, uh, enable developer mode. Once you enable developer mode, in that same top menu bar, you see a new menu button called developer.

00:01:56.550 --> 00:02:00.070
Once you drop that down, you'll see configure third party inference.

00:02:00.310 --> 00:02:07.910
When you open the configure third party inference settings, you have an option to choose the, uh, connection type. We're just gonna leave it as gateway,

00:02:08.150 --> 00:02:18.295
and, uh, we have credential kind. We'll drop down. We'll select static API key. Now in this video, I'm just gonna show you how to do it with LM Studio, but this would also work if you have Ollama.

00:02:18.455 --> 00:02:20.855
And it also works if you're using OpenRouter.

00:02:20.855 --> 00:02:28.080
So if you want a follow-up video for OpenRouter, just let me know below. For the gateway base URL, we're gonna get that directly from LM Studio,

00:02:28.320 --> 00:04:02.975
and then we'll come back to the API key and figure out our credential type. So if you haven't heard of LM Studio, I'm gonna drop a video somewhere on screen right now that'll give you a full run through, especially if you're brand new to this tool. But, essentially, it's a free desktop app that you can download onto your computer, and you can browse free local AI models and then download them onto your device. And then you can use them either directly in the app like a chat mode, or you can plug them into different tools like Hermes or OpenClaw or, in our case, into Claude. Now you can download LM Studio for Mac and Windows, so it's gonna run on both operating systems. So once you open up LM Studio, you're gonna see a window like this. Now there's three things we need to do here. The first is we need to download a model so that we can then plug it into Claude. The second is we need to get ourselves this gateway base URL, so that's gonna be in the settings in LM Studio. Then the final thing is when you download a model, it's actually just living in your, like, storage. It's technically asleep on your computer. In order for it to be useful, we need to wake it up and just kind of keep it turned on. So I'm gonna show you how to do that as well. So now the first thing we wanna do is just download one of these local models. So I'm just gonna go to model search. And now in this tab, everything on the left hand side, these are all free local models that you can download. Just be mindful as you're browsing the models, if you get a red warning that says likely too large, it just means it's too big to run on your computer and you wanna just find a different model. We wanna get a green tick like this. So the Gemma E4B and the E2B are fantastic models. They're very small, and they're really good for, like, agentic tasks, which is pretty much what we wanna do in Cowork. And as you can see, it's got a green tick saying full GPU offload possible.
And in that case, I would just download this model.

00:04:03.375 --> 00:04:05.775
The next thing you wanna do is click on to settings.

00:04:06.015 --> 00:04:08.895
Open up the settings panel, go across to developer,

00:04:08.975 --> 00:04:22.760
and then you wanna turn this setting on. So by default, if it's your first time using LM Studio, it's gonna be turned to off. Developer mode will be off. You just wanna flick it across to on. Then you can close this panel, and you should see a new menu bar over here called developer.

00:04:22.840 --> 00:05:02.085
Now when you open up developer, this is the area where we can manage our model. We're able to get our model loaded up into memory so it's awake. And then by using this URL, this is the gateway URL, we can actually plug this directly into Claude. So as you can see here, I've got a bunch of loaded up models. They're not all of the models that I have on my computer. These are just the ones that are awake and ready to do some work. Now while we're here, I'm just gonna delete this model here. I mean, I'm not gonna delete it. I'm just gonna put it back to sleep. It says Claude Opus 4.6. We'll come back to this. I actually don't have a Claude model on my computer, but that's important for us to know in just a second. I'm gonna copy this URL, and let's paste it into this gateway base URL.

00:05:02.165 --> 00:05:08.645
And now we need an API key. Since I'm doing this locally using my local LM Studio, I wanna put a default value

00:05:08.645 --> 00:05:11.845
of LM dash Studio, leave it as bearer.

00:05:12.200 --> 00:05:14.280
And for now, let's just test the connection.
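
NOTE
A rough sketch of what a connection test like this boils down to, not the app's actual code.
It assumes LM Studio's local server is on its default address (http://localhost:1234) and
exposes an OpenAI-style GET /v1/models endpoint. BASE_URL, API_KEY, and the helper names
are placeholders for illustration; copy the real URL from the Developer tab.
```python
import json
import urllib.request
#
# Assumption: default LM Studio server address; yours may differ.
BASE_URL = "http://localhost:1234"
# Placeholder key; a local server does not need a real secret.
API_KEY = "lm-studio"
#
def build_models_request(base_url: str = BASE_URL, api_key: str = API_KEY) -> urllib.request.Request:
    """Build a GET request for the model list, sending the key as a bearer token."""
    url = base_url.rstrip("/") + "/v1/models"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
#
def parse_model_ids(payload: dict) -> list[str]:
    """Pull identifiers out of a response shaped like {"data": [{"id": ...}, ...]}."""
    return [entry["id"] for entry in payload.get("data", [])]
#
def list_models(base_url: str = BASE_URL, api_key: str = API_KEY) -> list[str]:
    """Ask a running server which model identifiers it exposes (needs the server up)."""
    with urllib.request.urlopen(build_models_request(base_url, api_key), timeout=10) as resp:
        return parse_model_ids(json.load(resp))
```
With the server running, calling list_models() should return the identifiers the gateway can see.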

00:05:14.440 --> 00:05:15.480
So scrolling down,

00:05:15.720 --> 00:05:44.990
the gateway returned no usable models, which is a little bit strange because actually in LM Studio, I've got two models that are loaded up and ready to go. But the one caveat is that the desktop app is actually searching to see what your model alias is or, like, what the actual model name is. In my case, I've got a Gemma and a MiniMax, and Claude is only looking for things that have Sonnet or Opus or Haiku in the name. So in this case, none of the models that we have will follow this convention.
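
NOTE
To picture why discovery comes back empty, here is a minimal sketch of that kind of name
filter. It is a guess at the behaviour described in the video, not the app's real logic:
it assumes a case-insensitive substring match on the family names, and the model
identifiers in the usage lines are made up.
```python
# Assumption: a model is kept only if its identifier mentions one of these
# family names, compared case-insensitively. The real rule may differ.
CLAUDE_FAMILIES = ("sonnet", "opus", "haiku")
#
def usable_models(model_ids: list[str]) -> list[str]:
    """Return only the identifiers that look like Claude model names."""
    return [m for m in model_ids if any(f in m.lower() for f in CLAUDE_FAMILIES)]
#
# Hypothetical identifiers for two local models, before and after an alias.
print(usable_models(["gemma-4-26b", "minimax-m2"]))
print(usable_models(["claude-opus-4.8", "minimax-m2"]))
```
The first call prints an empty list, and the second keeps only the aliased model, which is why renaming the API identifier gets a local model through.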

00:05:45.230 --> 00:06:00.915
So what you can do to bypass that issue happens when you're loading up your model, which means you're taking it from sleeping to awake. I'm just gonna go through some of these models here. I've got my Gemma 4 26B. When I click this, now I'm in a settings panel to basically wake this up and configure the settings.

00:06:01.155 --> 00:06:06.995
I can get this API identifier. I'm just gonna backspace this. I'm gonna type in Claude Opus

00:06:07.780 --> 00:06:09.140
4.8.

00:06:09.540 --> 00:06:20.740
And for me, I just wanna get my context window to be as big as possible. Once again, watch that instructional video. All this kind of stuff will make sense. The most important part is that the identifier says Claude Opus 4.8,

00:06:20.820 --> 00:06:27.255
and now we can load our model. So as we see, we're gonna be loading our model, and it's got this convention here, 4.8,

00:06:27.335 --> 00:06:28.695
but we just wanna confirm

00:06:29.095 --> 00:06:34.135
it's the Gemma model, but it's gonna be, uh, identified as Opus 4.8.

00:06:34.135 --> 00:06:38.055
So now if we come back to our settings, and let's just test model discovery,

00:06:38.560 --> 00:06:42.560
there we go. One model found. So we just kind of refresh everything.

00:06:42.720 --> 00:06:46.320
We found the model. Everything's fine. We found the Opus 4.8.

00:06:46.560 --> 00:07:14.010
So now before I save these settings and apply anything here, there's one final thing that I wanna do. When I'm using the paid API service from Claude, part of the built-in tools that we get are things like web fetch and web search. So when you're using Claude in the desktop app or on the web or whatever and you ask it to, like, search the Internet for something, it's already built in. That web search is built in. But since we're using our local models, we don't have built-in web search. We'll have to introduce an MCP,

00:07:14.170 --> 00:07:17.450
basically, like, a connection that lets us search the web by ourselves.

00:07:17.770 --> 00:07:24.225
So this disable built in tools just means that the model is never gonna call this. It's gonna look for MCP connections,

00:07:24.305 --> 00:07:28.785
uh, once again because our local model doesn't have this. So I'm gonna go to apply changes,

00:07:28.865 --> 00:07:30.065
save, and restart.

00:07:30.145 --> 00:07:53.105
And now if this is your first time using Claude and you didn't have the desktop app open or signed in, you will see this window. But if you are already using Claude and you were already signed in before starting this process, you're not gonna see this window. All you need to do is just sign out. Just open up your Claude desktop app and sign out of it, and then you'll be able to see this screen. Now we still have two ways to sign in. So the first way is using claude.ai,

00:07:53.105 --> 00:08:21.745
so our paid subscription, and we don't lose that privilege even if we set up this third party LLM provider, or we can do what we wanna do here, which is use our local model. So I'm gonna click on continue, and here we go. Let me just drop down. I can see my Opus four. I'm in Cowork right now, but I wanna get across to Claude Code. Now before we actually fire off our agents, I wanna make sure that we have Internet search plugged in. I've already configured the Brave Search MCP, and I'm gonna show you how to do the same thing. So to do this, we're just gonna go into this gateway settings button, click on settings,

00:08:22.065 --> 00:08:23.985
just go across to developer,

00:08:24.385 --> 00:08:26.465
and we're gonna click on edit configuration.

00:08:26.705 --> 00:08:36.130
And then you wanna open the configuration file. And once you open your file, you see a bunch of different settings inside that file. They all relate to your Claude desktop app configuration.

00:08:36.370 --> 00:08:40.530
You will not see this part. I have an MCP server plugged in, which is Brave Search.

00:08:40.770 --> 00:08:46.210
Now you can actually use whatever provider that you wanna use. Most providers online will have an MCP connection.

00:08:46.755 --> 00:08:56.035
All you need to do is just Google, you know, Brave Search MCP or Firecrawl MCP, whatever you wanna use. Scroll down until you find the npx

00:08:56.035 --> 00:09:00.640
install. This is what we need to get the MCP plugged in. You can now copy this,

00:09:00.880 --> 00:09:17.665
then copy everything that is in this configuration file, and just go across to a new Claude session and paste in the MCP settings, paste in the configuration file that you had, and then ask Claude to combine those two together. Once you get it combined, you can take the output and just paste it into the configuration file.
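
NOTE
If you would rather see what "combine those two together" means, here is a small sketch.
It assumes the configuration file is JSON with MCP servers listed under an "mcpServers"
key, and that the server's README gives an npx command shaped like the one below. The
existing setting, the package name, and the key placeholder are illustrative, so check
them against your own file and the provider's docs.
```python
import copy
import json
#
def add_mcp_server(config: dict, name: str, server: dict) -> dict:
    """Return a copy of config with one entry added under "mcpServers"."""
    merged = copy.deepcopy(config)
    merged.setdefault("mcpServers", {})[name] = copy.deepcopy(server)
    return merged
#
# Hypothetical existing settings; your file will contain different keys.
existing = {"someExistingSetting": True}
# Assumption: an npx-based install entry copied from the server's README.
brave = {
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-brave-search"],
    "env": {"BRAVE_API_KEY": "YOUR_API_KEY_HERE"},
}
merged = add_mcp_server(existing, "brave-search", brave)
print(json.dumps(merged, indent=2))
```
The point of merging instead of overwriting is that every setting already in the file survives, and adding a second server later is just another call.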

00:09:17.905 --> 00:09:53.435
And then as you go through and you wanna find different connectors, like you wanna use a ClickUp MCP or, I don't know, a Gmail MCP, whatever is available, you can then keep coming back into this Claude session, give it the new MCP, and then ask Claude to add it for you. Now just be mindful that for Brave Search, you are gonna have to have an API key. So in this case, just sign up, create a new account, and then generate a new API key. And then once you're done, you'll be able to see Brave Search as an option in your connectors. Just make sure that it's turned on. And now the final thing we wanna do is figure out how to create those hundreds of agents to do work for us, and you can do that by using a new feature called dynamic workflows.

00:09:53.740 --> 00:09:57.020
So this was released a few days ago with Opus 4.8.

00:09:57.100 --> 00:10:03.740
A dynamic workflow is a JavaScript script that lets you basically deploy hundreds of subagents.

00:10:04.140 --> 00:10:28.910
Now the specifics around this are that you can have up to 16 concurrent agents. So 16 agents working at one time and a total of 1,000 agents per run. So let's say you have a big project. You have an office. You can have 16 employees working in that office at any one time. Let's say this whole project takes you five hours. Across that five hours, you would have had a thousand people come through doing work at different times.

00:10:29.150 --> 00:10:33.710
So, yeah, at any one time, it's 16, but a total per task is 1,000.
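
NOTE
The office analogy maps onto a standard bounded-concurrency pattern. This sketch is in
Python with asyncio, not the JavaScript a real dynamic workflow uses, and it is not the
feature's API. Only the two limits come from the video; the agent function and the sleep
are stand-ins for real work.
```python
import asyncio
#
MAX_CONCURRENT = 16  # agents allowed to run at the same time (from the video)
MAX_TOTAL = 1000     # agents allowed across one run (from the video)
#
async def run_agents(tasks: list[str]) -> tuple[list[str], int]:
    """Run stand-in agents under a concurrency cap; return results and peak load."""
    if len(tasks) > MAX_TOTAL:
        raise ValueError(f"{len(tasks)} tasks exceeds the {MAX_TOTAL}-agent budget")
    gate = asyncio.Semaphore(MAX_CONCURRENT)
    running = 0
    peak = 0
    async def agent(task: str) -> str:
        nonlocal running, peak
        async with gate:               # wait for one of the 16 free seats
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for real agent work
            running -= 1
            return f"done: {task}"
    results = await asyncio.gather(*(agent(t) for t in tasks))
    return list(results), peak
#
results, peak = asyncio.run(run_agents([f"task-{i}" for i in range(100)]))
print(len(results), "agents finished; peak concurrency was", peak)
```
All 100 tasks get done, but the semaphore means no more than 16 are ever in the office at once.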

00:10:33.870 --> 00:10:52.570
And then inside Claude Code, we have a slash command, which is deep research, and this is already a bundled workflow. So as long as we use this slash command, Claude's already gonna know to basically generate hundreds of agents for the task. So back in Claude, I'm just gonna use the slash command, find deep research, and then paste in the command that I used before.

00:10:52.890 --> 00:11:08.755
I'm basically saying, hey. I wanna start a local AI agency in Australia, find my 10 competitors, find 10, you know, types of customers that are looking for these services, and then build me a business plan around this. Now as you can see here, this is literally real time processing. I'm using my M3 Ultra,

00:11:08.755 --> 00:11:41.945
uh, with 512 gigs of RAM, and I'm using the Gemma 26B. It's a small model. It doesn't put a lot of strain on my M3 Studio. But at a very high level, when using local AI models, there's two main components to getting a response. The first is prefill. So, like, how fast can your model intake the prompt that you're sending it? Um, and then you have decoding, which is how fast your model can generate a response. The Mac Studio is pretty fast at generating responses, but it's a little bit slow at ingesting and kind of, like, processing the prompt. Plus since we're using Claude, this is like

00:11:42.665 --> 00:11:45.145
yeah, there's a lot of tokens that are already prebuilt.

00:11:45.520 --> 00:11:48.560
Basically, from the very first message, we're sending, like, 30,000 tokens.
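
NOTE
A back-of-the-envelope way to see why prefill matters so much here. Only the roughly
30,000-token prompt comes from the video. The prefill speed, the decode speed, and the
500-token reply are made-up numbers chosen to show the arithmetic, not measurements.
```python
def response_seconds(prompt_tokens: int, output_tokens: int, prefill_tps: float, decode_tps: float) -> float:
    """Estimate latency as prefill time plus decode time, with speeds in tokens per second."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps
#
# Hypothetical speeds: 600 tokens/s prefill, 50 tokens/s decode.
estimate = response_seconds(30_000, 500, prefill_tps=600.0, decode_tps=50.0)
print(f"{estimate:.0f} s")  # prints "60 s": 50 s of prefill plus 10 s of decode
```
With numbers like these, most of the wait is the machine reading the prompt, not writing the answer.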

00:11:48.560 --> 00:12:14.575
It's a lot. But then from here, you can literally just leave your computer. You can come back in one or two hours, and then you would have had, you know, a couple of hundred agents do a bunch of work for you. Alright, guys. Thank you very much for watching this video. If you enjoyed it, I'd appreciate it if you could like the video, drop a comment, or subscribe to my channel. And if you'd like to see a follow-up of me plugging into OpenRouter so that you can run free cloud models or really, really cheap paid cloud models, let me know in the comments below. Alright. See you in the next one.
