Bart Slodyczka · YouTube · 12:15

Claude Code + LM Studio: FREE Unlimited AI Agents (Don't Pay $200/month)

A 12-minute setup guide for running Claude Code multi-agent Dynamic Workflows on free local models with no Anthropic account required.

Posted

May 31st 2026

Duration

12:15

Format

Tutorial

educational

Channel

Bart Slodyczka

§ 01 · The Hook

The bait, then the rug-pull.

A thousand agents, zero API bill. The Claude Desktop app ships a gateway mode that routes all inference to a local LM Studio server -- and the only non-obvious step is a model-renaming trick that takes thirty seconds.

§ · Chapters

Where the time goes.

00:00 – 00:55

01 · Hook + gateway overview

Introduces Claude Dynamic Workflows, names the two backend options (LM Studio local, OpenRouter cloud), and frames the cost angle.

00:55 – 02:00

02 · Claude docs: Cowork on 3P

Walks through official setup guide. Step 1: download Claude Desktop. Step 2: do not sign in with an Anthropic account.

02:00 – 02:28

03 · Configure third-party inference

In Claude Desktop settings: set connection type to gateway, credential kind to static API key.

02:28 – 04:22

04 · LM Studio overview

Download, model search interface, GPU compatibility green-tick indicator, three setup steps.

04:22 – 05:14

05 · Developer mode + gateway URL

Enable developer mode in LM Studio, open Developer tab, copy local server URL, paste into Claude gateway base URL field.

05:14 – 06:46

06 · The model-alias trick

Claude Desktop's model discovery rejects any model whose name does not contain Sonnet, Opus, or Haiku. Fix: rename the model's API identifier to claude-opus-4.8 in the LM Studio load settings.

06:46 – 07:30

07 · Disable built-in tools + BraveSearch MCP

Local models lack built-in web search. The "disable built-in tools" toggle turns off the native tools so Claude looks for MCP connections instead.

07:30 – 08:21

08 · Sign-in flow

If already signed in to Claude, sign out first to see the gateway login screen. Continue without account.

08:21 – 09:53

09 · Adding MCP via config file

Copy NPX install snippet from BraveSearch docs, use Claude to merge it into existing claude_desktop_config.json.

09:53 – 10:52

10 · Dynamic Workflows explained

16 concurrent agents, 1,000 total per run. /deep-research is a bundled slash command. Live business-plan demo.

10:52 – 12:15

11 · Live demo + prefill/decode explainer

Agents running in real time on Gemma 26B. Explains why first message is slow (30K tokens). Suggests leaving the computer for 1-2 hours.
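
The two limits from chapter 10 (16 agents at once, 1,000 per run) can be pictured with a small scheduler. This is only an illustrative sketch of the "office with 16 desks" analogy, not how Claude implements Dynamic Workflows; the names `run_workflow`, `MAX_CONCURRENT` and `MAX_TOTAL` are hypothetical, and the sleep stands in for real agent work.

```python
import asyncio

MAX_CONCURRENT = 16   # agents allowed to run at the same time
MAX_TOTAL = 1000      # agents allowed over the whole run


async def run_workflow(tasks):
    """Run tasks with at most MAX_CONCURRENT in flight; return (results, peak)."""
    if len(tasks) > MAX_TOTAL:
        raise ValueError(f"run exceeds {MAX_TOTAL} agents")
    gate = asyncio.Semaphore(MAX_CONCURRENT)
    running = 0
    peak = 0

    async def agent(task):
        nonlocal running, peak
        async with gate:                 # wait for a free "desk"
            running += 1
            peak = max(peak, running)    # track the busiest moment
            await asyncio.sleep(0.001)   # placeholder for real agent work
            running -= 1
            return f"done: {task}"

    results = await asyncio.gather(*(agent(t) for t in tasks))
    return results, peak


results, peak = asyncio.run(run_workflow([f"task-{i}" for i in range(100)]))
print(len(results), "tasks finished; peak concurrency:", peak)
```

The semaphore enforces the concurrent limit, while the up-front length check enforces the per-run total.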

§ · Storyboard

Visual structure at a glance.

hook open -- agents running 00:00

promise gateway diagram 00:23

value Claude docs 01:09

value configure third-party inference 02:02

value LM Studio download 02:51

value developer tab + gateway URL 04:26

value model discovery failure 05:16

value model discovery success 06:46

value Welcome to Claude -- no account 08:23

value Claude Code interface active 09:53

value dynamic workflows docs 09:53

value prefill/decode explainer 10:52

cta CTA + outro 12:14

§ · Frameworks

Named ideas worth stealing.

11:08 concept

Prefill vs. Decode

Prefill (TTFT -- prompt ingestion speed)
Decode (TPS -- token generation speed)

Two phases of local LLM inference. Prefill determines time to first token; decode determines generation speed. Local hardware bottlenecks at prefill for large contexts.

Steal for explaining why local model latency feels front-loaded on long prompts
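
A rough worked example of why the first message feels slow. The 30,000-token prompt size comes from the video; the throughput numbers below are hypothetical, picked only to make the arithmetic easy to follow, and real figures depend on the model and hardware.

```python
def estimate_latency(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Return (time_to_first_token, total_seconds) for a single request."""
    ttft = prompt_tokens / prefill_tps         # prefill: ingest the whole prompt
    decode_time = output_tokens / decode_tps   # decode: generate the response
    return ttft, ttft + decode_time


# Hypothetical rates: 600 tokens/s prefill, 50 tokens/s decode.
ttft, total = estimate_latency(
    prompt_tokens=30_000, output_tokens=500, prefill_tps=600.0, decode_tps=50.0
)
print(f"time to first token: {ttft:.0f}s, total: {total:.0f}s")
# prints "time to first token: 50s, total: 60s"
```

Under these assumed rates, prompt ingestion accounts for 50 of the 60 seconds, which is the front-loaded latency the video describes.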

00:55 concept

Cowork on 3P

Anthropic term for running Claude Desktop against a third-party inference provider, enabling any OpenAI-compatible endpoint as a drop-in replacement.

Steal for framing self-hosted AI setups as officially supported, not hacks
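
The model-discovery behaviour shown in the video (no usable models until one is renamed) can be approximated as a name filter. This sketch assumes a case-insensitive substring match and an OpenAI-style model list of the form `{"data": [{"id": ...}]}`; neither detail is confirmed by the source, and `usable_models` is a hypothetical helper, not part of any Claude or LM Studio API.

```python
import json

ACCEPTED_FAMILIES = ("sonnet", "opus", "haiku")  # names mentioned in the video


def usable_models(models_payload):
    """Return model IDs that contain one of the accepted family names."""
    ids = [entry["id"] for entry in models_payload.get("data", [])]
    return [m for m in ids if any(f in m.lower() for f in ACCEPTED_FAMILIES)]


# Hypothetical payloads: before and after renaming the API identifier.
before = json.loads('{"data": [{"id": "gemma-26b"}, {"id": "minimax"}]}')
after = json.loads('{"data": [{"id": "claude-opus-4.8"}, {"id": "minimax"}]}')

print(usable_models(before))  # []
print(usable_models(after))   # ['claude-opus-4.8']
```

The underlying model is unchanged in both cases; only the identifier it advertises differs.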

§ · Quotables

Lines you could clip.

00:09

"Instead of us using the paid API from Anthropic or even needing an Anthropic account, I'm gonna show you how to do this by using local AI models that are running completely on your computer."

lead-off cost hook, complete standalone thought → TikTok hook

05:14

"The gateway returned no usable models -- Claude is only looking for things that have Sonnet or Opus or Haiku."

names the blocker that stops most people, high search value → IG reel cold open

10:03

"A dynamic workflow is a JavaScript that lets you basically deploy hundreds of sub agents."

clean one-liner definition, no setup needed → newsletter pull-quote

11:48

"From the very first message, we're sending like 30,000 tokens. It's a lot. But then from here, you can literally just leave your computer."

honest about the limitation, ends on the payoff → TikTok hook

§ · Resources Mentioned

Things they pointed at.

00:55 · link · Claude Docs: Cowork on 3P

02:28 · tool · LM Studio

03:38 · tool · OpenRouter

06:55 · tool · BraveSearch MCP

08:45 · tool · FireCrawl MCP

§ · CTA Breakdown

How they asked for the click.

12:14 subscribe

"If you enjoyed it, I would appreciate if you could like the video, drop a comment, or subscribe to my channel."

Standard subscribe ask after content wraps. Also seeds a follow-up video on OpenRouter integration.

§ 04 · The Script

Word for word.

HOOK opening / re-engagement · CTA the pitch · analogy

00:00HOOKHello, legends. In this video, I'm gonna show you how to use the new Claude dynamic workflows feature, which lets you generate up to a thousand agents to work on really complicated tasks. And instead of us using the paid API from Anthropic or even needing an Anthropic account, I'm gonna show you how to do this by using local AI models that are running completely on your computer.

00:19HOOKAnd this is possible because we're using the gateway version for Claude. The gateway version is still an official Claude product. It's literally the Claude desktop app, through which we get access to Claude Code and Claude Cowork.

00:31HOOKBut by using the gateway, we're able to plug into any LLM. So we can either use something like LM Studio, which we're gonna be doing in this video, to download and use local models directly with Claude Cowork and Claude Code, or we can connect up to something like OpenRouter, which has got access to hundreds of cloud based models.

00:50HOOKSome are free. Some are paid. But even the paid ones, you will save, like, just over 98%

00:55to get really, really good models that you can use. To get this working, what we need to do is just read the documentation on a thing called Cowork on 3P. Once again, that's just the version of the Claude desktop app that lets you plug into a gateway.

01:07So we're just gonna go across to this documentation. So over here, we can see run Cowork against your own cloud inference provider or, in our case, our own local inference provider. And I'm just gonna go into the next steps to figure out how to install and set this up.

01:19So our first step is to download the Claude desktop app. If you don't already have this, just click this button, and then download the desktop app for yourself. Works on Mac and Windows, so just download and install.

01:29Once you're done with that, the step two is, uh, explicitly stated, do not sign in or do not create an Anthropic account because once again, you don't need to have an account or to be using the Claude API to make this work. And once your app is open on your screen, you just go into the top left hand corner if you're on macOS and click on help, drop down to troubleshooting,

01:48and then, uh, enable developer mode. Once you enable developer mode, in that same top menu bar, you see a new menu button called developer. Once you drop that down, you'll see configure third party inference.

02:00When you open the configure third party inference settings, you have an option to choose the, uh, connection type. We're just gonna leave it as gateway, and, uh, we have credential kind.

02:09We'll drop down. We'll select static API key. Now in this video, I'm just gonna show you how to do it with LM Studio, or this would also work if you have Ollama.

02:18And it also works if you're using OpenRouter. So if you want a follow-up video for OpenRouter, just let me know below. For the gateway base URL, we're gonna get that directly from LM Studio,

02:28and then we'll come back to the API key and figure out our credential type. So if you haven't heard of LM Studio, I'm gonna drop a video somewhere on screen right now that'll give you a full run through, especially if you're brand new to this tool. But, essentially, it's a free desktop app that you can download onto your computer, and you can browse free local AI models and then download them onto your device.

02:47And then you can use them either directly in the app like a chat mode, or you can plug them into different tools like Hermes or OpenClaw or, in our case, into Claude. Now you can download LM Studio for Mac and Windows, so it's gonna run for both operating systems. So once you open up LM Studio, you're gonna see a window like this.

03:04Now there's three things we need to do here. The first is we need to download a model so that we can then plug it into Claude. The second is we need to get ourselves this gateway base URL, so that's gonna be in the settings in LM Studio.

03:16Then the final thing is when you download a model, it's actually just living in your, like, storage. It's technically asleep on your computer. In order for it to be useful, we need to wake it up and just kind of keep it turned on.

03:26So I'm gonna show you how to do that as well. So now the first thing we wanna do is just download one of these local models. So just gonna go to model search.

03:32And now this tab, everything on the left hand side, these are all free local models that you can download. Just be mindful as you're browsing the model, if you get a red warning that says likely too large, it just means it's too big to run on your computer and you wanna just find a different model. We're gonna get a green tick like this.

03:48So the Gemma E4B and the E2B are fantastic models. They're very small, and they're really good for, like, agentic tasks. Pretty much what we wanna do in Cowork.

03:56And as you can see, got a green tick saying full GPU offload possible. And in that case, I would just download this model. The next thing you wanna do is click on to settings.

04:06Open up the settings panel, go across to developer, and then you wanna turn this setting on. So by default, if it's your first time using LM Studio, it's gonna be turned to off.

04:14Developer mode will be off. You just wanna flick it across to on. Then you can close this panel, and you should see a new menu bar over here called developer.

04:22Now when you open up developer, this is the access where we can manage our model. We're able to get our model loaded up into memory so it's awake. And then by using this URL, this is the gateway URL, we can actually plug this directly into Claude.

04:37So as you can see here, I've got a bunch of loaded up models. They're not all of the models that I have on my computer. These are just the ones that are awake and ready to do some work.

04:45Now while we're here, I'm just gonna delete this model here. I mean, I'm not gonna delete it. I'm just gonna put it back to sleep.

04:51It says Claude Opus 4.6. We'll come back to this. I actually don't have a Claude model on my computer, but that's important for us to know in just a second.

04:57I'm gonna copy this URL, and let's paste it into this gateway base URL. And now we need an API key. Since I'm doing this locally using my local LM Studio, I wanna put a default value

05:08of LM dash Studio, leave it as bearer. And for now, let's just test the connection. So scrolling down,

05:15the gateway returned no usable models, which is a little bit strange because actually in LM Studio, I've got two models that are actually loaded up and they're ready to go. But the one caveat is that the desktop app is actually searching to see what your model alias is or, like, what the actual model name is. In my case, I've got a Gemma and a MiniMax, and Claude is only looking for things that have Sonnet or Opus or Haiku.

05:40So in this case, none of the models that we have will have this, um, will have this convention. So what you can do to bypass that issue is when you're loading up your model, which means you're taking it from sleeping to awake. I'm just gonna go through some of these models here.

05:53I've got my Gemma 4 26B. When I click this, now I'm in a settings panel to basically wake this up and configure the settings. I can get this API identifier.

06:03I'm just gonna backspace this. I'm gonna type in Claude Opus 4.8.

06:09And for me, I just wanna get my context window to be as big as possible. Once again, watch that instructional video. All this kind of stuff will make sense.

06:16Most important part is that you wanna have Claude Opus 4.6, and now we can load our model. So as we see, we're gonna be loading our model, and it's got this convention here, 4.8,

06:27but we just wanna confirm it's the Gemma model, but it's gonna be, uh, identified as Opus 4.8. So now if we come back to our settings, and let's just test model discovery,

06:38there we go. One model found. So we just kind of refresh everything.

06:42We found the model. Everything's fine. We found the Opus 4.8.

06:46So now before I save these settings and apply anything here, the one final thing that I wanna do is when I'm using the paid API service from Claude, part of the built in tools that we get are things like web fetch and web search. So when you're using Claude in a desktop app or on the web or whatever and you ask it a question to, like, search the Internet for something or whatever, it's already built in.

07:08That web search is built in. But since we're using our local models, we don't have built in web search. We'll have to introduce an MCP,

07:14basically, like a connection that we can search the web by ourselves. So this disable built in tools just means that the model is never gonna call this. It's gonna look for MCP connections,

07:24uh, once again because our local model doesn't have this. So I'm gonna go to apply changes, save, and restart.

07:30And now if this is your first time using Claude and you didn't have the desktop app open or signed in, you will see this window. But if you are already using Claude and you were already signed in before starting this process, you're not gonna see this window. All you need to do is just sign out.

07:43Just open up your Claude, the desktop app, and just sign out of it, and then you'll be able to see the screen. Now we still have two ways to sign in. So the first way is using claude.ai,

07:53so our paid subscription, which we don't lose that privilege even if we do this third party LLM provider, or we can do what we wanna do here, which is use our local model. So I'm gonna click on continue, and here we go. Let me just drop down.

08:04I can see my Opus four. I'm in Cowork right now, but I wanna get across to Claude Code. Now before we actually fire off our agents, I wanna make sure that we have Internet search plugged in.

08:13I've already configured BraveSearch MCP, and I'm gonna show you how to do the same thing. So to do this, we're just gonna go into this gateway settings button, click on settings, just go across to developer,

08:24and we're gonna click on edit configuration. And then you wanna open the configuration file. And once you open your file, you see a bunch of different settings inside that file.

08:33They all relate to your Claude desktop app configuration. You will not see this. I have an MCP server plugged in, which is the BraveSearch.

08:40Now you can actually use whatever provider that you wanna use. Most providers online will have an MCP connection. All you need to do is just Google, you know, BraveSearch MCP or FireCrawl MCP, whatever you wanna use.

08:53Scroll down until you find the NPX install. This is what we need to get the MCP plugged in.

08:59You can now copy this, then copy everything that is in this configuration file, and just go across to a new Claude session and paste in the MCP settings, paste in the configuration file that you had, and then ask Claude to combine those two together. Once you get it combined, you can take the output and just paste it into configuration file.

09:17And then as you go through and you wanna find different connectors, like you wanna use a ClickUp MCP or, I don't know, a Gmail MCP, whatever is available, you can then keep coming back into this Claude session, plug in, uh, give the new MCP, and then ask Claude to add it for you. Now just be mindful for the BraveSearch.

09:32You are gonna have to have an API key. So in this case, just sign up, create a new account, and then generate a new API key. And then once you're done, you'll be able to see your BraveSearch as an option on your connectors.

09:44Just make sure that it's turned on. And now the final thing we wanna do is figure out how to create those hundreds of agents to do work for us, and you can do that by using a new feature called dynamic workflows. So this was released a few days ago with Opus 4.8.

09:57A dynamic workflow is a JavaScript that lets you basically deploy hundreds of sub agents. Now the specifics around this are you can have up to 16 concurrent agents. So 16 agents working at one time and a total of 1,000 agents per run.

10:14So let's say you have a big project. You have an office. You can have 16 employees working in that office at any one time.

10:20Let's say this whole project takes you five hours. Across that five hours, you would have had a thousand people come through doing work at different times. So, yeah, at any one time, it's 16, but a total per task is 1,000.

10:33And then inside Claude code, we have a slash command, which is deep research, and this is already a bundled workflow. So as long as we use this slash command, Claude's already gonna know to basically generate hundreds of agents for the task. So back in Claude, I'm just gonna use the slash command, find deep research, and then paste in the command that I used before.

10:52I'm basically saying, hey. I wanna start a local AI agency in Australia, find my 10 competitors, find 10, you know, types of customers that are looking for these services, and then build me a business plan around this. Now as you can see here, this is literally real time processing.

11:06I'm using my M3 Ultra, uh, with 512 gigs of RAM, and I'm using the Gemma 26B. It's a small model.

11:13It doesn't have a lot of strain from my MacBook, uh, from my M3 Studio. But at a very high level, when using local AI models, there's two main components to be able to get a response. The first is prefill.

11:24So, like, how fast can your model intake the prompt that you're sending it? Um, and then you have decoding, which is how fast your model can generate a response. The Mac Studio is pretty fast at generating responses, but it's a little bit slow at ingesting and kind of, like, processing the prompt.

11:39Plus since we're using Claude, this is like yeah. There's a lot of tokens that are already prebuilt.

11:45Basically, from the very first message, we're sending, like, 30,000. It's a lot. But then from here, you can literally just leave your computer.

11:51You can come back in one or two hours, and then you would have had, you know, a couple of 100 agents do a bunch of work for you. Alright, guys. Thank you very much for watching this video.

11:58If you enjoyed it, I'd appreciate if you could, uh, like the video, drop a comment, or subscribe to my channel. And if you'd like to see a follow-up of me plugging into OpenRouter so that you can run free cloud models or really, really cheap, uh, paid cloud models, uh, let me know in the comments below. Alright.

12:13CTASee you in the next one.

— full transcript

§ 05 · For Joe

How to run Claude agents without an Anthropic bill.

WHAT TO LEARN

Claude Desktop gateway mode is an official feature that lets you substitute any local model for the Anthropic API, and one naming convention is the only non-obvious requirement.

Claude Desktop's model-discovery filter only accepts model IDs containing Sonnet, Opus, or Haiku -- renaming your local model to match this pattern before loading it is the single step most tutorials skip.
Dynamic Workflows are capped at 16 concurrent agents and 1,000 total per run, which is more than enough for most research, analysis, and code-generation tasks running locally.
The /deep-research slash command is already bundled into Claude Code -- there is no scripting required to access multi-agent behavior, just type the command.
Local model latency is front-loaded: the first response in a Claude Code session is the slowest because the system context runs around 30,000 tokens, and prefill speed is where local hardware trails hosted inference most noticeably.
BraveSearch and other web-search providers require their own API keys even when connected through an MCP -- the MCP provides the interface but not the credential.
OpenRouter is a drop-in alternative to LM Studio in this same setup and gives access to hundreds of cloud models including free tiers and paid options at a significant discount over direct API pricing.
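
The video merges the MCP snippet into claude_desktop_config.json by asking Claude to combine the two files. The same merge can be sketched in a few lines, assuming the config keeps its servers under an `mcpServers` key. The package name, the `preferences` block and the key value below are placeholders, and `BRAVE_API_KEY` is an assumed variable name; copy the real snippet from the provider's documentation.

```python
import copy
import json


def add_mcp_server(config, name, server):
    """Return a copy of config with server registered under mcpServers[name]."""
    merged = copy.deepcopy(config)   # leave the original settings untouched
    merged.setdefault("mcpServers", {})[name] = server
    return merged


existing = {"preferences": {"theme": "dark"}}  # hypothetical existing settings

brave = {  # hypothetical snippet; the real one comes from the provider's docs
    "command": "npx",
    "args": ["-y", "example-brave-search-mcp"],
    "env": {"BRAVE_API_KEY": "YOUR_KEY_HERE"},  # the credential is still yours to supply
}

merged = add_mcp_server(existing, "brave-search", brave)
print(json.dumps(merged, indent=2))
```

Working on a deep copy keeps the existing settings intact, so the merged result can be reviewed before it is pasted back into the configuration file.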

§ 06 · Frame Gallery

Visual moments.

01:27

08:02