WEBVTT

00:00:00.000 --> 00:00:06.880
So researchers reverse-engineered Claude Code's entire source and found something that should make every developer stop and think. Only about 1.6%

00:00:06.880 --> 00:00:10.320
of the code base is actually AI decision logic. 1.6%.

00:00:10.320 --> 00:00:19.135
The other 98% is infrastructure: context pipelines, memory systems, permission layers, safety scaffolding. So here's the question. If the intelligence is 1.6%

00:00:19.135 --> 00:00:41.320
of the equation, why are we paying cloud subscription prices? That's like a 100%. And here's the part nobody's talking about. Cloud models get quietly nerfed over time, and you adapt to those changes without even realizing it. Today, I wanna tell you about something new we've released here at Starter Pack. Make sure you stay to the end because we're gonna give you two free things in this, and I mean free as in free as in free. And I'm gonna show you what we built to prove this. Let's dive into it today.

00:00:47.855 --> 00:01:53.430
Welcome to Starter Pack. I'm Spencer, and here at Starter Pack, we love to build custom software solutions for companies. With a decade of executive leadership as a fractional CTO and twenty-five years in software development, I've helped transform tech teams and products, including building out custom AI solutions. Now look, the AI tooling market wants you to believe that renting access to a model like Claude Code or Codex is just the cost of doing business now, but it's not. The gap between cloud AI and local AI is way closer than you think. I've got a tool here today that's gonna help you, and it's absolutely free. I'm not selling you anything. Before we get into it, one of the biggest things you can do is drop a comment, and as always, make sure you follow what I'm gonna give you here. So researchers who reverse-engineered Claude Code found that the actual AI decision logic is around 2% of it. The remaining 98% is context pipelines and a lot of other pieces. This matters because it reframes the entire conversation. The gap between cloud and local AI is not a magic gap. It's an engineering gap, and we've helped close that gap with a free service we're gonna give you guys today. Engineering gaps close when developers decide to close them. Most people tried a local model once, compared it to one of the others, and said it wasn't quite ready, but what they were actually missing was the harness. The models are moving fast, and this is Qwen.

00:01:53.670 --> 00:02:23.110
This is DeepSeek. There are a lot of other models, and there's a lot you can do. Gemma 4 was released. And here's something that doesn't get talked about enough. Cloud model behavior shifts over time, and most users absorb those changes. Guardrails tighten. Output quality drifts. We've seen Claude Code drop dramatically over the last few months. Prompts that worked reliably last quarter start producing different results this quarter. Now you adapt and you rewrite your prompts and you try to modify things with skills, and you do this thinking you can beat it, but the biggest problem here is rate limits are absolutely stifling innovation. Price restructures,

00:02:23.110 --> 00:03:01.905
capability rollbacks. When a vendor controls your model, they control your whole workflow. You've gotta take that control back. That's why we launched Open Mono Agent. It's an AI that you don't have to meter. Right? Unlimited tokens forever. Now you're saying, what are you selling me, Spencer? Hear me out here because this is your machine, your agent, and you use it from anywhere. Now here's the kicker. I know a lot of you have probably tried to set up a local model in the past and thought, man, this is way too complex, way too difficult. Look, guys, it's one copy. You copy this, paste it in. It's gonna give you three options. You can either run the whole stack on your machine, which I'm gonna show you here in a minute, or you can install the inference on one machine. Now you think, oh, I've gotta have a really expensive H100 that's $30,000.

00:03:01.985 --> 00:04:19.330
Absolutely not. We are doing this on standard hardware. See this stuff back here behind me? These are normal gaming machines. Very low end by today's standards in a lot of cases. This one right behind me just has a 3090 in it, guys. This is about a thousand-dollar gaming machine. You can find these on your local Craigslist, Facebook Marketplace, wherever you go pick up your stuff. But even more importantly, we've built them out on these little NUC bricks here that actually work and give you about 20 tokens per second. That's very comparable to what you get with Claude Code or with Codex. So for about 20 tokens per second, you can own the whole hardware, and this is very reasonably priced. And I'm gonna show you something here at the end, so you're definitely gonna wanna make sure you stay to the end. Now how do you get started on this? It's really easy to start. Copy this, paste it, run it. I wanna go through some of the features here with you, because one point of our manifesto is that AI shouldn't be a subscription that you rent. It should be infrastructure that you own, sitting on your desk, serving your code, answering only to you. Now this is local first, always. That means you own everything. The model run, everything from the top to the bottom of the stack. Nothing goes across the cloud. Unlimited tokens. You want this thing to run for four days? All you. Like I said, these things here run on about 25 watts. K? The other thing is we've built this so that they're sandboxed by default. It's Docker native, so your agent mounts your project in, and it doesn't escape. Permission gates are right inside that Docker container, and it's fully 100% open source. Don't believe me? Well, here you go. Here's the whole project,

00:04:19.570 --> 00:04:56.195
open source, right here on GitHub, all for you. We have a massive amount of documentation that we've worked on, and the whole project is ready to go. This is not just a proof of concept, guys. I'm gonna show you a working demo here in a minute. But this is the full thing. You can go in and read the documents. Each of these goes into the different parts. And I'm gonna go through some of those parts with you here, but I wanna talk through some of these. So first of all, it's embedded inference, zero setup. You literally run the script. If you decide to run it on two machines, with the inference and your agent separate, then you can run the agent on your dev laptop and run your inference on the machine back home, and we connect them with a relay server. It's trivially simple. The TUI is the interface that we use, and it's built for long sessions, so you can continue to run it indefinitely.

00:04:56.275 --> 00:05:04.995
It's Docker sandboxed. We have over 20 different MCP tools built in. It's built for .NET, focused on .NET. So we actually built it with C# and .NET,

00:05:04.995 --> 00:05:10.020
and you'll see all the code, and it is blazing fast. LSP for C# and TypeScript.

00:05:10.020 --> 00:05:23.545
Playbooks, this is our version of skills. Playbooks dominate skills. There's so much more you can do with a playbook. These are typed, composable, stateful workflow automations: step sequencing, gates, and templates, not just markdown recipes.

00:05:23.545 --> 00:06:44.580
This is not just one flat text file, folks, and it's very easy. The agent itself will actually help you write these. Now we also have our dual-box mode that I talked about here, and one of the best parts is we're hosting a free relay server where you can actually go and sign up and set up your relay between the two boxes. Absolutely free, totally encrypted, 100% secure. We're not getting any data from you. We'll get your email so that that's your ID. Other than that, we're not doing anything with it. Next is you have persistent sessions. We actually save your sessions in JSON. They stay on your machine. We're not saving them. They're saved on your machine, on the agent machine. This thing runs. We've probably installed it about a thousand times, and I'm not exaggerating here. You wanna see some of these servers behind me? These are about half of the dev servers that we have. See the ones underneath the desk over here, the ones over there. We have about 20 to 30 different types of workstations, varying from boxes this size up to 5090s. We've not done anything larger than a 5090 on this. And with that, we have this incredible set here. So you can do these little bricks, right, which are Ryzen 9 7940HS, and get about 20 tokens per second. The 3090 is about 50. The 4090, we got a little typo here we need to get fixed up, is about 60 tokens per second, and the 5090s are running closer to 100 tokens per second. We actually have tested with five different developers all running against one 5090 at the same time using the dual-box setup. So you can go through and compare how it stacks up against other things, but really, you're truly up and running in one single command, well, two commands, because you install, and then you're up and running with MonoAgent.
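
NOTE
Editor's sketch, not part of the spoken audio: a rough way to turn the throughput figures quoted above into wall-clock time. The rates are the approximate numbers from the talk; the 10,000-token budget and the helper name minutes_for are assumptions made for illustration only.
```python
# Approximate tokens-per-second figures as quoted in the talk (not benchmarks).
RATES = {"Ryzen 9 7940HS": 20, "3090": 50, "4090": 60, "5090": 100}
def minutes_for(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock minutes to generate `tokens` at a steady rate."""
    return tokens / tokens_per_second / 60
# Hypothetical budget: how long would a 10,000-token response take on each box?
for name, rate in RATES.items():
    print(f"{name}: {minutes_for(10_000, rate):.1f} min for 10,000 tokens")
```
This ignores prompt processing time and any slowdown when several developers share one machine, so treat the results as a lower bound.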

00:06:45.015 --> 00:07:09.040
This is yours, guys. It's open. It's local. It's yours. It's forever. Incredibly fast. Right? We have absolutely worked to optimize this. 100% open source here, folks. The biggest favor that I ask is you just leave us a star on this, because as you know, one of the best ways you can help is to get the stars in there. Now we can go through more of these features, and I wanna dive into a couple more of them with you here before I go on a little bit more. So with this, I wanna talk a little bit about playbooks. Your agent needs guarantees,

00:07:09.040 --> 00:07:39.200
not just suggestions. So a skill is like a suggestion. It's like, hey, if you kinda wanna do this thing, go over here and act like this. A playbook actually tells it exactly how to run. It's not just a prompt. A skill is a prompt. The model can drift, skip, or misinterpret. A playbook gate is code. The executor calls this, and the LLM is not in the loop. It cannot skip it, hallucinate past it, or decide it knows better. Now I'm telling you guys, this is way better. We have tons of documentation around this. We have really worked hard to make this work really, really well, and we're incredibly proud of this. So if you're doing something with OpenClaw,
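
NOTE
Editor's sketch, not part of the spoken audio: one way the "gate is code, the model is not in the loop" idea could look. This is a minimal Python illustration under stated assumptions; Step, GateFailed, run_playbook and fake_model_edit are hypothetical names, and this is not Open Mono Agent's actual playbook format or API (the project itself is written in C#).
```python
from dataclasses import dataclass
from typing import Callable, Dict, List
# Hypothetical types for illustration only.
@dataclass
class Step:
    name: str
    action: Callable[[Dict], None]  # may call a model or a tool
    gate: Callable[[Dict], bool]    # plain code, evaluated by the executor
class GateFailed(Exception):
    pass
def run_playbook(steps: List[Step], state: Dict) -> List[str]:
    """Run steps in order; stop as soon as a gate rejects the state."""
    completed = []
    for step in steps:
        step.action(state)
        # The executor, not the model, decides whether the run may continue.
        if not step.gate(state):
            raise GateFailed(step.name)
        completed.append(step.name)
    return completed
# Stand-in for a model call that "forgets" to write tests.
def fake_model_edit(state: Dict) -> None:
    state["files_changed"] = 2
    state["tests_written"] = 0
steps = [
    Step("edit", fake_model_edit, lambda s: s["files_changed"] > 0),
    Step("tests", lambda s: None, lambda s: s["tests_written"] > 0),
    Step("commit", lambda s: s.update(committed=True), lambda s: True),
]
```
Calling run_playbook(steps, {}) raises GateFailed at the "tests" step, so the commit step never runs, whatever the model's output claimed.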

00:07:39.280 --> 00:08:01.470
this is gonna run circles around that. Now, last but not least here, one of the things that I've been talking about. Go and make sure you sign up. We're doing a free giveaway. Sign up here because we're gonna give away one of these Ryzen boxes where you can run your own inference box at, I think, about 25 watts. So it's pretty incredible. These are amazing. You know, believe me, I'm not trying to sell these. This is just a link to Amazon.

00:08:01.630 --> 00:08:34.800
Right? You can go get one of these boxes yourself. But you can see that this is a great opportunity, and we are giving it away. My goal here is what the manifesto states, and you can see the manifesto. My manifesto is that it shouldn't be a subscription, and this is what we're trying to do. I wanna give people the opportunity to set up and learn how to use AI locally. See, I have a lot of beefs with the big frontier models, and at this point, we all do. We have a lot of complaints. Open Mono Agent is a terminal-native coding agent that runs entirely on your machine, powered by local LLMs at zero cost. It's written in C# and .NET,

00:08:34.800 --> 00:09:12.815
which was a deliberate choice, not a limitation, because AI tooling should be built like infrastructure, not like a weekend side project. It installs with a single command, and guess what? It runs on every platform. The agent will run on macOS, Linux, and Windows. Right? So you can install the agent, run it locally, and then run the inference on some other box like one of these at a very affordable cost, and you own the whole stack. Did I mention that you own the whole stack? It installs with a single command, and it's model agnostic. You can change out the model, but we have the models already all tweaked for you. So if you use the CPU, we have a very specific 3.6 model. We have another Qwen 3.6 model. These have been tested and work

00:09:13.135 --> 00:09:21.700
fantastically. There's no telemetry, no tracking, no free tier; the whole thing is free. I want you guys to be able to use this, so go to monoagent.ai

00:09:21.700 --> 00:10:14.190
right now and grab the install command. It's right there on the landing page, and it could be running in minutes. Now every prompt you send to a cloud AI tool leaves your machine, and that's not paranoia. That's just how it works. Right? For personal projects, that's probably fine. But for client code, proprietary algorithms, anything under an NDA, that's real exposure. Open Mono Agent has no server to exfiltrate your data to because everything runs locally on your hardware. There's no "well, we may use interactions to improve our models." There's no terms of service buried there. You go pull down the code, you can see what we're doing. You can even modify it. Do what you want with it. Help us improve it. Do a pull request. Right? We're gonna continue to build on it. We have a lot of huge plans. Next week, we're rolling out our mobile apps, which will allow your mobile phone to be in control of it. We are then also rolling out a VS Code extension that's gonna continue to improve upon this. So when your AI stack is local, the compliance conversation simplifies dramatically because there's nothing leaving the building. Now we built OpenMonoAgent

00:10:14.190 --> 00:10:15.470
in C# and .NET.

00:10:15.470 --> 00:10:33.045
This was by choice. .NET is cross-platform, it's production grade, and it has one of the most mature ecosystems for system-level tooling in software development. After a long time in the industry, I've watched developers use npm packages, use Python, and I've seen just a lot of soiled projects. C# gives us type safety, long-term maintainability,

00:10:33.045 --> 00:10:39.770
and performance characteristics that matter when you're building something that's meant to run for a long time. Python's fantastic if you're doing experimentation,

00:10:39.770 --> 00:10:45.210
but C# is what you reach for when you need a real production thing to stand up. Now if you wanna contribute,

00:10:45.450 --> 00:11:25.015
fork it, extend it, open a pull request, help us build it out. We're continuing to add more things to it. You can even go see the road map in there, and I have a team that's dedicated to this. I have multiple senior developers. I have a full-time PhD AI engineer. I have multiple other junior developers. I have a large team that's working to continue to build this faster. Why? Because we love to build custom software solutions for people, especially stuff that you own. The fastest way to kill an open source project is to make it a three-day configuration exercise before you get anything out of it. Open Mono Agent installs with a single command, not because it's simple under the hood, but because we engineered the setup to be invisible. This took a lot of time. Developer time is expensive, and if the tool costs more in configuration

00:11:25.095 --> 00:11:34.470
than it saves you in the first week, you've already lost the argument. So we made onboarding a first-class concern because that's where most developer tools go to die. So get it running, connect your local LLM,

00:11:34.550 --> 00:11:36.470
and go to openmonoagent.ai

00:11:36.470 --> 00:12:50.955
because the command's right there on the front page. Now every piece of foundational infrastructure in modern software starts as something that somebody gives away. Linux was dismissed by enterprise vendors as a toy that would never handle real workloads. Now you know how that ended. Right? Git replaced Subversion and ClearCase, not because it had a better sales team, but because developers adopted it and it was genuinely better. The pattern repeats across every technology generation. Incumbents call it a toy, developers use it anyway, and eventually it becomes the default. The companies charging you for AI coding tools today are going to call local agents a toy, and they're gonna be wrong. They're betting on a pattern one more time, and I'm gonna take that bet every single time. Now, Open Mono Agent doesn't care which LLM you run, but we've picked out some really good ones that we've tuned for specific setups. Vendor lock-in on AI models is just a new version of a problem the software industry has been solving for decades with varying success. When a better model ships next month, and one will, you swap it out with one rebuild. You don't even rebuild. You just literally swap the model out. The inference then continues to run on the new model, and you're off and running. The agent is the layer. The model is the engine. Changing the engine should not require you to buy a new car. So that flexibility compounds over time in a way that single-model benchmark scores can't match. So this is what democratizing AI really looks like. That phrase shows up in a lot of marketing copy in other places, but real democratization

00:12:50.955 --> 00:13:38.965
means a developer in Nairobi has the same AI coding tools that developers have at Google. No credit card required. It means a student building their first serious project has access to the same category of tools as a funded startup. It means developers in countries with weaker purchasing power aren't priced out of the tools that they need to compete in the market. Open Mono Agent is free because free is the only price that is actually universal. There's no purchasing-power barrier here. So that's the thesis. Not free trial, not free tier: free, permanently, because the mission requires it. Now we may work into some larger things in the future, but mostly this is an opportunity for us to work with folks and demonstrate our understanding of how AI works. So Open Mono Agent isn't just free to use. It's free to study, modify, fork, redistribute. And again, I'm putting my money so much where my mouth is that I'm even going to give one away for free.

00:13:38.965 --> 00:14:42.755
Right? We're doing this. We're gonna announce it on May 15. So go and get signed up because this is free as in free as in free. Now if you're a .NET developer or a C# practitioner or someone who wants to help us better the local AI tooling, dive in and do a pull request. We welcome it. But the biggest thing I can ask is star the repo. Open an issue. Tell us what's broken, what you wish it would do. The project's gonna grow as fast as the community decides that it should, and that's always the best kind of open source project out there. And I'm willing to commit some resources to this. The companies charging you for AI coding tools are going to call Open Mono Agent a toy. And I'm just gonna plan on that. But again, remember Linux was a toy. Git was a toy. The entire foundation was a toy. These tools got called toys by incumbents who had something to lose from developers owning their own stack. Does that sound familiar? Now, I wanna give you guys just a little quick demo. This is gonna be super fast because we've already gotten really long on this video, but I wanna be able to show you guys how well this works here, okay? So I'm firing this up here. So we can see here that I've got a local project, and this is actually a small snake game project that the tool actually wrote itself, but I'm not gonna write that one for you here now. So all we do is type open

00:14:42.755 --> 00:14:44.250
mono agent.

00:14:44.410 --> 00:14:54.650
Okay? So kinda like typing claude. Boom. There we go. Code review found, but no tool graph. So we'll talk about code review graph. This is a great, powerful tool. We'll talk about that another time. So let's say, review the

00:14:54.810 --> 00:14:55.610
project.

00:14:55.770 --> 00:14:57.805
Give me feedback

00:14:58.125 --> 00:15:01.565
on what we need to improve.

00:15:01.725 --> 00:15:13.110
Okay, so firing this up, you can see it's already firing through tokens. Looks like we're burning about 41 tokens per second. This is on the machine that was behind me here, so if you look back behind me, it's the machine that's running the 3090.

00:15:13.190 --> 00:15:30.595
Alright, now let's check the build state. So again, I'm running this and so I'm gonna say yes, I want this to, you know, to run. Okay, we're gonna give it access. Okay, so it's a minimal ASP.NET Core static file server, and it gives us kind of the outline. It says, hey, this should be, you know, so it's telling me to do some canvas wrappers.

00:15:30.675 --> 00:15:32.515
K? Game state uses all

00:15:32.915 --> 00:16:02.525
let's, you know, let's do global, so it's gonna give us some suggestions there. Oh, hey, there's no gitignore for bin, right? So we've got some problems here, right? So it's telling us it's, you know, going to do some modifications on this. So it's saying, for code quality, game over doesn't return anything, but it's called as return game over. Right? So it's giving us examples. So I can say please now, because you know Sam told us not to say please because, like, that burns up tokens. Guess what? We don't care about tokens. I already burned 38,000 tokens. Nobody cares. Please fix number one and create a git repo

00:16:02.845 --> 00:16:04.605
for this project.

00:16:04.845 --> 00:16:13.450
Okay. Let's let it go to town here. It's probably gonna, you know, ask me for some prompts, because things like creating the Git repo, see, are gonna take file permissions.

00:16:13.850 --> 00:16:31.755
So we can also set these too. My head's in the way, but down here in the other corner. In fact, let me get my head out of the way. We already got a prompt here. You can actually change the different modes on it, just like you would expect from any of these tools. So we can see that, you know, it's prompting us for some of these, but we have some of our different slash commands that we can change out. It's still asking for various different permissions,

00:16:32.235 --> 00:17:02.715
and I just clicked off of one of them. So let's say yes. Oh, let's do "a" instead. I keep saying yes. So the gitignore isn't taking effect because of this, right, and there's no commit yet, so git reset HEAD doesn't work. Let's use this. So you can see it's working through this. This would be what you would expect from something like Claude Code, right? So there it goes. There's our Git repo, right, initialized the Git branch, did all this, committed it, boom. And all of this is running inside. So let's Control-C out of this. Hey look, we have our Git, right? Generated

00:17:02.715 --> 00:17:17.980
the Git, added all this. So you can see that all of this is working great. So I can go through and demo this for you for a long time, but really the biggest thing is pull this down, try it out, sign up for the free giveaway. Go for this. This is one of the big things we're very excited about, openmonoagent.ai.

00:17:17.980 --> 00:17:45.460
Right? This is a great opportunity for you to be able to dive in, learn how AI works, look at things under the cover, and make sure you are running your own local AI instead of giving away all of your data to these large data providers regardless of what their terms of service say. So go check it out. Do us a big favor. Leave a star there. And as always, make sure you like and subscribe. I'm gonna be teaching about this over the next week and teaching about some of the different features of it. So make sure you follow along because we're gonna be building a lot of this. And as always, if we can help you with custom software solutions, go check out starterpack.com.

00:17:45.460 --> 00:17:46.740
And otherwise, we will catch you tomorrow.