WEBVTT

00:00:00.000 --> 00:00:47.115
Hey. This is the definitive Claude Code course for advanced users. I use Claude Code and AI agents in my own business every day to generate over $4,000,000 a year in profit. I also teach around 2,000 people how to use Claude Code and other tools to improve their lives, both personally and business. Okay. So this course is gonna assume a foundation of Claude code experience. It's not for total beginners, but if you are a total beginner and you happen to stumble on this course, that's okay. Just look over my left shoulder here, click that button, and then I have a four hour guide that will walk you through everything you need to get to the point where you understand what I'm about to say. Assuming you're still here, no fluff, here's what we're gonna cover. We'll start with an advanced look at Claude dot n d's and system prompts, and learn how to optimize these to actually improve quality, which is simpler than you think. We'll then cover agent harnesses and how to build larger projects with Claude code. After that, we'll chat agent teams and other examples of extreme task parallelization.

00:00:47.115 --> 00:01:15.565
Then we'll do skills, sub agents, and other forms of organization. After that, I'll cover Karpathy's auto research approach for improving stuff progressively over time, and a few actual use cases you can apply this to, not just fancy demos. We'll then talk browser automation. The major players will do computer use, browser use, and which tools to apply to different use cases depending on what you want. I'll then cover how to deal with performance fluctuations in Claude code because they do happen, as well as some alternatives that you guys could use and ways to bundle in multi agent orchestration into your workflow. We'll then cover workspace organization.

00:01:15.565 --> 00:01:38.380
So for personal, business, and then even client projects, assuming you're selling this sort of thing as a service. Security for larger projects, we'll chat stuff like the recent auto mode. We'll talk a little bit about OAuth. And at the end, I'll finally round it out with a discussion about where I think Claude code is going and the future of work more generally. Hopefully, you're as excited as I am to level up your Claude code skills. Please use the bookmarks and chapter headings as needed to jump around the course. Subscribe to the channel and let's get into it.

00:01:39.735 --> 00:01:53.850
So for most of the course, I'm gonna be building directly using the Claude code extension inside of anti gravity. That's this over here. If you don't have anti gravity installed, this is an installation tutorial, but get that from Google's official antigravity.google

00:01:53.850 --> 00:01:56.010
website. Then head over to extensions,

00:01:56.170 --> 00:02:51.380
click on clog code for Versus code, give that an install, and then everywhere you go, you'll have this little clog logo that you can use to spin things up. After a brief login, you'll have more or less the exact same layout that I do. I want you to know though that the Clod desktop app is also getting better and better by the day. And because Clod is attempting to get you, obviously, on their infrastructure as opposed to on your own, They're just continuously adding new cool features that allow you to do things like mobile development and so on and so forth. So everything I'm gonna show you today works in both the Claude code tab of the Claude desktop app, also works natively inside of Claude codes extension with an anti gravity or some other, you know, IDE like thing. So if you're intimidated at all by the way that I've laid things out, what all these different folders mean and how they collaborate in order to improve your workflow, I'm gonna cover all that in this course. First though, we're gonna cover Claude. Md and other advanced system prompts. Basically, how to set up your system prompts in a very efficient and effective way, both to save you financially,

00:02:51.380 --> 00:03:02.965
but also to improve the quality of your outputs and significantly minimize the amount of time it takes to build anything. So what is a claude.md really? Well, as far as I could tell, it's four things. The first is, it's a form of knowledge compression.

00:03:02.965 --> 00:03:06.005
Okay? And when I say knowledge compression, what I mean is,

00:03:06.325 --> 00:03:10.565
instead of Claude having to read through your entire workspace,

00:03:10.805 --> 00:03:16.550
you know, file by file, like for instance over here, instead of having to open every single folder here,

00:03:16.790 --> 00:03:24.470
every single one here, read through all of the files and so on and so forth to be able to reason and then make high level declarations about your code base or folder.

00:03:24.790 --> 00:03:31.855
What your Claude NMD does, k, is it basically just compresses all of that down into a highly succinct summary

00:03:31.935 --> 00:03:40.015
of what the heck is going on in your freaking folder. So that the next time you say, hey, what was that file I made a couple of weeks ago about x, y, and z?

00:03:40.620 --> 00:03:45.580
Claude doesn't have to look through every single file in your code base. You don't have to spend a tremendous amount on tokens,

00:03:45.740 --> 00:03:58.555
and you also don't have to wait a long time. It's just sort of baked into the Claude NMD, or at least a reference to where the file lives is baked into the Claude NMD. So you can actually like reason with it at a superficial level, at a bird's eye level as opposed to actually going down through the weeds.

00:03:58.875 --> 00:04:07.835
So it's sort of like the very first thing that I'd say, you know, a Claude. Md is. The second thing that a Claude. Md is, is it's obviously your own preferences as a user.

00:04:08.490 --> 00:04:13.850
And what you'll find is, you know, more or less every time Anthropic updates Claude code, you have

00:04:14.170 --> 00:04:20.570
better and better baked in native preferences and conventions for things like, you know, delivering you file paths or

00:04:20.865 --> 00:04:27.585
how to deal with like documentation or debugging or how to update itself and so on and so forth. But obviously,

00:04:27.665 --> 00:04:29.665
Cloud Code lags behind

00:04:29.905 --> 00:04:44.210
these preferences a little bit because they have to see what users are actually using it for and and, you know, like, collect that information and figure out what ways to make things more effective. So if you're an advanced user as I am, you'll have a list of these preferences and conventions that improve your user experience.

00:04:44.450 --> 00:05:00.695
And, uh, advanced users will always have just some better preferences that kind of adapt their own workflow, as well as, you know, programming conventions, ways to organize information, structures, and and that sort of thing. Okay? So it's both a form of knowledge compression, but it's also preferences and conventions that are not natively baked in that you get to decide on.

00:05:01.415 --> 00:05:06.270
The third thing that Claude. M d is, is it's a declaration of capabilities.

00:05:06.750 --> 00:05:18.350
Now, I don't know how many times this has happened, but if you do not have a substantiated enough Claude. Md, and then you have, let's say, a skill somewhere in your your your workspace where you have just some knowledge that's sort of floating around in a few files.

00:05:18.845 --> 00:05:27.245
And you say, hey, Claude, do x y z thing for me. Go, you know, find some knowledge on x y z person or go do some research or, you know, compile a plan using x y z framework.

00:05:27.565 --> 00:05:31.600
Half the time, k, if it's not in your cloud entity, Claude will just look at you metaphorically,

00:05:31.600 --> 00:05:42.480
obviously. It doesn't have eyes yet. And it will say, like, oh, like, I don't have a built in way to do this. Sorry. What were you referring to? Do you want me to build something from scratch? I'll happy to do it. And this this sort of slowdown loop is completely unnecessary.

00:05:42.640 --> 00:05:46.905
And so what CloudMD allows you to do is it basically allows you just to itemize. Okay?

00:05:47.465 --> 00:07:08.880
You know, everything that your agent can currently do within your workspace, and you can make that really clear. You could say, hey, you currently have access to this functionality. You can do this. Hey, uh, you know, you can build a a full step plan that lasts ten or fifteen minutes and then execute it on an autonomously. In fact, that's my that's my preference or the convention that we're using. You know, you can call this API. You can call this database. You can retrieve all this information. You can act autonomously using browsers and so on and so forth. The reason why that's important is because as agentic as Claude is, hopefully, we're we're all still on the same page here about this fact, Claude still lacks a lot of agency. Okay? If you ask it to, you know, help you do something, or if you ask it how long it'll take to do something, it'll often significantly underestimate or overestimate because it's not really factoring in its own agentic capabilities. Like, I asked the other day, hey, you know, how long is would this x y z thing take to build? And then it was like, about three months or so because you would have to build this, you'd have to build that, you'd have to build that. And it's obviously like, no, I don't have to build that. I'm asking you to build it. You could build it in five seconds, so why don't you just go ahead and do Or, you know, you're having to do some API stuff and then it sends you a a little command line interface prompt and it's like, hey, just pump this into the terminal. It it sort of needs reminders that, no, I don't have to do this. That's why I'm asking you to do it. So you can actually do all of this stuff, Claude. Declaring capabilities in this way, whether it's your own personal, like tooling or workflows or whatever, or it's, you know, Claude understanding that it has the ability to do things that it might not realize at first glance is pretty important.

00:07:09.360 --> 00:07:14.865
And then finally, the fourth thing that a Claude NMD is, is it's a log of failures and successes.

00:07:15.185 --> 00:07:31.920
What I mean by this is, as you accumulate various files, as you accumulate, you know, bits of code through your project and stuff like that, every single one of these things is hard won. You didn't get them for free. Realistically, you spent tokens and then your time, which are soon to be two of the world's most valuable resources.

00:07:32.160 --> 00:07:56.475
And so because you spend all this time and energy, it is more efficient for you to take all of the learnings basically from every single piece of development or so every single action Claude does, and then insert it in its next system prompt, then just have it restart kind of from scratch every time. You know, viewed another way, mathematically, if this is the total space of all of the different possible things that Claude could do when you say, hey, do x.

00:07:57.800 --> 00:08:03.560
What this log of failures and successes is doing is it's basically carving out big chunks of this theoretical

00:08:03.560 --> 00:08:11.240
solution space. And it's saying, hey, no, you you don't do anything over here because we've already tried all this stuff over here. It's kinda looks like a planet.

00:08:11.975 --> 00:08:16.695
Meaning, the only things that you can actually try, the only things that you should try are kind of the things that exist in between.

00:08:17.015 --> 00:08:20.775
Okay. So basically, what this log of failures and successes does,

00:08:21.415 --> 00:08:23.415
is it just allows you to immediately

00:08:24.780 --> 00:08:37.260
cross out like 80% of all possible things Cloud could do because it knows. It's actually tried that in the past. And then in that way, focus its time, effort, your tokens, your money, and then your your energy on the 20% that actually matters.

00:08:37.905 --> 00:09:07.665
So these four will exist in different sections in your Cloud NMD. They'll also exist at different levels, both global and local. So what I'm gonna do next is run you guys through high ROI ways to combine these four sort of principles behind system prompts, and then apply them, um, both in global, local, and then also give you guys sort of like a a little workflow loop that you can use in order to understand how to update this effectively. And this isn't just gonna be some big long system prompt that I'm giving you guys, like, think we've probably all seen floating around various sources in the Internet. The reality is, like, Cloud NVs are highly personal devices.

00:09:07.905 --> 00:09:20.710
But these are gonna be a a list of short principles that will almost certainly help you design better projects and then get more done, whether economically or or otherwise. So the way that all this is organized within Cloud Code is using two different scopes, global and local.

00:09:20.870 --> 00:09:32.070
And if you didn't already know, basically, there are a variety of different places that Cloud Code upon initialization will look to to get the prompts that is injected at the very top of its contacts window.

00:09:32.735 --> 00:09:38.495
Okay. The two big ones for us are the user over here, which is equivalent to your global.

00:09:38.735 --> 00:09:51.030
And then also the project over here, which is equivalent to your local. And so basically, what this means is if you have a file called claud dot m d, all caps, that exists within this folder on your computer somewhere,

00:09:51.590 --> 00:10:32.770
it'll load that up on every claud code session whether or not you're working in the same workspace or another one. Now if you have a claud dot m d, capital claud dot lowercase m d, located within a dot Claude folder within your specific repository directory, then it'll also be loaded. And in this way, you know, you sort of have like a global precedent that's always injected at the top of every single thing. K? No matter what. And then you also have sort of smaller little, you know, local Claude and I mean, that's also injected. And collectively, when I say, you know, system prompts from here on out, really what I'm referring to is I'm referring to both of these. I'm not just referring to one of these. And because global is injected on every single run, there's sort of like different strategies in order to divide the four things that we just talked about.

00:10:33.415 --> 00:10:47.415
Basically, on your global Cloud. Md, it makes more sense to put high level reasoning and then your own personal beliefs. And then in local Cloud. Md, it makes more sense to insert local low level knowledge. So stuff like I just talked about with the workspace

00:10:47.415 --> 00:10:47.735
itself.

00:10:48.280 --> 00:10:55.480
So, you know, if I were just enumerating all of these things up here, okay, you'd put your preferences, like your global preferences.

00:10:55.720 --> 00:11:48.630
These could be things like, hey, you know, when you return a file, unless you return the absolute file path to click on it because whatever editor I'm using doesn't really have take that into account. You know, could be things like programming conventions. Hey, I want you to program using, I don't know, object oriented programming or hey, I want you to do like functional programming in in Rust. Hey, when I ask you to develop a new project, I always want it done in Rust as opposed to, you know, Python or or something like that. Alternatively, it could be stuff like, hey, you know, if I ask you to do something using a tool you're unfamiliar with, always go and read the API documentation first before attempting to start. Because every other time that you've attempted to do something without the API documentation, typically run out of loops, you waste x y z tokens. So make sure to load the API docs. By the way, you can't load the API docs through, uh, you know, HTML, then make sure to, like, load up a a Chrome DevTools MCP server to go and get that stuff even if it's dynamically loaded through JavaScript. Okay. So these are high level reasoning strategies.

00:11:48.630 --> 00:11:51.990
These are your own preferences. These are your own conventions. And then also,

00:11:52.230 --> 00:11:55.590
these are going to be just sort of like agency capabilities.

00:11:55.590 --> 00:12:10.365
So stuff like, hey, Claude, you can actually do x y and z. If you believe that you can't for whatever reason you're wrong, you can absolutely, you know, go and do whatever you want. The local low level knowledge. Okay. This is gonna be stuff like backslash in it, which I'll show you guys in a second. And so it's gonna be like a compressed version

00:12:10.685 --> 00:12:11.725
of all of the knowledge

00:12:12.220 --> 00:12:54.810
on your workspace. Instead of Claude having to, in the future, go through every single file, it'll just be able to read the Claude and be sort of a loose understanding, like, okay. What's where? Why have we built this? What's the purpose of this workspace? And so on and so forth. Some additional things you can do are things like context about you and your goals and your own reasoning strategies, your own communication styles. So I'm gonna give you guys examples of my own CloudNMB in a moment where you guys see that I actually give it a lot of context about who I am and why I want what I want. I'll run it through, you know, reasoning strategies that I personally use that have, you know, yielded me a lot of success in the past that may actually not necessarily be the optimal reasoning strategies, but which I tend to understand. And because I'm communicating with this thing in every freaking every five seconds nowadays, I'm I'm better capable of understanding what it's putting across if we use those principles.

00:12:55.325 --> 00:13:54.015
And then, yeah, those high level preferences and then generally good token conservation strategies. Whereas with the local, you know, it's a description of the project where everything is, low level preferences, like specific API docs and usage. If you are using, you know, the Go high level API to do some project or whatever, you can actually just, like, have the whole Go high level API existing within your project. That'll minimize the number of tool calls that, um, Claude has to make to, you know, some sort of research sub agent go and do the thing for you. Instead, it can remain local, reduce total token usage, and then also just be faster and then more accurate, and then capabilities within the project. And then that takes me to the local workflow. So and then that takes me to workflow. So there's two sort of workflows here that I wanna talk about. There's the local workflow, and then there's the global workflow. And the local workflow is gonna be responsible for updating our local Cloud. Md. And then the global workflow is gonna be responsible for updating our global Cloud. M d. Like, it'd be nice if I could just give you on a silver platter a bunch of stuff to put in your Cloud. M d. Right? I think that's what a lot of people want. But you're gonna end up a much better both developer and then a much more productive person if you understand the principles at play here and develop your own. So initially, to start,

00:13:54.415 --> 00:14:15.560
anytime you're developing anything in in Cloud Code or whatnot, obviously, you need to plan your feature. And I say feature here loosely. You know, I use Cloud Code as basically like my business assistant nowadays. And so I use it to do anything from reading my emails, to grab me news summaries in the morning, to to communicate with x y z people, to design me, you know, websites and so on and so forth. So feature here is really loose. I'm not just talking like about a vibe coded project. I'm talking about anything.

00:14:15.800 --> 00:14:25.725
But what you do is you start by planning a future. Right? And then if you think about it logically, what Claude does next is it instantiates the future. However, over the course of planning and instantiating,

00:14:25.725 --> 00:14:28.125
okay, it will fail a bunch.

00:14:28.525 --> 00:14:34.850
It'll also succeed a bunch of other times. And ultimately, there'll be a giant list of learnings

00:14:35.010 --> 00:14:58.755
between, you know, step one to step two. And so what you do after you instantiate is you actually compile all those learnings, k, into some efficient high information density thing that doesn't seem a lot of tokens, then use that to update the Cloud. Md. And so this is your local workflow for managing your system prompt. And you basically just do this every time. You plan something, it'll do a bunch of failures in the way, then you'll instantiate it, you'll take all those learnings, update your CloudNMD.

00:14:58.755 --> 00:15:25.475
That way the next time you plan a feature, it'll already have all the benefits of the failures plus, you know, any additional things that are learned along the way. And so the first time around this loop, you know, it might take, I don't know, let's say x time to develop a feature. The second time around this loop, you know, maybe to take like 0.9 x because now, you know, you've shaved off 10% of the the the search space and it's a lot faster. The third time you go, maybe it takes 0.8 x. Okay. And so like the time will just get faster and faster and faster every time until eventually you develop things.

00:15:25.715 --> 00:15:29.155
Using Claude in a similar way that you would develop if you were not using Claude.

00:15:29.715 --> 00:15:43.820
Now here's where it differentiates between the global workflow. Workflow. What happens is, you know, as you accumulate a variety of failures, successes, and learnings, so on and so forth, your current local cloud and it gets really, really good. After all that's done, what you do is, you know, after hundreds of these runs,

00:15:44.375 --> 00:15:55.895
k, you can either pull a slash insights feature or you can run that yourself to show you guys how to do. What this will do is this will compile, not like at a local level, but at a global level, all of the things that Claude attempts

00:15:56.215 --> 00:15:58.855
pretty consistently and then struggles with pretty consistently.

00:15:59.400 --> 00:16:16.435
You know, it's like, oh, hey, I noticed that not only on that one project, but also in more or less every project, Cloud consistently goes down silly rabbit holes it doesn't need to, and then tries coming up with its own stuff instead of just consulting the docs. And so, you know, after this is done three or four times, obviously, there's a trend. Right? So what you can do is you can take that information and then you can pump that in your global.

00:16:16.835 --> 00:16:47.935
After that, what I'd recommend is is I'd recommend you manually review because Claude is an agent at the end of the day. And the more AI steps you have, the more you compound probabilities and the less likely that it becomes that Claude itself is making, like, the right call. You know, if, like, Claude is independently 90% successful on a task, and then you give it to another Claude, which is 90% successful to a task, and then you give it to another Claude, you know, what you're really doing mathematically is you're going, um, 0.9 raised to the three. And if you just do a little bit of math there, that's not 90%. Right? 0.9 to the three is 73%.

00:16:47.935 --> 00:17:19.675
And so I guess what I'm trying to say is, um, the more steps you have without a human in the loop here, uh, the lower the likelihood that your total determination will be correct. And because this is your Cloud. M d, it is your global preference and convention file, it will be applied to every future project. Meaning, if there is a place you should spend human time on, it is this exact step here. So I'd recommend manually reviewing that. Once you manually review that, then you can add some har high ROI bullet points to your Cloud. M d and so on and so forth, You know, just like a high information density version, and then you can actually update the the Cloud NMD. And then you can repeat this loop a few times if you'd like before finally going back to the local loop.

00:17:19.915 --> 00:17:25.435
And so, I mean, it's kind of like a I don't know what you wanna call it, an infinity sign. K? Kinda starting here, you're going

00:17:26.600 --> 00:17:34.840
kinda like this and then you're kinda looping back and then you're just doing this over and over and over and over and over again. Obviously, you're gonna spend a lot more time in this loop, but eventually, you're gonna go down to this loop.

00:17:35.160 --> 00:18:12.705
And this is how I personally develop using Cloud NMD. This is why my workspaces are super tight and then instead of me, you know, using a vanilla version asking it, hey, do go do x y and z, and then it like stumbles around, uses 20,000 of my tokens and God knows how many of my dollars. And when I say, hey, I'd like you to do x y and z thing, I'd you to scrape some leaves or whatever. It already has all that stuff baked in while still being flexible enough that I could change them anytime that I want. Okay. So next, I'm gonna show you guys basically my workflow every time I start with a new project. Assuming that I've already done a little bit of work in the project, I don't have a cloud dot m d and I don't really have any of that like advanced tooling or system prompt harness and stuff set up. This is exactly what I would do step by step. So first of all, you need to open up a folder. I was just learning about Tomatillo's earlier. That is sort of embarrassing.

00:18:13.105 --> 00:18:24.990
But anyway, in anti gravity, just go open recent, and then I'm just gonna open up something. Why don't I do, you know, anti gravity example right over here? And, you know, when I'm in this folder right over here, obviously, there are a bunch different files and, you know, configurations.

00:18:24.990 --> 00:18:30.030
This one's using Gemini for a while. So what I'd like to do next, I'd like to open up a Claude code.

00:18:30.350 --> 00:18:35.470
And so I'll click on that button over here. Let's close out the agent window because I'm team Claude, at least for the moment. Thank you, Space Invader.

00:18:36.025 --> 00:18:45.465
And really, like, the first thing that you do is, you know, you you develop on your own. I always recommend just, like, don't try baking in any opinions into a Claude NMD until you've at least developed without

00:18:45.625 --> 00:19:04.765
a Claude NMD or some sort of advanced system prompt for a little bit. And the reason why is because, like, you'll find Cloud's actually really good out of the box. As mentioned, they're incorporating more and more of these features natively within it. And so, like, it's it's great. It's not like the harness that makes the intelligence. It's obviously intelligence inside of it that sort of, you know, communicates with your system prompt to to get good. But right now, it's already fantastic.

00:19:04.925 --> 00:19:24.720
Anyway, after you've done some developing for all, this is obviously some sort of website here. It's like a template using VIT. Just go slash in it, just like that. And basically, slash in it will go through, read every single file in your workspace, which I'm currently doing with fast mode, if you're wondering why this is probably faster than what you're doing. And then at the end of it, it'll come up with basically like a highly optimized Claude dot m d file that

00:19:25.120 --> 00:19:38.235
succinctly and effectively summarizes the placement of everything here. And you can see it just generated one called claud dot m d. So comes with the built dev lint commands, note that no test framework exists, some architecture review key dependencies and their roles,

00:19:38.395 --> 00:19:55.650
then some style conventions as well. So now I'm gonna open up this claud dot m d. Okay. And why don't we just move this over to the main window, so it's a little bit easier to see. And you can see that more or less it it just at a very high level summary takes every single line in my entire workspace, then it just significantly increases the information density

00:19:55.890 --> 00:19:58.210
at a cost of like total comprehensiveness.

00:19:58.290 --> 00:20:00.210
So what I have now is I is I have a summary of everything.

00:20:00.905 --> 00:20:06.505
So that means is the next time that I ask Claude anything about my workspace, k, the next the next go around,

00:20:07.305 --> 00:20:14.105
I don't actually have to like have it like run through every single thing in the file. Like for instance, what I'm gonna do here is I'm just gonna call this like,

00:20:14.800 --> 00:20:17.280
I don't know, xyz.md.

00:20:17.280 --> 00:20:19.600
Or actually, you know, why don't I just delete this for now?

00:20:20.480 --> 00:20:23.440
You know, if I had asked this Claude version something

00:20:24.000 --> 00:20:38.385
about dark mode, hey, what are my opinions on dark mode? It's It's gonna check its memory for notes on the preference. It's not gonna find anything. And notice how it's just gonna say there's there's there's nothing at all. So what I could say is read through whole project and find my preferences.

00:20:38.945 --> 00:20:47.580
And now what it'll do is it'll, know, essentially launch some sort of a gentic search with readmes and so on and so forth until it finds something about dark mode. In this case, was in the Gemini. Md.

00:20:47.820 --> 00:20:52.860
But I want you guys to know that, you know, whether or not you have it in a Gemini. Md or it's just sort of written and eventually figure it out.

00:20:53.260 --> 00:21:02.165
Now, the issue is, you know, how what what sort of usage did we just do in order to get that? If I just scroll all the way up here, type slash contacts,

00:21:02.325 --> 00:21:06.645
you know, the system prompt was point 6%, free space was messages is 0.9%.

00:21:06.645 --> 00:21:12.620
So that last message chain there with the tool calls and everything like that might have realistically taken like five or 6,000 tokens.

00:21:12.860 --> 00:21:17.420
I don't need to do that sort of thing ever again. You know, if I bring that back and go claw.md,

00:21:17.420 --> 00:21:22.060
and then if I just open up a new instance and I say, hey, you know, what are my opinions on dark mode?

00:21:22.460 --> 00:21:24.300
Obviously, it's gonna read the claw.md.

00:21:24.485 --> 00:21:28.805
And, you know, instead of me having to use god knows how many tokens, if I go back to slash context,

00:21:29.045 --> 00:21:59.515
you know, you'll see that I've now you only use point 2%. So basically, save myself what's that? Like, 6,000? And let me tell you, these Cloud tokens ain't free, man. Tropics increased in the price pretty aggressively, especially recently when they realized 99% of the world is now operating using their infra. So I guess what I'm trying say is I'm spending, like, literal, like, money, but I'm also spending time. And to me, the bigger thing is time. But what are some other things asked? I mean, like, think about deployment. If you have any sort of like front end or full stack experience, you'll know like, you know, usually the flow is you start with a dev server. You use that dev server via NPM run dev or equivalent to like figure things out on your,

00:21:59.915 --> 00:22:27.345
you know, develop various features and so on and so forth. Then you'll build, you'll do some sort of linting, and then once you're done, you'll actually, like, preview it. You'll you'll push to production or or sorry. You'll push to staging and then verify that and then eventually you push to production. Right? Like, obviously, this is something that it could have learned just by going through the folder structure, seeing source, public, node modules, all these things. But, you know, I'm just listing them out over here so that instead of you having to actually read any of that filing or tooling, you know, you can do it in God knows how many what sucks? Five tokens, six tokens, or something immediately.

00:22:27.745 --> 00:22:37.320
Likewise, you know, I see where things are laid out. So in this case, this is obviously a single page application. The entire app lives in a single component, Nav Hero services, projects, and footer sections, markup and logic is here.

00:22:38.040 --> 00:22:45.000
It is evident if you were to actually click on app.jsx and then scroll through that that is the case. But look at how many more tokens app.jsx

00:22:45.000 --> 00:22:47.645
is versus, you know, just that brief little description in

00:22:47.885 --> 00:22:57.645
CloudNMD. If I were to copy and paste the entirety of this into something like a word counter, you could see it's 827 words, approximately 1,100 tokens. K. If I go back to my CloudNMD, like, long is how long is this?

00:22:57.965 --> 00:23:11.980
It's 22. So that's a what? 45 x compression ratio? That sort of compression is how you ultimately get a significantly better and more effective clot because you are not shoving a tremendous amount of tokens at the beginning of any query. And, you know, as we hopefully know,

00:23:12.645 --> 00:23:37.200
token length tends to scale inversely with the quality of the output. The more tokens in a context window, not only the more money are you spending, but typically the lower quality the results are. So just avoid all that by initializing and then storing a bunch of information about, you know, what the project is on. You'll be you'll be much happier for that. But, know, slashing it isn't the only thing that I would do. From here, I'd actually start importing a couple of my preferences and then things that it's tried. So I don't know. Let's just say I'm gonna remove the gemini.md

00:23:37.200 --> 00:23:37.760
for simplicity.

00:23:38.255 --> 00:24:18.405
Let's just say I'm developing a new feature and actually, why don't we just visualize app IDK what it looks like? Let me actually take a look at this thing. So it'll run the dev server so I could see it in the browser. And immediately, I'm thinking like, hey, know, this is actually kind of inefficient. When I say visualize app, I basically just want you to launch So store in your cloud.md that when I ask you to run the dev server or open the app, I just want you to open it in my Chrome instance as well. I don't just want you to run the dev server. You know, basically, next time I say this, I don't want it just to like say, hey, the dev server is here, give it a click, and then I'm like, okay, can you just open it because I'm already here. I just wanted to open it automatically. Right? Okay. Cool. And I see, you know, it's kind of over here. So that's nice. Definitely not a fan of the design. I don't like how it scrolls through. I'll just say, hey, I want you to significantly improve the perceived visual quality of the application. Go and look up,

00:24:19.090 --> 00:24:25.570
you know, the Apple website and then compare that to our website. Make some changes that just improves both the perceived visual cohesiveness,

00:24:25.570 --> 00:25:08.865
quality, etcetera. Must have been a Gemini website. Anyway, let's see what it does here. It's fetching Apple site for design reference, reading the current app code in parallel. And now it's just doing a bunch of updates, editing the CSS, nav dash link, hero dash background, and so on and so forth. And you can see that it is actually updating the site. I mean, it's doing it currently in real time, but it's looking significantly better. It's also picked up some new colors and so on and so forth. Now, what's cool is it actually just opened up a second project for me right over here because earlier on, I'd stored my preferences that I don't want it to just give me the link. I actually want it to, you know, open open stuff up. So that's nice. Obviously, we have better designing and stuff like that. But the key part here is when I say, okay, great. Nice job. How could you have arrived at these conclusions and done everything I just asked you to do faster? Okay. And now look, we're already at the point where we're capable of optimizing a fair amount of these design issues.

00:25:09.105 --> 00:25:25.020
Instead of 20 edit calls, which is what it did before, what it could have done is just do one write call. So the reason why that took like thirty seconds or so, because it was editing the CSS file line by line across 20 sequential tool calls. I should have read the file, rewritten the full thing in my head, done a single write to replace index at CSS in one shot.

00:25:25.420 --> 00:25:36.295
Yes. I'd like you to save this in the local cloud dot m d. Do it as a user preferences section. So asking it questions like, how could you've arrived at those conclusions and then everything I just ask you to do faster and for fewer tokens is pretty powerful.

00:25:36.535 --> 00:25:42.215
Doing this consistently as you develop and design a project and then having a running log of changes to the Cloud NMD is also quite valuable.

00:25:42.790 --> 00:26:02.255
Another thing you can do is you could set a meta prompt in the Cloud NMD, and that's personally what I always do. That basically says, like, when you have made a mistake, I want you to update the Cloud NMD with a running log of things not to do next time. When you've made a mistake, I want you to update the Cloud NMD with a running log of things not to try next time. Essentially, want this to be almost like a mini experimenter's

00:26:02.255 --> 00:26:08.495
or research person's notes that shows what a future Claude instance should not do while

00:26:08.575 --> 00:26:17.280
working on this project. Update the Claude. Md to reflect what I just said at the very bottom. K. Now it has a section called lab notes, what not to do. This is going to show a bunch of failures,

00:26:17.280 --> 00:26:30.965
as well as learnings and successes and so on and so forth. And we're already honestly, like halfway down the loop. Now, is a very contrived example because I'm literally just building a website. But imagine that, you know, instead of just a website, you're building a workspace that is meant to contain all of your business,

00:26:31.125 --> 00:26:47.500
basically, entirely. All of your SOPs, it's meant to contain all of the work that you do on a daily basis. It's meant to contain your to dos and so on and so forth. Having information like what I just showed you for this project would be invaluable across more or less all levels of both development and then also productivity. And that's personally what you should ultimately be working towards.

00:26:47.980 --> 00:27:07.185
So, anyway, we can make this as complicated as we want, obviously, but hopefully, you guys see that loop at work. We plan a feature. So we just did this. It was simple enough that we didn't need to use a dedicated plan mode, but obviously, I still one shotted it. After it implemented the feature, along the way, it did a few things that realistically could have done better. So what do we do after? We take those learnings, we compile them, and then we update the cloud.md.

00:27:07.510 --> 00:27:13.750
And this was sort of a med example since I literally was doing it while I was building the cloud. Md. But hopefully, you guys at least understand conceptually

00:27:13.750 --> 00:27:37.610
of sort of what you do. After four or five of these runs, there's probably a fair amount of stuff here that you can take advantage of. And that's where an insight run would make sense. So let me actually zoom in and then just delete this so you guys could see. In case you didn't know, insights is a simple slash command that basically runs a bunch of sub agents across all of your cloud conversation history. The benefit to that is now, not only are we running, you know, and and changing our local cloud NMD,

00:27:38.090 --> 00:28:08.050
we're also evaluating all of, like, the patterns in communication that we've had with cloud NMD over the course of the last I don't know. Could have been, like, few days, could have been months, could have been, I mean, years, depending on how soon or late rather you are watching this video. So, um, just like we optimized our local cloud in MD, now we can start optimizing our global. And while it's chewing away, because insights does take a fair amount of time, k, I'm just going to create a new file here. I'm a call it global cloud dot m d. And I'm just gonna give you what I would consider to be, at least as the time of this recording, probably like some of the higher ROI principles

00:28:08.050 --> 00:28:09.410
to make sure to include.

00:28:09.650 --> 00:28:15.625
I include this in my own global cloud in m d because I think it's just very, very valuable. So I'll say global Cloud NMD.

00:28:16.105 --> 00:28:25.465
This is inserted at the beginning of any conversation with Cloud across all of the users' workspaces. So first, I have a profile section. So this is a bit about Nick.

00:28:26.120 --> 00:28:30.600
So, you know, I don't know what it'd be like. Nick is a 30 year old,

00:28:30.760 --> 00:28:32.040
and Jay,

00:28:33.240 --> 00:28:34.360
high performing

00:28:35.160 --> 00:28:37.080
Internet entrepreneur.

00:28:38.095 --> 00:28:42.335
He runs a YouTube channel at 350

00:28:42.495 --> 00:28:48.255
better be 350 by the time I make publish this video. 350,000 subs on Instagram channel,

00:28:48.975 --> 00:28:50.255
and so on and so on and so forth.

00:28:50.830 --> 00:29:03.710
K. And so I have a bunch more information which I've taken from just a couple of other systems I've built. This one here is Nick is a 30 year old and Jay. Here's his revenue, so here are all the different things that contribute to my revenue. Here's some churn math,

00:29:04.270 --> 00:29:06.110
some of the companies that I'm currently owning,

00:29:06.925 --> 00:29:46.595
some teams. Right? So it's me. It's an editor. It's a LinkedIn newsletter person. That's a bunch of AI agents. Bunch of information on YouTube as well as my goals, and then ultimately some on Instagram as well. And you're thinking like, Nick, this is crazy. Why would you insert all this information in your global Cloud NMD? Well, the reason why is because I want this on every conversation that I have with it to understand who I am and to take that into consideration when discussing things with me. I can't say how many times I'm having a conversation with Cloud and because I don't have context like this, because I'm in a naive thing with no personal system problem checking the context window. I say something along the lines of, hey, what's the best solution for x y and z? And then it says, oh, you're gonna wanna do this solution. And then I say, why? And then it'll say, oh, because it's the cheapest. Right? It only cost 0.2¢,

00:29:46.595 --> 00:29:48.515
whereas the other other solutions cost $5.

00:29:48.900 --> 00:29:56.260
And I'm thinking, well, if you knew a little bit about who I am, you'll know that money is not the primary bottleneck right now. I prefer you to exchange my my money for my time.

00:29:56.740 --> 00:30:06.865
So just giving it some, like, high level principles like that is is very important. Anyway, while I was doing that, the actual shareable insights report is ready. So I'm just gonna tell it to open it so I can take a look at it with you guys.

00:30:07.425 --> 00:30:15.985
And now you'll see there's an HTML page basically that runs through everything about Claude, all of the insights across all of the sessions. Looks like 1,849

00:30:15.985 --> 00:30:21.450
messages across 200 sessions. I don't know where this chooses the cutoff. It looks like it's like about a month or so.

00:30:22.250 --> 00:30:35.105
Although, keep in mind that, like, this is clog code specific, and I don't know if this encapsulates all the conversations I've had with them on the desktop app, but pretty good. And you can see here that, you know, there's a bunch of context about what I work on and and so on and so on and so forth and how I use it and all this stuff.

00:30:36.145 --> 00:30:40.385
So the the important thing to do is existing features to try section.

00:30:40.545 --> 00:31:01.180
You can just copy this in the Cloud Code and add in your Cloud NMD. So for instance, when using Chrome DevTools MCP or browser automation, always kill stale Chrome processes in a clear profile before starting. If the m c p tools fail twice, stop and ask the user for continuing to retry. Never waste tokens on repeated failing browser calls. This is actually quite valuable just given how many times I have tried to have it run, you know, Chrome dev tools m c p and it's failed.

00:31:01.580 --> 00:31:02.700
Um, same thing over here.

00:31:03.545 --> 00:31:22.520
Same thing over here, you know, with some face swap information and stuff like that. You can copy all this in the cloud and it'll set it all up for you, which is pretty valuable. As well as it it can even go and build like new skills based off of things that you consistently ask. So that's that's more or less what I'm doing here. Anyway, the the value with this is basically to like copy the entire thing, go back here, paste it in,

00:31:23.240 --> 00:31:53.210
and say, this is my claw and insights file. It describes at a high level a few of the obvious design patterns in my thinking, and then a couple of the issues that I've had communicating with you and other versions of you. I'd like you to distill this into a list of high information density snippets that I can paste into a global Cloud. Md to be both token conservative, but also avoid most of the mistakes that you typically make. And I'll just press enter, it's going to give me some information about that. And over here, actually have the changes and this is very high information density. Right? It basically took a bunch and said, don't over explain, over engineer, add un requested

00:31:53.610 --> 00:31:54.410
improvements.

00:31:54.490 --> 00:32:00.335
When making widespread changes to a file, he used one right instead of many sequential edit calls. Speed matters, don't fetch well known websites.

00:32:00.655 --> 00:32:01.935
Again, a rerun,

00:32:01.935 --> 00:32:23.840
browser automation, and, you know, so on and so forth. Just some just some high level stuff. It looks like it just inserted that in here, which is quite nice. So now, what do we have? We have, if you remember, some context on me in the global Cloud ID. We also have some high level reasoning rules and principles. And really, what we're just missing is some token conservation strategies. And you could see this by you know, you can go back,

00:32:24.525 --> 00:32:39.630
rewind the video if you'd like some more on that. But basically, you want context about you, your goals, and your reasoning strategies, some high level preferences about, you know, what it is that it is currently doing that is wrong that you would like it to fix, and then some good token conservation strategies like docs first. So what I'm gonna do is underneath

00:32:39.630 --> 00:32:45.790
interaction rules, I'll also just say oh, and what's really interesting that I'm seeing, one of my rules are actually directly contradicting

00:32:45.790 --> 00:32:49.070
some of the other rules. No fetch, well known sites. I'll actually just remove that.

00:32:49.550 --> 00:32:58.895
That's the human in the loop part. Right? Just look to see if any two rules directly contradict each other. Then I'll say, when a user asks you to use a nontrivial platform,

00:32:59.375 --> 00:33:07.270
one for which you do not have context in always look up the documentation first, You can do so by looking into API documentation

00:33:07.270 --> 00:33:08.710
plus platform name.

00:33:09.190 --> 00:33:13.270
After, if for whatever reason you can't access the docs for JavaScript reasons,

00:33:13.270 --> 00:33:25.205
launch a Chrome DevTools MCP Chrome instance so that you could still copy and paste all that data. No matter what, if you're working on a project for whom API documentation is available, you should always go through the API documentation to avoid 99

00:33:25.205 --> 00:33:31.285
of the errors. The tokens we spend reading the docs will save us a lot of tokens in trying to use things that don't work. Cool. So I'm gonna copy that.

00:33:32.200 --> 00:33:48.025
And now I have my global cloud NMD. And, you know, I could obviously just have Claude actually insert that into the global cloud NMD. I could also just, like, go and find the find the finder. So I'm gonna go to finder on Mac. Basically, you can find your global cloud NMD just by going to your Mac, in my case, users on my Nixarayef.

00:33:48.105 --> 00:33:53.465
And then there's a hidden folder here, which you can't actually see just right out of the gate. You should be able to go

00:33:54.425 --> 00:34:13.760
shift command. I think it's comma or period. There you go. Shift command period. Once Once you're done with that, you can scroll all the way down where it says Claude. And then over here, you'll see that there is a Claude dot m d that lives within that Claude. So what I can do now is I can just reveal this folder in my finder, compare it to that folder in my finder, and I can actually just go drag and drop this in.

00:34:14.705 --> 00:34:17.425
I have global cloud and I can just remove this cloud

00:34:17.745 --> 00:34:37.380
and replace that with this cloud. Awesome. So now all future conversations that I have with cloud across all of my workspaces and all of my folders will include the information that I just provided. And hopefully, you guys see how simple it is to run that loop. Granted, this is an informal loop. I'm not really showing you guys like a simple formal streamline process, but hopefully, you see how easy it would be to build that in again as like a meta clod.

00:34:38.660 --> 00:34:41.140
Let's talk a little bit about agent harnesses.

00:34:41.325 --> 00:34:42.845
So agent harnesses,

00:34:42.845 --> 00:34:58.060
the term anyway, has gotten a ton of interest over the last couple of months because it's sort of new and exciting, but very few people actually understand what it refers to and what it means. An agent harness, to be clear, is just Claude code. Claude code is the harness around

00:34:58.060 --> 00:35:00.140
the model Claude

00:35:00.140 --> 00:35:05.820
that enables it to do things like call various tools and get actual economically valuable work done.

00:35:06.220 --> 00:35:07.660
For those of you that don't know,

00:35:08.300 --> 00:35:21.015
all that like AI models are are just text interfaces. Right? It's just text in text out. A harness is what turns something that can only communicate in text into something that is ultimately capable of like controlling our computer.

00:35:21.415 --> 00:35:24.055
So the way that I personally think about the question,

00:35:24.055 --> 00:35:25.255
what is a harness

00:35:25.495 --> 00:35:26.135
is

00:35:26.510 --> 00:35:31.070
harness is just everything that wraps around the LLM that is not the actual LLM itself.

00:35:31.630 --> 00:35:43.975
So in our case, it's Cloud Code. It's the system prompt. It's the hooks. It's the tools that it has access to and it's the parameters they're in. The control things like when the memory auto compacts,

00:35:44.055 --> 00:36:00.830
how many messages you can send in a turn, what the total number of token limits are and so on and so forth. For the purposes of this demo, let's pretend that this server here is our Claude space invader. And so this is sort of like the the the large language model itself. This is actual Claude.

00:36:01.390 --> 00:36:08.750
And so Claude is obviously like a galaxy brain intelligence. It's been trained on God knows how many books and blog posts and encyclopedias

00:36:08.750 --> 00:36:20.405
and so on and so forth. But you know, Claude is sort of it sort of exists in this boundary where it can't actually do anything outside the real world unless it's given the tools and the ability to do so.

00:36:20.725 --> 00:36:21.525
And so,

00:36:21.845 --> 00:36:22.485
one example

00:36:22.880 --> 00:36:27.280
of things that Claude has access to are set tools.

00:36:27.520 --> 00:36:50.515
So that's things like, I don't know, the ability to use bash, like, use a terminal. The ability to use, I don't know, grep, which is how it finds things around your computer and so on and so on and so forth. Another thing that it has access to, kinda going back and forth, is some form of memory. Right? What it can do is it could read, so it could read things that are stored in this memory, and then it can also write, so we can add sort of update things as needed.

00:36:51.150 --> 00:37:04.590
You know, there's obviously also a variety of other things here that it has access to. And, you know, if it didn't have access to all these things, again, it would just be like an agent or a a model, sorry, that exists in the box. And so that's really the difference between, you know, LLMs and agents.

00:37:05.345 --> 00:37:06.225
Agents

00:37:06.385 --> 00:37:07.985
are LLMs

00:37:07.985 --> 00:37:09.665
plus a harness,

00:37:09.825 --> 00:37:13.265
whereas LLMs by themselves, large language models, they can't really do anything. Obviously

00:37:13.905 --> 00:37:32.130
operate entirely in the domain of knowledge. So just given the fact that it's called a harness, you can kind of think of it as, you know, I'm gonna draw a really crappy dog here. Put another way, here's a really crappy rendition of why I initially wanted to be Canadian dog sledding and what ended up being looks more like Santa with a big fat beard riding a

00:37:32.450 --> 00:37:33.810
questionable reindeer.

00:37:33.970 --> 00:37:39.785
But basically, you can imagine that like this right over here, this is your LLM. This is the actual model intelligence.

00:37:39.865 --> 00:37:53.720
And then you over here, okay, this is your harness. This is actually like the the code part of Claude code that sort of controls it. And so the LLM wants to go in a bunch of different ways and wants to do a bunch of things. What the harness does is it just sort of narrows down its direction.

00:37:53.960 --> 00:38:02.680
And, you know, you can kind of almost think of it like the barrel of a gun or something like that. Right? Whereas, you know, back in the day, you might have had, like, cannons

00:38:02.680 --> 00:38:05.160
and then you might have loaded those cannons with big

00:38:06.205 --> 00:38:20.685
massive cannon balls and they're huge. And what you do is you'd stuff some additional gunpowder underneath and stuff like that. And those cannons would kind of and despite the fact that they were operating off the same fundamental technology, which is gunpowder, they might not really be able to go so far. I don't know. Let's just say 50 meters.

00:38:21.220 --> 00:38:22.580
Nowadays, obviously, we have

00:38:23.380 --> 00:38:33.780
this is my really crappy gun drawing with, you know, more or less the exact same technology. You put some sort of bullet in there. Right? But then because of the technology that surrounds

00:38:33.940 --> 00:38:36.260
the core thing, which is the gunpowder,

00:38:36.260 --> 00:39:06.375
you know, the bullet can go a lot farther. So maybe instead of 50 meters now, it can go, I don't know, 250 meters or so. So this is how I this is how I think about harnesses. Okay? And I don't mean to just show you a bunch of silly grade school analogies, but it is important to realize it like that is what now Claude code really is. And because Claude code is a harness, obviously, are a bunch of other people that have tried making their own harnesses as well. Just like we have frameworks like React and Vue and then Next. Js and and and Nuxt, we also have a bunch of different harnesses that have been developed that supposedly work on and then improve on on specific aspects.

00:39:06.695 --> 00:40:11.820
What are some of those aspects? Things like security. Right? Automatic permissions. So plan mode versus default mode versus the new enable auto mode and then bypass permissions mode. You know, there's some harnesses out there. Okay? There's some AgenTek SDKs and stuff like that. I'm not gonna name any names, but there are some of them that are probably a little bit less secure than others. Such that if they were to read a Twitter thread that looked like this, maybe they would actually execute pseudo r m dash r f and delete your entire hard drive. Right? Bunch of examples of people screwing around with us. This is an example of codex, which, you know, being an extraordinarily competent model, I can't really talk down too much on. But this is an actual conversation that, you know, it had with somebody that I found on Twitter. You know, the model basically tried running something that was like r m dash r f, which to make a long story short, in case you didn't know, just deletes everything. And here it says, well, the shell policy actually blocked the raw RMRF. So what I'm doing is I'm removing those generated directories, like, in the shell policy with a Python cleanup instead. Same effect, less policy friction. Right? It's just gonna go end up deleting the entire thing. You know, the the harness impacts a model's ability to get things done. It also impacts ultimately the safety.

00:40:11.980 --> 00:40:22.785
It impacts like the memory and so on and so on and so forth. And so, hopefully, at least now you guys understand what the harness is before I show you guys some examples of different versions of it. Obviously, Cloud Code is the major harness today,

00:40:23.105 --> 00:40:29.105
but there's a great blog post over here by Langchain that more or less describes a way to create different harnesses.

00:40:29.425 --> 00:40:34.305
The model gets a certain type of context injected into a prompt's memory skills or conversation.

00:40:34.650 --> 00:40:39.850
Then you also have orchestration, things like Ralph loops, which was really big a while back. That was a different type of harness.

00:40:40.330 --> 00:40:46.650
You know, there's a certain persistence of data, actions, and then the ability to both observe and verify, say with screenshots and stuff like that.

00:40:47.975 --> 00:40:52.055
One harness that a lot of people are using now is this sort of Droid idea,

00:40:52.215 --> 00:41:05.010
which shows built by Factory AI. So Droid is like a publicly available harness that you can run and download today. Py dot dev is also exploding in terms of popularity. So whereas Claude code, you know, obviously needs to run with Claude infrastructure.

00:41:05.010 --> 00:41:15.970
Right? Claude is the model underlying Claude code. This Py coding agent is sort of like the open source provider of it. You can feed in more or less anything that you want, including Claude, and then just have it operate inside of this this harness.

00:41:16.535 --> 00:41:43.590
And, you know, what this does is it just changes the way that we store memories. It changes the way that we store certain files. It sort of like modifies. It's almost like an alien or bizarro version of Claude code. And so far that it changes a few of like the fundamental constants, like how long before context compaction, you know, how do we try different types of solutions and stuff like that. Various baked in behaviors regarding a cloud code and and so on and so forth. And the reason I'm covering this is because, you know, this is something that was very fundamental to Anthropic. Back in 11/26/2025,

00:41:43.590 --> 00:41:53.505
they wrote a big long blog post called effective harnesses for long running agents, which at the time kind of changed the game. And I would say this is the beginning of the kickoff of ClaudeCode superiority over most other harnesses.

00:41:53.825 --> 00:42:23.175
And so, you know, here it describes various different ways to work on long running coding projects and manage environments and stuff like that. And so obviously, this is something that's like very fundamentally baked into Cloud Code. If you wanna understand Cloud Code in an advanced level, uh, you can't get better than getting it at a harness level. Okay. So, you know, obviously, this is a Cloud Code course. It's not another harness course, but you should at least know what agent harnesses are before we proceed to the rest of the course because, you know, the more understanding of harnesses you have, I think the better you'll be able to appreciate and then digest and ultimately execute on what I'm about to show you.

00:42:24.615 --> 00:42:27.415
Next, I wanna chat a little bit about parallelization,

00:42:27.495 --> 00:42:37.460
about things like agent teams, about sub agents, and a couple of other ways of distributing work to minimize the amount of time and effort that goes into things, while also increasing the quality of the output.

00:42:37.860 --> 00:42:40.980
Okay. So I say agent teams here, but let's start with parallelization.

00:42:41.060 --> 00:42:48.445
A big question that I think a lot of people have is, well, first of all, what the heck is parallelization? Which is just doing multiple things simultaneously instead of waiting for sequential

00:42:48.445 --> 00:43:11.100
things to finish. And then the second one is like, Nick, why the hell should we paralyze our agents to begin with? Into that, I say, have you ever, you know, sent a long running task request to Cloud Code and actually had Cloud execute on something for more than a few minutes? For the vast majority of the time, you're just sitting there twiddling your thumbs. Twiddling your thumbs is not very economically productive, so if I have ways to not twiddle my thumbs, I will do so. And I really, I guess, mean is that autonomous agents just take a long time to finish tasks.

00:43:11.675 --> 00:43:21.915
You know, when we started with this stuff, or at least when I started with this stuff last year, you know, Claude could realistically work on things for thirty seconds. The other day, I had Claude work on something for over fifteen minutes.

00:43:22.395 --> 00:43:45.735
And so if all I'm doing is just sitting there waiting for it to do this fifteen minute task, you you can imagine that my productivity is basically going to be punctuated by me just sitting around watching it. It does something. I get the result, make some minor changes, wait for another fifteen minutes and so on and so forth. That's not very efficient. So a parallelization allows us to reduce the total amount of time by a factor of at least a few from fifteen minutes to maybe a couple minutes, so it'll be able to work on smaller, more more self contained things.

00:43:46.135 --> 00:43:48.215
But two, it'll also just get higher quality.

00:43:48.455 --> 00:44:06.510
Another thing is that many tasks feature independent steps that can be broken down. So for instance, let's say I'm doing some sort of task. Okay, and this is just like how long it would normally take if we go serially. And so the option a is just to do what most people do, which is where they'll do, I don't know, they'll do step one and then they'll do step two and then they'll do step three and then they'll do step four.

00:44:07.325 --> 00:44:09.485
So that's one, two, three, four.

00:44:10.285 --> 00:44:14.525
This task over here takes five minutes. This task here takes

00:44:15.165 --> 00:44:21.165
This task over here takes five minutes, and this task over here takes five minutes. What's the total amount of time kinda collectively?

00:44:21.450 --> 00:44:30.170
Well, it's twenty minutes. Right? So that's sort of a, you know, the serial way that most other people do things. Well, guess what? Turns out a lot of tasks don't need to necessarily be like that.

00:44:31.370 --> 00:44:45.135
If I just copied all of this stuff over. K? And then instead, ran a couple of these in parallel. So I actually had, I don't know, three of these simultaneously and then kinda combined all of them. If I did something maybe more akin to this instead,

00:44:45.375 --> 00:44:55.750
hopefully, you guys could see. Now, k, instead of everything taking, you know, five minutes, five minutes, five minutes, and five minutes, maybe what I'm capable of doing realistically is this takes five minutes,

00:44:56.070 --> 00:45:00.230
this takes five minutes, and then the integration step about these three, which were two, three, and four,

00:45:02.115 --> 00:45:09.315
only takes two minutes. So what I'm doing is I'm basically converting a task that previously took to twenty minutes and I'm turning it into one that took twelve minutes.

00:45:09.715 --> 00:45:13.875
Which, know, if you just did a little ratio, 12 over 20 was equal to

00:45:14.210 --> 00:45:27.570
three over five. And so what I'm capable of doing is getting it down about 40%, about 60% of the total row. Hopefully, you guys see when you have tasks that can actually be broken down in this way, aka a task that you can expand and run simultaneously through some form of parallelization,

00:45:27.935 --> 00:45:30.815
Just makes more sense to do all three of these things simultaneously.

00:45:30.895 --> 00:45:44.890
Rather than one parent agent being responsible for everything, doing one, then doing two, then doing three, then doing four. What we can do is we can take two, three and four, stack them on top of each other, add an additional step five called a synthesizer and then take the results of these do it do it in like a fraction.

00:45:45.850 --> 00:46:15.350
Another big reason is that agents are what are called stochastic. Okay. They don't always return the same answer. So if I ran, you know, Claude five times on basically the exact same thing, every single time I have a slightly different response. Okay? Every time I have a slightly different response. And just to show you guys what I mean by that, I'm gonna open up my Claude code over here and I'm actually gonna open up three different tabs. Let me just visualize this, stick this right in the middle. Okay. And then over here, let me just make sure that all these are operating the same. I'm gonna say, I'd like you to determine five ways to improve this code base.

00:46:16.150 --> 00:46:28.365
I'm just gonna paste this across all three of these. I'll paste and I'll paste. Now, I'm just gonna run all three of them. And I just want you to notice sort of what's going on here. Obviously, the first thing that's gonna do is try reading the key files, but check out the different

00:46:28.685 --> 00:46:35.790
solutions basically that it's coming up with on all three of these different runs. So in the first run, k, a brokerage

00:46:36.510 --> 00:46:49.790
image paths, missing title and meta tags, nav links hidden inside mobile with no replacement. Project cards aren't actually links. No keyboard focus styles or skip to content link. The second was broken image paths, missing meta tags, no mobile nav. But now look,

00:46:50.375 --> 00:46:53.335
placeholder links everywhere, typo in footer.

00:46:53.655 --> 00:46:56.695
K. And you can see that, you know, basically, the more times we run these,

00:46:57.095 --> 00:47:06.810
you know, agents and then the further away they get from the beginning, the more they tend to diverge. And there's a statistical reason for that. Right? Like at the very beginning, this is sort of like, I don't know, the total answer.

00:47:06.970 --> 00:47:08.090
At the very beginning,

00:47:08.970 --> 00:47:10.090
you know, red,

00:47:10.250 --> 00:47:12.090
it's pretty similar to black,

00:47:12.650 --> 00:47:14.730
but eventually it diverges a fair bit.

00:47:15.050 --> 00:47:24.045
Green similar to red, and it diverges a fair bit. Blue similar to all these, but it diverges a fair bit. And I guess the point that I'm trying to make is like, you know, over here

00:47:24.525 --> 00:47:31.645
let's pick another color, so it's pretty obvious that these are all a bit different. We'll do purple. Over here, this this is sort of like the zone of similarity. Right?

00:47:32.740 --> 00:47:57.955
But then after you make it to a certain point, because of the multiple good of nature of how large language models work under the hood, they're basically multiplying statistical probabilities of, like, one token after the other after the other after the other. You have massive divergence in the end result. And so, you know, this might go a b c, this might go b c d, this might go a b e, this might go a b z, and this might go a c q or something like that.

00:47:58.275 --> 00:48:16.210
What you can do is you can actually just run five times. And now notice, if I ran this once, I'd only get a b c. But because I've ran this another time, I got all the way to d. I ran this another time, I got all the way to e. You know, if you would just count up all of the different unique answers here, I have a, I have b, I have c, I have d. I also have e. I even have q, and then I have zed.

00:48:16.715 --> 00:48:28.235
So you could see here that, like, I'm basically getting 2.5 times the total number of possible answers by running things multiple times and then just averaging out and taking all the unique outputs. Right? That's really the the principle of stochasticity.

00:48:28.235 --> 00:48:34.270
Because they don't always return the same answer, if you parallelize your agents, you can actually run multiple times with same or similar queries.

00:48:34.510 --> 00:48:39.710
And then you can actually have different answers given to you that just sort of live outside of the distribution

00:48:39.950 --> 00:48:55.865
or average run, which is pretty amazing. So I'm gonna show you guys how that works, specifically with debate and stochastic consensus models. If anybody seen my AgenTik AI course on that, you'll know more or less what I mean by that. I'm also gonna show you some fan in, out researching flows as well as some some sequential pipeline handoffs.

00:48:56.185 --> 00:49:00.185
But really, the the fourth and final reason is because model performance degrades as context increases.

00:49:00.550 --> 00:49:08.710
So the shorter and the cleaner your context windows are, typically, better the results are as well. What I mean by this is, you know, because the parallelization

00:49:08.710 --> 00:49:16.635
aspect typically involves sub agents, which I'm gonna show you guys a little bit about, you get to avoid the problem where the increasing length number of tokens leads to poor performance.

00:49:16.795 --> 00:49:28.635
And so, know, if like on average, this is more or less the relationship between the number of things in your context window and then the performance of the model, we're gonna we're gonna end up just almost always staying right around here, which is the zone of

00:49:29.115 --> 00:49:29.435
good.

00:49:29.950 --> 00:49:34.510
By the way, just made that up. It's not actually called the zone of good. Hopefully, you guys understand the distinctions there though.

00:49:34.990 --> 00:49:59.025
When you paralyze and then feed tiny chunks of a problem to multiple agents, they can all be at the zone of good. You don't actually have to like go all the way down here. It's not just one agent that's doing all the work. Okay. So so what are examples of how to parallelize in the first place? Well, there's like a built in function called agent teams now in Cloud Code, does a fair amount of this. So I'm gonna be showing you guys some ways to do that. But I just wanted to chat a little bit more generally without even going into agent teams first before I show you some demos

00:49:59.580 --> 00:50:06.620
of, like, different ways that I personally approach problem solving, and I've seen some of the best and the brightest use Cloud Code for this sort of parallelization.

00:50:07.100 --> 00:50:10.140
And I'm gonna call them common team patterns. Okay?

00:50:10.540 --> 00:50:10.940
Essentially,

00:50:11.475 --> 00:50:23.395
there are three main things I wanna cover. The first is the ability to fan out and then fan in. And so that's where you actually spawn a bunch of different research sub agents, and then you have a synthesizer sub agent, which takes all of their outputs.

00:50:23.395 --> 00:50:32.070
And then based off of the outputs of that synthesizer, you can do either more fan out, fan in flows, or you could do some form of final synthesis step.

00:50:32.230 --> 00:50:38.150
Okay. So what I mean by that is like, let's say before you have a query and it's, you know, I want to find

00:50:38.150 --> 00:50:41.590
the best. Okay. Absolute best APIs

00:50:42.635 --> 00:50:49.195
for my feature. Whatever the feature is. It's x feature. I don't know. It's like some app that generates things, whatever.

00:50:49.515 --> 00:51:00.810
So I wanna find the best APIs out there for this feature that, you know, allow me to very quickly and easily do the things that I wanna do. We can imagine like, if you were to do this in the old school linear path,

00:51:00.970 --> 00:51:06.970
what would happen is Claude code would spin up, k, in the same thread research on-site number one,

00:51:07.530 --> 00:51:08.890
and then go on-site number two,

00:51:09.535 --> 00:51:11.295
and then go site number three,

00:51:11.535 --> 00:51:17.375
and then go on-site number four. Right? And what would be occurring the entire time that we're going through all these different websites?

00:51:17.775 --> 00:51:23.695
Well, the length of our total contacts would increase, meaning our performance on average would also decrease.

00:51:24.190 --> 00:51:43.205
Okay. In addition, it's taking time. So it's five minutes here, it's five minutes there, it's five minutes there, it's five minutes there and so on and so forth. Then at the end, what it would do is it have a final synthesis step, which I'm just gonna call s, which would basically combine one, two, three and four together, which could take a certain other amount of time, maybe another five minutes before finally giving you your answer.

00:51:43.685 --> 00:51:48.405
And so the cost of the answer, okay, if you think about it as like almost like a line item,

00:51:48.645 --> 00:51:52.645
the cost of the answer is, you know, first of all, twenty five minutes,

00:51:53.170 --> 00:51:55.970
which is obviously non preferable to instant.

00:51:56.210 --> 00:51:58.450
And then, you know, a fair amount of tokens

00:51:59.330 --> 00:52:00.770
on poor

00:52:02.130 --> 00:52:03.730
quality outputs.

00:52:04.610 --> 00:52:15.235
You know, you're probably gonna end up spending a similar amount of tokens regardless, but you're spending those tokens on poor quality outputs because you're kind of you're kind of down here as opposed to up here. Right? You're you're here where you don't wanna be.

00:52:15.635 --> 00:52:23.540
Now, what fan out and fan in is is very similar to what I showed you guys earlier. You have a research query and that's, you know, find best APIs.

00:52:24.660 --> 00:52:27.380
And so what it does is Cloud Code

00:52:27.700 --> 00:52:30.820
basically goes in and then immediately spawns.

00:52:31.140 --> 00:52:32.900
K. Let's just say,

00:52:33.780 --> 00:52:35.380
four research agents.

00:52:36.115 --> 00:52:45.075
And so now we have research agent one, research agent two, research agent three, and then we have research agent four. K. And so what we're doing this year is we're we're we're fanning out.

00:52:46.035 --> 00:52:50.115
These all operate totally independently accumulating their own context windows.

00:52:50.840 --> 00:52:57.320
Because they're new agents, they're almost always in the zone of good. Maybe they'll push a little bit farther beyond that, but they're still pretty good.

00:52:57.640 --> 00:53:01.560
Once we're done with that, what we do is we do the opposite, which is the fan in,

00:53:02.280 --> 00:53:06.200
and we feed all of those into a final synthesizer agent.

00:53:06.665 --> 00:53:24.730
That synthesizer agent now is a different prompt. The prompt is not, hey, go do this research. The prompt is, hey, here's a bunch of context from a bunch of other models that have already done the research. Meaning, the prompt gets to be shorter. We then apply high level reasoning strategies and principles to make that a synthesizer as smart as possible and say things like, we want you to integrate

00:53:24.890 --> 00:53:25.770
anything

00:53:25.770 --> 00:53:26.970
that overlaps

00:53:26.970 --> 00:53:51.740
as well as any outliers and then score them slightly differently. And so, you know, rather than being all the way over here with our big thing, you know, probably we're somewhere over here in the middle, which means the performance is gonna be a little bit better. And then obviously, the synthesis step can occur in approximately the same amount of time as the actual research because you can spawn almost an infinite number of sub agents to go to research for you. And so really what happened now is you have five minutes here. You have five minutes here. You know, just add these up. It's ten minutes. And And so not only are we significantly faster,

00:53:51.820 --> 00:53:57.100
we're also a lot higher quality because now we have all the the data and information laid out to the synthesis agent.

00:53:57.340 --> 00:54:20.220
More importantly, there are different models that are better at different things. And so within Claude, you have not only your, you know, heavy lifter, which is usually the Opus models, but you also have, you know, your Sonnet models. And then although not a lot of other people use them these these days, you also have your Haiku models. And so what you can do now is for the research, which consumes a massive number of tokens, but realistically doesn't usually need like a ton of reasoning for it. It's more of like data extraction.

00:54:20.380 --> 00:54:24.860
You use something cheap like Haiku and Sana. And then for the synthesis, use something like Opus

00:54:24.940 --> 00:54:28.380
because you're applying different models at different steps.

00:54:28.620 --> 00:54:48.160
Not only is it going to going to occur much faster because Sonnet works faster than Opus. So maybe instead of five minutes here, it's actually, I don't know, three minutes. But then the cost is going to be a small proportion of the money that you normally would have spent just because the way that pricing on Claude works. Right? Pay attention here to the fact that Claude Opus, you know, in this case, 4.6 is $5.

00:54:48.240 --> 00:54:52.320
Sonnet 4.6 is 3. So we immediately save 60% right there.

00:54:52.880 --> 00:55:17.200
And that's just your base input tokens. That's not taking into account, you know, like the the the the massive difference and also output token cost and so on and so forth. And obviously, things get even better if you go down to Haiku and and so on and so forth. And so you can formalize this as a skill if you would like. K? I'm not going to. I'm just going to feed it in a simple prompt, but this will illustrate what I mean. Let's say I'm right over here in my project. K. Let me just delete this globalcloud.md

00:55:17.200 --> 00:55:18.560
because we don't need that anymore.

00:55:19.040 --> 00:55:23.360
Then I'm going to essentially let me just go back here and then copy the actual text.

00:55:24.080 --> 00:55:25.680
As I use a fan out,

00:55:26.295 --> 00:55:29.095
fan in, and researchers synthesizer

00:55:29.095 --> 00:55:29.895
approach

00:55:30.295 --> 00:55:32.135
to research the question,

00:55:33.175 --> 00:55:41.495
how best should I optimize this code base? Minimum five sub agents, use SONNET to do the research and contemplation,

00:55:43.060 --> 00:55:44.660
individual contemplation,

00:55:44.660 --> 00:55:46.020
opus to synthesize.

00:55:46.740 --> 00:55:58.685
So now what's going to occur is rather than we just waiting nonstop for all of these, what this is going to do is it'll fan out six Sonnet research agents. Each are going to investigate a slightly different optimization

00:55:58.685 --> 00:56:07.405
axis. They're all gonna focus on slightly different things, then they're gonna synthesize all of those results back together with Opus. By zoom outs, you can actually see all six of them running simultaneously.

00:56:07.850 --> 00:56:32.015
The despite the fact that we're not using this agent team feature, we're just using the, um, sub agent feature right now. Uh, you know, all of these things basically immediately are generated. Their contexts are quite short. So, I mean, in the grand scheme of things, this is a much shorter context than we would ultimately accumulate in our main agent. All of them are focused on slightly different things, are obviously autonomously managed by that orchestrator. And then finally, these six agents can finish in a linear amount of time as opposed to, you know, like multiple one.

00:56:32.255 --> 00:56:42.470
So this just finished the architecture research. It's gonna wait for the remaining five agents now. Alright. And it looks like it just finished all six research runs. So now it's going to synthesize all the findings with Opus.

00:56:42.630 --> 00:56:47.830
It's then going to also be able to take advantage of things like its planning features and so on and so forth before synthesizing.

00:56:48.585 --> 00:56:56.025
And here it is. Okay? High impact, easy fixes, gives us a big list. It's also writing the high to medium impact, easy to medium effort.

00:56:56.265 --> 00:56:58.105
And so, I mean, you know, obviously,

00:56:58.345 --> 00:57:10.290
I'm not just pulling this out of my my ass here. Anthropic has done a lot of research on the best way to solve problems. And, you know, Opus with a bunch of Sonnet sub agents massively outperforms Opus both on time, but then also quality,

00:57:10.530 --> 00:57:15.090
specifically because of, you know, Sonnet's longer context window as well as just like general usability.

00:57:15.665 --> 00:57:36.550
That's what I care about. I just care about my own u usability here. I could spend as much money as I want on these things at this point. What I care about is like, how can I extract the maximum quality with a minimum amount of time? And that's the design pattern that you wanna use. So I mean, like, use this anytime you're contemplating problems. And you don't just have to contemplate like specific API problems or development problems as well. Like, I use stuff like this anytime I'm designing, um, business systems,

00:57:36.710 --> 00:57:39.030
uh, anytime I'm designing process optimizations.

00:57:39.110 --> 00:57:50.245
I mean, I did this the other day when I was doing product differentiation, basically coming up with different ways to price and package products for a company that I now own that does this sort of thing. The opportunities here are basically limitless.

00:57:50.245 --> 00:57:53.365
You do this for competitor research. You could do this for whatever the heck you want,

00:57:53.685 --> 00:58:16.215
and I I commonly apply it. Okay? So that's fan out and fan in, where you basically spawn and researchers, usually using a cheaper, dumber model like Sonnet. And then you have a a larger synthesizer model that actually combines the results. That's how you get, you know, some of the best quality and then also the best quantity. Next, I wanna chat debate and stochastic consensus because it's kind of simpler similar, but, you know,

00:58:17.175 --> 00:58:24.935
it's also a little bit different. I use debate and stochastic consensus to basically, like, hammer out nuanced arguments and nuanced quality discussions.

00:58:25.440 --> 00:58:36.080
You know, earlier how I said we had one agent come up with a b c, another one come up with c d e, another one come up with like a b q, and so on and so forth. Well, basically, with stochastic consensus

00:58:36.160 --> 00:58:40.880
and then later debate, what we're doing is we're having different sub agents come up with different

00:58:41.525 --> 00:58:42.805
lists of solutions.

00:58:43.205 --> 00:58:47.045
And then we have something else go through, identify all of the

00:58:47.365 --> 00:58:48.005
mode,

00:58:48.805 --> 00:58:52.005
identify the mode, which is the frequency

00:58:52.005 --> 00:59:01.610
of, you know, the the number of times that a solution pops up. So let's say solution a pops up twice. K. This synthesizer agent would say, okay, there's two a's.

00:59:02.330 --> 00:59:07.930
B pops up twice, so we go two b. C pops up twice, we go two c.

00:59:08.330 --> 00:59:10.010
D pops up how many times?

00:59:11.235 --> 00:59:12.755
One, so we'd go d,

00:59:13.235 --> 00:59:14.515
then counts e,

00:59:14.515 --> 00:59:21.795
then also counts q. And so in this way, could see statistically speaking, you know, a lot of agents think these three are great solutions.

00:59:21.955 --> 00:59:33.110
One agent thought this is a good solution, Another agent thought this is a good solution. And finally, another agent thought this is a good solution. Basically, the votes of confidence here are fewer. And then what you can do is you could use this. It's almost like like

00:59:33.270 --> 00:59:42.565
a weighted average to tell you what approach to take. You know, if it's like an equation where, like, my final, I don't know, decision, which we'll just say decision,

00:59:42.565 --> 00:59:54.420
is kinda like this. It would equal two a plus two b plus two c plus d plus e plus q. And I know this is math, but don't get scared here. The the point is not to actually calculate the final solution.

00:59:54.740 --> 01:00:00.260
The reality that I'm attempting to convey to you is that because so many models came up with a,

01:00:00.500 --> 01:00:07.085
so many other models came up with e and b and q and so on and so forth, You can quickly determine consensus

01:00:07.085 --> 01:00:10.285
between a number of agents that come up with ideas.

01:00:10.445 --> 01:00:28.510
And then you can also determine which ideas are genuine outliers in so far that, you know, only one out of three models actually came up with thing. One out of twenty four four models suggested you should do x y and z thing. And so you get to farm both like the statistically most likely answers to solutions, but also like the massive outliers,

01:00:28.510 --> 01:00:29.310
which can make you quite,

01:00:30.125 --> 01:00:31.085
I wanna say,

01:00:31.325 --> 01:01:05.375
competent at solving problems in a very short period of time. And this works in a really similar way to what I talked about earlier with like the total solution space. Right? You know, if there are really a fixed number of ways to solve something, and the reality is there are a fixed number of ways to solve something, And there are also a certain number of ways not to solve something. Well, what you wanna do is you just wanna, like, cover that ground as quickly as possible. And in reality, what you could do is you could quickly spin up an agent to do all of to figure out all the ways not to do something. Okay? And then you could have, you know, one sub agent slowly figuring out, no, this doesn't work. No, this doesn't work. No, this doesn't work all simultaneously.

01:01:05.855 --> 01:01:35.805
And then what you end what you end up with is you just end up with like this beautiful field of like highly differentiated green, which tells you what you can actually do. And I understand this is more conceptual, but just bear with me here. I'll show you guys an actual example in a moment. Now, stochastic consensus is cool. It's sort of like a first go, but debate is even cooler. Because now what you do is you basically take all of these points, Okay? And then you feed them into an open, like, conversation or chat room where all other models can weigh in on solutions that might not actually be very obvious.

01:01:36.045 --> 01:01:38.845
So now, okay, if I just recreate the solution,

01:01:39.005 --> 01:01:46.610
we have agent one come up with a b c. Agent two come up with b c, I don't know, let's just say e. Agent three come up with a b q.

01:01:47.010 --> 01:01:55.010
Okay? What we do is we divide this into time steps. And so this is time one, this is time two, this is time three, and this is time four.

01:01:55.585 --> 01:02:02.145
What we do at every time step is we allow all other agents to look at all of the conversations

01:02:02.145 --> 01:02:13.900
and and all the thoughts that all the other agents have had. Okay? And what occurs as we move through is agent one gets to see agent two and agent three's responses, and so it gets to differentiate. Maybe now it goes a b c

01:02:14.140 --> 01:02:35.975
e zed because it come comes up with some additional solution by comparing its two, you know, two and three. Maybe this one comes up with b c, but then it eliminates e because it just doesn't think that makes much sense, and then it comes up with an f. You know, this one comes up with with, I don't know, two different letters, and then ends up, you know, also identifying some of the previous solutions, but then combining them in new ways and stuff like that to come up with better ones.

01:02:36.535 --> 01:02:44.460
And so what we do with the debate is it's not really a debate in the practical sense. It's not like, hey, your job is to try and convince other people why a, b, c are the best solutions.

01:02:44.860 --> 01:02:56.275
What it is is every model has access to all of the other models. And so because they have access to all of the other models and they don't have to spend all that time reasoning, they can just see the results. They can then incorporate those and come up with increasingly nuanced

01:02:56.355 --> 01:03:21.275
solutions and, you know, ultimately, spend a large search space in a very short period of time. And so we can just proceed with this all the way down. You can run as many of these, like, steps as you as you want until ultimately you have like a a list of solutions provided by a bunch of different models that are just way more complex, way more nuanced, and also just like way more interesting than the initial ones that, you know, one agent might have come up with. Alright. So I'm back on my business workspace here and

01:03:21.435 --> 01:03:29.515
we're still doing research on tomatillos, but I thought this is actually a pretty good example. Why don't we use stochastic multi agent consensus to come up with all of the different ways you can make a sauce using a tomatillo.

01:03:29.960 --> 01:03:37.560
Use stochastic multi agent consensus to determine all of the different ways that you could make a nice tasting sauce using tomatillos.

01:03:37.640 --> 01:03:46.275
I want every agent to come up with at least 10 independent responses, then have them synthesized and turned into just a giant list of all of the possible things you could do.

01:03:47.155 --> 01:03:54.675
So what the Skill Stochastic MultiAgent Consensus does, if I open it up, is basically, it breaks down a query into

01:03:54.915 --> 01:04:05.290
n other queries. That's where it says spawn n agents with the same or a slightly different prompt to independently analyze a problem, then aggregate results by consensus,

01:04:05.290 --> 01:04:11.450
which you use for decision making, ranking options, strategic analysis, or any problem where you wanna filter hallucinations,

01:04:11.905 --> 01:04:14.545
and then surface what are called high variance ideas.

01:04:14.945 --> 01:04:28.920
So anytime I use the word consensus, poll agents, stochastic consensus, spawn n agents, so on and so on and so forth, it'll go and it'll it'll do the thing. So just scrolling down here, you could see that it read through the skill and it spawned 10 agents all looking at slightly different angles here.

01:04:29.320 --> 01:04:31.560
And, you know, these are very similar prompts.

01:04:31.800 --> 01:04:35.320
Brainstorm all the different ways you can make a nice tasting sauce using tomatillos.

01:04:35.320 --> 01:04:43.245
This one's here, brainstorm all the different ways you can make a nice tasting sauce using tomatillos. This one here, brainstorm all the different ways you can make a nice tasting sauce using tomatillos.

01:04:43.245 --> 01:04:53.325
But the idea is, you know, one is a conservative tradition minded chef, the other is an adventurous boundary pushing chef, the other challenges conventional wisdom, the other reasons from first principles and so on and so on and so forth.

01:04:54.010 --> 01:05:50.035
Now because, you know, it's a pretty simple and not very intellectually difficult exercise, all 10 agents have actually already already finished. And you can see that I was able to scan a massive search space in a very short period of time, despite the fact that this problem was pretty simple. So what it's doing is similar to what I showed you earlier with those end researchers and then, um, having some sort of synthesizer model. What this is now going to do is indeed duplicate the outputs and then give me a list of pretty nuanced answers that realistically scanned most of the search space in a very short period of time. I'm sure you can imagine you could scale this up if you had, like, some sort of dedicated infrastructure, whether it's a local model or something like that. You could theoretically have stuff like this running all the time just ideating and coming up with new approaches to solve on long standing problems. This is actually the exact way that I don't know if you guys have seen, you know, they're throwing Opus now or GPT four point or GPT or other models at like these big math questions and asking them to solve them. This is exactly how they're doing them all under the hood. So as you guys could see, we pulled 10 agents. There are a 119

01:05:50.035 --> 01:05:50.915
raw ideas.

01:05:51.370 --> 01:05:52.890
Counting for duplication,

01:05:52.970 --> 01:06:57.825
there are 52 in total that are new. So what we're gonna do is we're actually gonna look at this consensus report and then ultimately, its answers. Alright. We have the consensus report opening it up here. You could see there are 52 total. The first is salsa verde crula. The next is tomatillo avocado crema. The third is aguachile verde, and so on and so on and so forth. I could work my way all the way down here, a bunch of different types. You know, I could have had one agent come up with all of these. I could. Okay? But, um, the probability that it would have been able to, one, come up with, like, a highly differentiated list like this, and two, scan as much of that search space in the same amount of time is very low. And so I'm sure you can imagine you can apply this to any business problem that you guys are currently having to just come up with a bunch of low hanging fruit solutions as well as, like, unique and and and outlier solutions as well. We even have, like, Indian influenced sauces, Persian influenced sauces, Caribbean Latin fusion sauces, and so on and so forth. An outlier that I'm definitely not trying anytime soon is Tomatillo Brablanca, which is French butter sauce using Tomatillo's pectin as a natural emulsifier. No, thank you. So what would debate look like? Debate is more or less the exact same idea. In my case, I've just turned this into a skill. It's called model dash chat.

01:06:58.065 --> 01:07:04.545
Basically, what occurs is we spawn five cloud instances in a shared conversation room where they debate, disagree, and converge on solutions.

01:07:04.865 --> 01:07:18.950
We use round robin turns with parallel execution within each round that triggers on terms like chat and so on and so forth. So I'm gonna do here is I'll say, great. This looks awesome. I'd like you to rerun this, but with model dash chat. Make sure at least 10 agents are having conversations about this. And then,

01:07:19.350 --> 01:07:31.385
you know, if any of the sauces just sound insane or terrible or crazy, then obviously have them discuss that as well. Just like our stochastic multi agent consensus took advantage of, like, time basically and traded it off against total tokens,

01:07:31.545 --> 01:07:47.300
we're doing the same thing. So what we're gonna do is we're gonna start by extracting from the user's mess from the user's message the topic or problem, the mode, the number of agents, and the number of rounds. It's then going to run an actual script that I've set up here that automates the process of like having each of the agents look at each of the other agents responses

01:07:48.020 --> 01:07:49.460
before finally doing a synthesis.

01:07:49.875 --> 01:07:59.715
Speaking of which, I just read through a couple of those. I'm actually just gonna make some time at TSS right now, so I'll be right back. Okay. So me looking at the conversation over here, just asking it to like give it to me.

01:08:00.275 --> 01:08:01.475
You actually see

01:08:01.715 --> 01:08:11.710
that all the agents are doing some thinking and the contrarian is starting with 15 ideas. It'll immediately challenge the ideas that deserve it. They're now listing their disagreements,

01:08:11.710 --> 01:08:19.555
so does this actually work? Is it a structurally sound technique or a restaurant stunt with an unacceptable failure rate? Is tamarind redundant or complementary?

01:08:19.555 --> 01:08:36.950
You know, does Tomatillo chocolate belong on the list? If so, where? Should Mole Verde be in tier one or tier two? So they're having discussions on an ongoing basis, which is always really fun to watch that we can monitor and then obviously synthesize into an answer. Okay. And then finally, we have the Tomatillo synthesis over here.

01:08:38.230 --> 01:08:40.710
Tomatillo's pectin content is underappreciated.

01:08:41.510 --> 01:08:46.550
Tomatillo husk t, unfortunately, is not cool. The foundational tier is settled and non negotiable.

01:08:46.805 --> 01:08:52.165
And I actually look at the foundational tier. You can see we actually have a bunch of different highly recommended sauces.

01:08:52.645 --> 01:09:08.040
Again, some of these are very, like, nuanced. Lacto fermented tomatillo hot sauce, taquera squeeze bottle drizzle, enchilada sauce, tomatillo aguachile, and so on and so on and so forth. And, you know, I this is just a really shitty example. But hopefully, you guys understand that you can take this to more or less anything that you want.

01:09:08.520 --> 01:09:12.360
Whether it's, you know, designing a new computer programming approach to a particular problem,

01:09:12.440 --> 01:09:21.455
whether it's choosing the right framework to approach or tackle a task with, or something else. Okay. So I just did all of the previous example using a pretty straightforward,

01:09:21.775 --> 01:09:23.215
you know, like dietary

01:09:23.215 --> 01:09:25.215
or chef sort of example.

01:09:25.455 --> 01:09:43.010
But now I wanna use this on an actual app and really just have all of these different models discussing things and doing so in a very short period of time. What I have here is I have like an algorithmic art example. And this is actually something that Claude develops. It's part of their algorithmic art base skill, which I think is actually, like, applied or supplied, I should say,

01:09:43.410 --> 01:09:43.970
in

01:09:44.290 --> 01:09:46.690
the anthropic anthropic skill directory.

01:09:47.225 --> 01:10:37.240
You can adjust some things like the the stroke weight and, like, the damping and so on and so forth, and actually have it, like, come out with very unique designs. You can then just, like, save the image and then boom. Now you have, a cool, like, wallpaper or something like that. It's kinda neat. But I wanna I wanna improve this as much as humanly possible. And the reason I'm doing it like this is because I also wanna show you guys how to apply the same approaches that I just showed you to agent teams instead, which are obviously a much more streamlined version of doing the exact same things that I've done so far. It's just streamlined in the sense that, you know, it is built out of the box to do everything, but it does so at the cost of some tokens. So I'm just gonna go back over here and then I'm just gonna look at synaptic drift dot HTML within art. I just need to make sure to, you know, remember what folder that's in. Then I'm just gonna open up another Claude instance. Now, lot of the advanced stuff as we know is actually only available in the terminal and I think agent teams are a lot better managed in the terminal. So I'm just gonna open up the terminal.

01:10:37.400 --> 01:10:45.205
I'm going to full screen it here as well. Let me delete that and then go full screen. And, you know, I could do it in here. I could also do it in like ghost TTY,

01:10:45.205 --> 01:10:52.565
which is probably my favorite like terminal to use within Claude. But for now, you know, I I have my agent team's idea.

01:10:52.805 --> 01:10:58.730
So I'm I'm basically now going to say, hey, I'd like you to optimize synaptic-drift.html

01:10:58.730 --> 01:11:00.970
and turn it into a full fledged application.

01:11:01.210 --> 01:11:06.650
However, rather than just do this all naively yourself, I want you to take advantage of stochastic multi agent consensus.

01:11:07.175 --> 01:11:11.095
I want you to take that skill and then apply it using the agent teams feature.

01:11:11.415 --> 01:11:32.420
You'll orchestrate a team of agents that do all of this stuff. Don't just use what's in the skill itself because I'd be running it a little too simply. I actually want you to to read through the whole skill and then use that to spawn agent teams. Okay. So it's gonna start by reading the skill def and then the HTML file itself, which is found in art. It's then going to go and read through the agent team's tooling and everything that it needs in order to basically spin this up easy.

01:11:33.060 --> 01:11:35.620
So it'll start by creating a team for the consensus workflow,

01:11:36.395 --> 01:11:39.355
spawning 10 analyst agents with different framings,

01:11:39.595 --> 01:11:43.355
and then finally aggregating their recommendations and implementing the winning features.

01:11:43.755 --> 01:11:58.630
So the very first thing it's going to do is spawn the analyst agents. And you could see now the UX has changed a little bit. You see down at the bottom where I have these different analysts that are running? So if I go shift down, I can actually see all of their different stochastic multi agent kind of consensus threads.

01:11:58.950 --> 01:12:08.175
So now they're all spawning and running in parallel, which is pretty neat. At any point in time, I could press enter to view sort of the conversations and and what they're doing. And I should say I I should note that,

01:12:08.655 --> 01:12:14.975
you know, the Stochastic multi agent consensus applied to agent teams is basically just the debate built in because the agents actually can can communicate.

01:12:15.215 --> 01:12:21.215
The team lead can also orchestrate that communication too. So, you know, it's not actually really independent, which is neat.

01:12:21.860 --> 01:12:25.940
You could spawn all of these in, like, different windows if you want to. You can also just continuously

01:12:25.940 --> 01:12:35.300
hold shift and then go up and down to select. What I'm doing is I'm just reading through a bunch of different threads and conversations. And it's clear that they all start by just reading through synaptic dash drift dot HTML.

01:12:36.335 --> 01:12:37.135
Finally,

01:12:37.135 --> 01:12:43.695
you know, this is now returning a bunch of agent conclusions back. And more importantly, it's also coming up with consensus,

01:12:43.695 --> 01:12:51.050
which is nice. Alright. What it's gonna do is just take all these now and close them down while also looking at the consensus,

01:12:51.210 --> 01:12:54.170
the bugs, the divergence, and then ultimately outliers.

01:12:54.810 --> 01:13:06.885
So the consensus recommendation of our next feature is high res exports, a preset system, URL state, and shareable links. The bugs are the race condition and regenerate, Download saves mid render. PG height not checked.

01:13:07.205 --> 01:13:08.085
Divergence

01:13:08.085 --> 01:13:13.605
is one or sorry. Six out of 10 agents suggest debounce red regeneration versus a live preview.

01:13:13.925 --> 01:13:22.520
Then the outliers have also come in. Mobile responsive layout, live animation mode, seed history, web worker offload, mouse attractor, repeller, and kill sidebar overlay.

01:13:22.680 --> 01:13:27.240
So this is all really cool. You could see now it's coming up. It's actually just deleting my old Tomatillo

01:13:27.240 --> 01:13:56.280
stuff. Guess we happen to be using the same file or something. Instead, it's coming up with this giant list of different conditions and features that it can build. Okay. Now it's actually shutting down all the agents, implementing it. Just because I want this to do so faster, I'll say use agent teams to do the implementation. And you can see it's actually gone through here and then added all of what we needed in order to implement the tool, the features that the model suggested. In addition, it's also spawning review agents to see if we can improve the quality of the generated code, spot problems and stuff like that.

01:13:56.920 --> 01:14:14.785
So if I go shift down, could see all those. So we have now reviewer bugs, reviewer features. Let's just see what reviewer bug says. Okay. It's now sending the review to the team lead, so it's communicating that back. Taking a look at what the reviewer is saying. Now that it's opening it up, you can see we now have a ton more features. We have different presets, ocean drift, ember storm, ink wash,

01:14:15.400 --> 01:14:17.560
neon plasma, neural fire.

01:14:17.880 --> 01:14:27.080
We have the ability to modify colors. We have one x, two x, and then four x downloads, which I don't think you guys could see because my face is in the way. But if you just look

01:14:27.080 --> 01:14:30.435
down over here, you'll see that there's significantly more functionality.

01:14:30.595 --> 01:14:45.170
Um, we can download a PNG at four x as well. We have simple, like, space bars to reload and change things. We could change the the speed and so on and so forth. Um, ultimately, this is just a better app. Right? And so we did this by basically just exchanging a couple of my dollars and tokens

01:14:45.650 --> 01:15:35.395
for, you know, a bunch of different agents, all coming up with their own ideas and then ultimately executing on them. Hopefully, you guys could see you can apply the same approach to more or less anything. There are obviously optimal token trade offs, but when you spawn the sub agents that are a little bit less capable, like SONET versus Opus, typically that math works out and you end up being able to do just as much if not more in a shorter amount of time for less money. Alright. And then finally, pipeline, which is sequential hand up between specialists. I mean, I just showed you guys a little bit of that earlier with agent teams sort of spawning review bugs and stuff like that. But basically, that's more or less it. You have task a done by some agent, which is specialized for task a. You then pass that off to agent b, which is specialized for task b, and then ultimately, agent c, which is specialized for for task c. And so, I mean, like, could just have a do all three of these things. The issue with having a do all three of these things though is one,

01:15:36.110 --> 01:15:49.870
if you guys remember earlier, good lord, this is getting a little messy. You know, we're no longer in the zone of good because odds are it has like tons of context from literally everything that it's done before. So, you know, like it would have started off over here and that would have been okay, but now it's over here and then now it's over here.

01:15:50.545 --> 01:15:52.465
And then, two, like, sometimes

01:15:54.145 --> 01:16:14.810
fast and and and good development is often at odds with like really in-depth testing, let's say. And so, if you think about it conceptually, like a a developer agent will have different incentives than like a testing agent. The developer agent will be incentivized to like build things that works really quickly using, you know, whatever is available to it. Whereas the testing agent will be incentivized

01:16:14.810 --> 01:16:24.565
to try and like spot all of the issues. And so like building things new is sort of at odds with like repairing the old things. And in that way, if you try and have one agent do everything,

01:16:24.805 --> 01:16:31.525
the probability that it will be able to do it as well as possible versus if you just spun up specialized agents that were like highly tuned for that thing,

01:16:31.925 --> 01:16:37.880
assuming their intelligences are all held equal here. I'm talking about like nonstop opus calls, not opus sauna and so on and so forth,

01:16:38.200 --> 01:16:39.959
is is is definitely different.

01:16:40.040 --> 01:16:58.085
So my recommendation would be, you know, like, what I would do is I'd have, a dev agent for a, like I just did. Then I'd have some form of, like, bug fix for b, then I'd have some sort of, like, test, maybe bug in q and a. And I'm not gonna redo that example because one, I wanna be respectful of your time, but two, I just showed you that exactly with the agent team's example.

01:16:58.725 --> 01:17:27.755
I guess the meta example here is you combine all three of these and then just have all of them interacting constantly for best results. Like, you have, you know, debate and stochastic consensus to come up with, like, the best ways to, you know, improve on a product. Then maybe you do some fan out, fan in, and researchers to go look at, like, different APIs and different design patterns that you could use to fulfill that before finally handing that off to some sort of like bug reviewer QA or tester. But hopefully, it's clear that, yeah, all of these things do not exist in isolation. They all exist together.

01:17:29.035 --> 01:17:31.595
Next, let's talk context management,

01:17:31.755 --> 01:17:39.600
which put really simply is just all of the files and folders and organizational methods that you put into a workspace

01:17:39.760 --> 01:17:51.675
to allow Claude code to effectively manage whatever work you have. Now I'm seeing a lot of people try and delegate work right now, sort of like human companies do with CEOs,

01:17:51.995 --> 01:17:53.675
you know, CTOs,

01:17:53.675 --> 01:17:54.875
CMOs,

01:17:55.595 --> 01:17:58.795
quad coder agents, and software engineers, and stuff like that.

01:17:59.560 --> 01:18:08.999
And I think initially, when I looked at this, this one's called paperclip specifically. It's got a pretty interesting repo that you could check out right over here. It's all about running your whole business with our agent team.

01:18:09.320 --> 01:18:31.470
I think initially, it's really easy to look at these and be like, hey, this is stupid. You know, I mean, that's that's what I did. I made a couple of videos and I talked ad nauseam with a couple of my friends and I was like, this is dumb. Why would we try and fit agents, which think very differently than human beings into the exact same organizational hierarchies we've been using for the last hundred fifty years? It just doesn't make sense. Human brains are different than agent brains. The latter is obviously a lot more spiky and good at certain things while sucking at others.

01:18:31.950 --> 01:18:39.230
But anyway, so as as quick as I was to initially dismiss this idea, what I've come to realize is that sub agents as these org charts

01:18:39.630 --> 01:18:51.525
and skill dot m d files, which as we know are self contained SOPs that exist within a file capitalized as Skulled in m d. These are actually just two flavors of the exact same thing.

01:18:52.805 --> 01:18:56.165
What they are is they're just different ways of organizing your markdown files.

01:18:56.860 --> 01:19:11.635
And so just like in my case, we ran a model dash chat skill earlier for me to show you guys how, you know, models debated and stuff like that. K. We had a skill.md within it that stored a bunch of information that was like hyper specific to that skill. We had model-chat.py,

01:19:11.635 --> 01:19:13.715
which was a tool that the skill could use.

01:19:14.115 --> 01:19:26.490
So too are our sub agents organized in basically the same way. I guess what I'm trying to say is like, okay. If we take sub agents on the left hand side, what was one of the main reasons why we like using sub agents? Okay. It's because it's a clear

01:19:26.810 --> 01:19:28.010
or fresh

01:19:28.250 --> 01:19:29.930
context window. Right?

01:19:30.650 --> 01:19:32.570
Alright. Awesome. So that's one.

01:19:32.890 --> 01:19:35.130
How about the fact that it's specialized?

01:19:35.530 --> 01:19:36.490
Awesome. That's another.

01:19:37.485 --> 01:19:46.285
How about the fact that the sub agent is probably more reliable at sub agent specific tasks? Right. That's another one. And then how about the fact that it's written in, you know, markdown format

01:19:46.765 --> 01:19:48.365
with tool use?

01:19:48.605 --> 01:19:52.950
Well, fantastic. That's another one. If we look at, like, how that equates to skills,

01:19:53.190 --> 01:19:53.910
honestly,

01:19:54.070 --> 01:19:59.749
the only thing that's missing is the fact that the context window is not entirely clear or fresh.

01:20:00.310 --> 01:20:02.310
But, you know, what you can do with these

01:20:02.945 --> 01:20:08.945
is because skills are so efficiently written, they're basically a form of compression that pushes you towards

01:20:09.025 --> 01:20:10.865
a shorter context window anyway.

01:20:11.425 --> 01:20:27.300
So basically, the only real difference, if I'm honest and keep in mind, you instantiate a sub agent, you're giving it, you know, a a little prompt. Right? Kind of similar to way SCO works. The only real difference between the two is just the amount of context in the sub agent versus the skill. But I want you guys to know that sub agents are honestly basically skills

01:20:27.460 --> 01:20:30.980
and skills are basically sub agents. They're just slightly different ways of storing information.

01:20:31.565 --> 01:21:07.375
So why am I bringing this up? Um, just because I'm coming to realize that the two are the two are very similar and they're soon, I'm sure, the future going to be, like, merged even more so into a similar concept. Um, all these two point at are just different ways of organizing your context and basically organizing the way that you you get tasks done. One delegates via CEO to CTO, CMO, CTO, all all the stuff. Right? I don't know why there's two CTOs now that I'm looking at that. It's kinda weird. Whereas the other one stores things in a skulled entity. Like, just going back to anti gravity right over here. Right? Like, I could go to this skills folder and then I could go and find that model dash chat. And I mean, like, way that this is written is basically the exact same,

01:21:08.015 --> 01:21:22.530
you know, schema, basically, that a sub agent is written in. If I go over here to Claude Co's actual documentation page on sub agents, I mean, you you actually have basically the exact same structure. See how here it says the title code reviewer description prompt tools model.

01:21:22.945 --> 01:21:36.305
K. You see over here, what do have? We have the name. K. We have a description, and we also have the tools. I mean, like, the model is sort of baked in here because it's in our main thread. It's gonna be OPUS 4.6. But hopefully, you guys are saying, like, skills and subheadings actually are really similar. They're just slightly different ways of organizing information.

01:21:36.940 --> 01:21:59.205
So I'm making this big point because I think that's important to realize. As we continue moving forward with Claude code and other tools and we get more and more advanced with them, the shapes of how we're transmitting information to our models will likely end up being quite quite similar. Whether one person decides to use a paperclip style, big fleet of agents that does x y z, which maybe, you know, just a couple of months ago, I might have looked at, scoffed, and said, like, well, that doesn't do anything.

01:21:59.925 --> 01:22:38.550
You know, so too are skills basically the the same thing. So the model intelligence is growing more and more and more capable within the harness, which is what allows the the development of these really interesting organizational hierarchies. So what are some of these organizational hierarchies? We've already shown you paper clip here. And the way that paper clip works or it's rather it's supposed to work is this is like a dashboard, which somebody develops that, you you know, think just preys off of maybe preys isn't the right word, but it uses people's misunderstandings of how agents work. It equates them an anthropomorphosis that makes them seem really similar to humans, and then it puts us in front of you so that you feel like you're running a whole team. And so in this way, clearly, it's broken down by role. Right? Whereas the average skill is not broken down by role, the average skill is broken down by function.

01:22:39.030 --> 01:23:06.780
Also, skills typically don't delegate to other skills. That's really the main difference. But paperclip isn't the only one that's like this. Here's another good example, company helm. This one over here is a very same similar sort of idea, where you basically have an AI studio. Within the AI studio, you define a a bunch of different roles for your agents and so on and so forth, and then that's ultimately what allows you to manage your projects. This instead of being left to right is obviously, you know, organized a little bit differently. The front end builder, a QA runner and and so on and so forth. How about OpenGOAT,

01:23:06.780 --> 01:23:10.380
which is the AI autonomous organization of OpenClaw agents?

01:23:10.540 --> 01:23:28.865
Again, know, it's doing this with like CEO, head of sales, customer support based organization, which I don't really believe is ideal. I don't really think you should have this level of direct reports. I mean, like, think about it, why? All of these could just be Opus 4.6, they could be way smarter, they could pull from some sort of shared context pool, and I think you really wouldn't leave that much out. But it is an interesting approach.

01:23:29.390 --> 01:23:31.310
This one over here is called the system,

01:23:31.310 --> 01:23:39.150
which is obviously using some sort of AI generated diagram here. But it's 26 specialized agents, which we thought about that do architecture,

01:23:39.150 --> 01:23:40.749
design, product development,

01:23:40.990 --> 01:23:45.795
release, operations, and so on and so forth. This one over here, think is called Gastown,

01:23:45.795 --> 01:23:46.675
which is basically

01:23:46.995 --> 01:24:00.540
where you have a mayor, which is your AI coordinator, a bunch of different crew members, and then also poll cats or worker agents. You guys may have heard of Crew AI. It's the same sort of idea. It's a fast and flexible multi agent framework, which supposedly delegates things.

01:24:00.860 --> 01:24:02.620
K. Where you have crews

01:24:02.620 --> 01:24:19.455
that have different agents within them, each with their own segregated tool calling and stuff like that. And, you know, it's another way of organizing information. This one over here, Swarmclaw is CEO based, developer, researcher, and again, you have delegation. So all these are different attempts by different groups of people to try and determine,

01:24:19.695 --> 01:24:24.975
the best organizational hierarchy of agents. And I think pretty much all of them suck right now, to be clear.

01:24:25.615 --> 01:24:30.790
But I just want you guys to know and level with me that these are just different ways of organizing information.

01:24:30.870 --> 01:24:38.950
Just like you have skills and skills are highly, you know, specific to you. It's just a collection of markdown files with names, descriptionals, allowed tools, and then like SOPs.

01:24:39.345 --> 01:24:49.825
Subages are basically the exact same thing. So as the field continues to mature and there are better and more novel context management strategies out there, multi agent orchestrators essentially,

01:24:50.065 --> 01:24:51.905
you know, these things will grow

01:24:51.810 --> 01:24:55.970
differentiated. Now in terms of what I would consider to be actually valuable delegation,

01:24:56.370 --> 01:25:02.770
k, there are two main design patterns. The first is the parent researcher and QA system,

01:25:02.930 --> 01:25:07.490
where essentially you have a parent model, is usually a smart one. So this would probably be like your Opus model

01:25:07.795 --> 01:25:10.195
That communicates with researchers,

01:25:10.275 --> 01:25:15.234
plural. This will be dumber models like Sona that typically do research better and more economically.

01:25:15.315 --> 01:25:38.475
And then some QA agents like Opus, which are basically just tuned to QA and nothing else. And the idea here is this is a good balance between like those super bloated org charts that we saw earlier, while still allowing each type of agent to do the things that it is inherently better than human beings at. The parent agent is obviously orchestrator. Anything that is up at the top, can always consider to be an orchestrator. Now, what you have here is you have multiple, you know, Sonnet researchers.

01:25:38.715 --> 01:25:47.595
And this takes advantage of that fan out idea. K. Where when Opus needs something, it doesn't just do the research itself because that'll pollute its context window. It goes, does a bunch of research,

01:25:48.180 --> 01:25:55.700
fits in quadrillions of tokens into the context windows of these Sonnet agents, then takes summaries of that and then uses that to make decisions.

01:25:55.860 --> 01:26:11.475
And then basically, the way that it works is, and I'm just gonna sort of draw like the the logic flow. Opus will decide to do something. It delegates down here. K. That information comes back to Opus. Opus then build something kind of on its own. After it's done building something, it goes and gives the

01:26:12.035 --> 01:26:20.600
product of its building over to the q and a agent. The q and a agent returns some changes that it suggests it makes. Opus then goes through, makes those changes.

01:26:20.920 --> 01:26:40.055
Again, gives it to the QA agent. QA agent returns. This loop continues until basically everything is done. If there's research that's necessary, it'll go down, do some research here and then continue develop. And then finally, you have your whatever the the final product is that you're building, whether it's like a business system, a development system, or whatever. In this way, you're maximizing the incentives of each individual agent while also allowing,

01:26:40.695 --> 01:26:51.490
I wanna say, like, the leanest possible setup that still recognizes that different things are better at different types of agents are better at different types of tasks. You know, we could make this bigger, of course. We could have like a testing agent. We could have

01:26:51.810 --> 01:27:27.410
a design agent. We could have a development agent. We could have a back end agent. But, know, the more complicated you get with the stuff, again, as mentioned, like typically the worse that it gets. If you wanna go even leaner than that, then the second system is developer and QA, where you literally just have a smart parent. K? And then you have a smart q and a, and then you just go back and forth between the two. And what happens is every time that you wanna test something, you sort of have like a claud at dot m d or or or just like a prompt that's baked into your parent that legitimately says, hey, after you've done every development, run it through a new QA agent. The idea here is the QA has like literally no prompt other than, you know, you're a QA agent with no context,

01:27:27.650 --> 01:27:28.690
read this code,

01:27:29.175 --> 01:27:32.935
and apply the following whatever, like design principles to it.

01:27:33.575 --> 01:28:08.485
And basically, occurs is this QA agent, since it doesn't know what the heck the project is is on, it's not going to be biased like the parent agent will be in the development of the feature. The parent agent will have feedback from the QA agent and so it'll be able to incorporate into its own thread and take advantage of all of the preexisting list of failures and successes and things it's tried and so on and so forth. But the QA agent is like new and it's new spawned every time. And so typically, what'll work what the way it'll work is the parent agent will go and it'll develop a feature. And then at the end of the development, there'll be something in the Cloud NMD or system prompt that says, okay, now that you're done, make sure to check it with the QA agent. So we'll spawn a QA agent. The QA agent will then give feedback.

01:28:09.150 --> 01:28:18.749
K. The parent will design. Feedback, the parent will design. Feedback, the parent will design. No feedback because it's now good. Parent's done. And so now we have the final product.

01:28:19.390 --> 01:28:33.715
Obviously, you know, because it has to do its own research and stuff like that, I personally think this is not as ideal, but it is even simpler. And, um, keep in mind that there is always, a time cost every time you spin up a sub agent. It's a fixed time cost, but, uh, there's also some compound probabilities you're multiplying because,

01:28:33.875 --> 01:28:45.940
you know, you are having an agent delegate something to another agent. Basically, there's no human in the loop. The more independent steps that an agent has to do without a human being in the loop, the higher the probability that it will diverge from its sort of intended

01:28:46.420 --> 01:28:54.015
goal or intended task. So when your parent agent in the previous example generates, you know, a bunch of research queries to the, you know,

01:28:54.495 --> 01:29:26.405
Sonnet sub agents and goes and does them. There's no guarantee that the research of the Sonnet sub agents are doing is actually a 100% faithful to what your initial query was. Every step along the chain that is further from you, typically, the results and the quality is a little bit more diluted. So I mean, like, it'd be it'd be either one of these for me, developer q and a or some sort of parent researcher q and a. That'd basically be it though. Um, personally, I find right now with all the org charts and stuff like that, we're just we're just going a little bit too much. We definitely don't need, uh, I don't know, 700 layers of CEOs and customer success agents and lead engineer agents and stuff like that.

01:29:27.365 --> 01:29:35.285
Now, wanna talk about something that's gotten a lot of attention recently and does genuinely have the potential to significantly improve many business and programming functions.

01:29:35.445 --> 01:29:42.540
It's called auto research. Essentially, what I have in front of me is I have a research lab that I've spun up to improve the load speed of one of my websites.

01:29:42.940 --> 01:29:49.645
Now, the way that you gauge whether or not a website is loading quickly is based off of three main metrics.

01:29:49.805 --> 01:29:54.765
The first is called LCP, least contentful paint. FCP, first contentful paint.

01:29:55.085 --> 01:29:59.245
Then there's TBT, I don't know what that stands for. And then finally, there's performance score.

01:29:59.485 --> 01:30:15.740
And so this is a standardized assessment called the Google Lighthouse score that you've probably seen before. And basically, it measures like, you know, when I type in a one second copy and I press the enter button, how fast does literally everything on the page load? It also checks for very minor things like, you know, when I when I load this website,

01:30:16.585 --> 01:30:20.505
does the content on the page shift around? So my website here, leftclick.ai,

01:30:20.505 --> 01:30:34.740
is just one of many ones that I own. And essentially, it's just a little bit too slow right now. And it's slow for a variety of reasons. We got this cool, like, glass isomorphism animation on the page. You know, there's, like, stuff moving around and lots of images of my team and and so on and so forth. So,

01:30:35.300 --> 01:30:38.260
you know, what I've decided to do is I've decided to basically

01:30:38.260 --> 01:31:02.140
take all of the load off of me to make this website faster, and then just give it all to that fleet of agents to do so instead. Auto research is basically perfect for use cases just like this, where we have a very defined goal, in my case, to decrease or increase a couple of metrics, a very defined change method, which is how you actually make the impact. So in my case, just modifying the website code, Then a very standardized assessment, which in my case is that lighthouse score.

01:31:02.540 --> 01:31:19.255
In case you have never seen this before, basically, Andrey Karpathy, who is the one of the founding members of OpenAI, and then he also was the head of AI at Tesla for quite a while. You know, he he just was doing a bunch of research on his own for one of the models that he was running, and he's just like, you know, don't I have to do this stuff anymore?

01:31:19.735 --> 01:31:26.055
I feel like I'm at the point where I could have AI actually run most of my research for me. Let me make a a quick hypothesis.

01:31:26.470 --> 01:31:39.270
If I just gave all of my changes to AI, would it be able to do the same thing that I do while I slept, such that when I wake up, I'll have like a big list of improvements? And that turns out, you know, he he can. And it's not that AI agents are like better than human beings at determining these research changes,

01:31:39.875 --> 01:31:42.915
but it's actually quite standardized to to do conceptually.

01:31:43.075 --> 01:31:58.290
You're basically just like looking over a bunch of different possible things you could do, making one tiny change, and then just evaluating, hey, did that actually improve my score? Did that make things better? If so, I keep it, and I just move on to the next thing. I go over and over and over and over and over again until finally, you know, you you make it hundreds of iterations later.

01:31:58.610 --> 01:32:13.455
So, you know, in my case, like, we I just reran the test because I wanna start this from scratch to show you guys how this works. Well, it's actually fairly straightforward. And what I'll do next is I'll run you guys through the original way that auto research works, and then how to download the repo, and then set it up on your end for whatever the use cases that you you particularly have.

01:32:13.775 --> 01:32:21.855
So it all started when Andre Karpathy, who was a researcher, he used to work at Tesla. Think he was the head of AI at Tesla, and then he was also one of founding members of OpenAI,

01:32:22.015 --> 01:32:23.135
asked himself, you know,

01:32:24.310 --> 01:32:31.430
all this work that I'm doing, all this research stuff that I'm doing, is there any way to automate it? And he found that if he just broke down step by step what it is that he actually had to do,

01:32:31.910 --> 01:32:33.830
it more or less always went like this.

01:32:34.150 --> 01:32:36.230
You know, he he just had a little loop setup

01:32:36.595 --> 01:32:40.515
where, you know, he would make a hypothesis.

01:32:40.835 --> 01:32:45.395
And the hypothesis would be like, hey, if I change x, y, and z, I think my system will run faster.

01:32:45.955 --> 01:32:46.595
Then

01:32:46.915 --> 01:32:52.035
he'd actually execute the change. So he'd actually go and he'd adjust x, y, z. Then finally, he'd assess.

01:32:52.760 --> 01:33:03.960
And then if the assessment was good, aka it made an improvement, then he would just go back to this and then make another one. Then if the assessment was bad, aka it failed, then he would just get rid of it and then not change anything and then, you know, kinda start from scratch.

01:33:04.765 --> 01:33:09.085
And all along the way, k, what he would do is he would update this little document,

01:33:09.645 --> 01:33:12.925
which you and I could just call like a research log.

01:33:13.245 --> 01:33:31.110
And, you know, basically, the first change would be like, oh, you know, this worked. It was great. Second change, oh, no. It didn't work. Then here's why. Third change, okay. It worked. That was great. And eventually, over time, you end up with this massive, massive log of all the different possible things you could do to an AI to whatever your task is, and all the things that you have tried in the past that doesn't really change anything.

01:33:31.845 --> 01:33:36.005
Okay. So this is made of three files. There's a prepare dot py, which in our case is kinda pointless.

01:33:36.085 --> 01:33:41.765
And there's a train dot py and then a program dot n d. The reason why the prepare dot py is pointless is because it's just about like AI

01:33:41.925 --> 01:33:53.140
research specifically. It's like fixed constants, downloading the training data, training a a BP, byte paracoding, tokenizer, and a bunch of other stuff that just isn't really relevant. The stuff for us though is obviously we wanna train this and and improve this

01:33:53.780 --> 01:33:56.900
improve our programs. We wanna improve our websites. We wanna improve some of our business functions.

01:33:57.515 --> 01:34:10.260
These two files here train dot py and program dot py basically underscore how the entire thing works. Okay. So the super important one here is called program dot m d. What you do is you basically just tell it what you want it to do. So for instance, hey,

01:34:11.060 --> 01:34:28.425
here's what you can do as an AI agent. Modify this file. K? Every time you do, I want you to print a summary of the scores and then log it to this file. And that's literally it. It just goes through that loop over and over and over and over and over again. Then the actual train dot py, in this case, is just like the website itself. Sorry. The the AI model

01:34:29.385 --> 01:34:38.025
setup itself with all the layers and stuff like that. In our case, right, the example that I was just showing you a moment ago, that's just my website, basically. And so basically, it just like it has a loop setup in its prompt.

01:34:38.630 --> 01:34:46.230
You tell it what you can change or what you can't change. You give it some, like, sort of log file that it dumps everything to, so you have, like, a big list of changes in progress.

01:34:46.870 --> 01:34:58.395
And then, you know, after that, you are you're basically done, honestly. You just fire it off and let it go. And when you do, you know, you can make some pretty cool changes. So, you know, I just reran the thing, and we're already seeing some pretty substantial improvements.

01:34:59.035 --> 01:35:05.755
Not all these improvements are the same ones I was showing you guys before. It's this research lab just I'm just resetting it over and over and over again to see if I could find anything more interesting.

01:35:06.390 --> 01:35:13.350
Okay. So hopefully, that's pretty straightforward. Simplest and easiest way to do that is just head over to github.com/carpathy/autoresearch.

01:35:13.350 --> 01:35:20.870
And then what you do is you just copy this link. Okay. So how do we actually do this? Just open up anti gravity. I'll click open folder. I'll just make a new one called auto research test.

01:35:21.225 --> 01:35:22.745
K. And then I'm gonna open.

01:35:23.705 --> 01:35:31.305
And I'm going to click on Claude code. Zoom weigh in so you guys could see. And actually just paste this and say clone this into our current folder

01:35:31.625 --> 01:35:34.905
auto research test. Just so that it doesn't do this in my

01:35:35.280 --> 01:35:37.760
kind of my root folder, which it's done a couple times.

01:35:38.560 --> 01:35:43.040
Alright. So it's gonna start saying, hey, I want you to clone this. So it's gonna give it a a quick try.

01:35:43.840 --> 01:35:53.405
It's just gonna dump all the files in here. So now we basically have the exact same thing we had before. Right? We have the program dot py, prepare dot py, train dot py, the progress, and, you know, even like a read me that explains everything.

01:35:53.885 --> 01:36:07.970
So now all we need to do if we wanna like, I don't know, train this on a site or something is well, first of all, why don't we just make a quick site? Hey, build me a simple one page portfolio site for Nick's Arrive. And obviously, it doesn't know what my name is. So it's now going to build a simple one page portfolio site.

01:36:08.450 --> 01:36:11.890
I just wanted to do it here, so it's going to do this inside of this file.

01:36:12.210 --> 01:36:13.650
First, it's gonna ask me some questions,

01:36:15.115 --> 01:36:24.555
Just add demo information for everything. And my goal is I just wanna build a brief little website here for us, and then I just wanna run auto research on it, show you guys how easy it is to optimize things.

01:36:24.955 --> 01:36:38.330
In our case, we're gonna do website. There are a million different things you could apply auto research to. I'm gonna go through a quick and easy framework, but first, I'm just gonna show you guys what you need in order to actually set this up. Alright. Now what I'm gonna say is, excellent. I'd like you to create a dashboard

01:36:38.875 --> 01:36:43.195
for auto research and then set up the auto research framework

01:36:43.355 --> 01:36:54.075
to optimize the Google Lighthouse page score for index dot HTML. I want you to run this on a local loop and basically just make index dot HTML as fast as possible across LCP,

01:36:54.660 --> 01:36:55.780
FCP,

01:36:55.940 --> 01:36:56.900
TBT,

01:36:56.900 --> 01:37:02.980
and then also performance score. Then give me some sort of live dashboard view so I could watch it. I'm actually working in reality.

01:37:04.820 --> 01:37:18.135
Cool. And then I'm just gonna press enter. And basically, what it's gonna do is it's gonna read through all these files right over here. And then it's going to use all of the information here in order to set up the dashboard for me. And while it's working, I just wanted to explain a little bit about where we are and where we're going.

01:37:18.455 --> 01:37:32.660
The initial stage of AI encoding was quarter like vibe coding. This is like 2024, 2025 stuff where a human being, okay, us, prompts. Then an AI writes some code, and then a human being reviews. So in this way, our roles were basically relegated to writing.

01:37:32.980 --> 01:37:38.225
We would write the prompts. We would make minor changes where necessary, and in that way, like build a website or something.

01:37:38.545 --> 01:37:56.380
Well, nowadays, most of us do agenda engineering, and this is sort of what the advanced part of our course deals with. So this is where instead of just dealing with one AI, we're actually orchestrating agents. And these agents are doing multiple things for us all the time, and then basically, like, returning the results so that we could see and then, like, assess and make slight little recommended changes.

01:37:56.460 --> 01:37:58.620
So in this way, our role is more of a director.

01:37:58.860 --> 01:38:05.215
But auto research represents sort of the the next jump from agentic engineering to actually full independent research.

01:38:05.455 --> 01:38:09.215
Where now all we do is we're no longer, like, actually even directing the agents.

01:38:09.375 --> 01:38:17.935
We we let them handle their own direction. What we do is we just say, hey, I have a goal and I'd like you to achieve this goal. Here's how you can modify x y and z, and here's an assessment.

01:38:18.460 --> 01:38:23.420
And so in this way, we set the direction. The agent just runs completely autonomously. And then what we are is we're basically like a

01:38:23.820 --> 01:38:26.220
we're like a we're like a principal

01:38:26.220 --> 01:38:36.355
investigator, like a researcher at a lab somewhere. We just say, hey, you know, I want you to do x y z, and then we just go farm it out to a bunch of, you know, research assistant RA monkeys to go and do the experiments and so on and so forth for us.

01:38:36.835 --> 01:38:58.590
And so this is along a spectrum of decreasing human involvement. And I'm not really sure what comes next after independent research, but I do not imagine it will require human pings in the loop essentially at all. This is the same sort of thing that big research labs right now are currently using to optimize their setup. So Anthropix almost certainly doing this all day long for Cloud Code to make things faster, to make things more performant.

01:38:58.830 --> 01:39:29.680
Um, you know, OpenAI is probably doing this behind the scenes to make a codex, not only better, but even, like, adjust the architecture of the AI models and so on and so forth. They're probably doing it across all their web properties. Right? Anyone that's really worth any salt at this point has probably been doing something like what I'm showing you guys with auto research for at least a little while. It's just auto research is, uh, Carpathi's way to democratize that and then allow people, you know, to to do this even with, paid providers like, uh, Anthropix Cloud. K. So if I go back here, you can see this is actually set up the auto research loop and it's actually doing the research, um, which is not essentially what I wanted to do. I wanted to actually see the dashboard.

01:39:30.000 --> 01:39:32.080
So it'll say is, show me the dashboard

01:39:32.745 --> 01:39:35.305
because I actually wanna, like, watch it work live.

01:39:35.785 --> 01:39:39.624
And then it's just paused the optimization loop. Now it's going to show me set dashboard.

01:39:40.105 --> 01:39:47.190
It's restarted that, and then, um, I guess it's going to actually show it to me now in a second. Cool. We have it right here.

01:39:47.670 --> 01:39:57.990
Awesome. So here is our dashboard and we are running multiple experiments. Obviously, this looks a little bit different from the dashboard I showed you guys earlier from my left click auto reshoot, but that's okay. I don't want this to look the same. I wanna show you guys that you can apply this to whatever you

01:39:58.825 --> 01:40:02.825
Our very first experiment had an f c p of four six four seven five two

01:40:03.065 --> 01:40:04.665
and a size of 12.9.

01:40:04.665 --> 01:40:21.090
What we ended up doing is we ended up minifying the CSS, making a bunch of changes to the code basically, and it took it from 12.9 down to 10, which technically makes our website even faster. But in reality, it doesn't actually influence things because our scores are basically the same, at least speed wise. K. So this is just gonna continue operating.

01:40:21.810 --> 01:40:22.690
Just say continue.

01:40:23.565 --> 01:40:34.765
Now in my case, what this is doing is it's currently occupying the main thread. Right? So this is why it's gonna be writing and making changes and stuff like that. At any point in time, I could say, hey, just go run this in the background. Or, hey, just want you to run this in a loop using, like, the Anthropic

01:40:35.630 --> 01:41:04.620
agent SDK or something like that. I'd supply my API key and then it would and then it would go. And what it's doing now is it's actually making the changes. I guess, I should probably also like open the website itself. That'd probably make more sense. Let me actually take a look at what that looks like. Right. So here's here's the actual website itself. And you can see that, like, for the most part, you know, it's very basic and simple. But what we're doing is we're just optimizing it. We're making it faster and faster and faster. This may break the website in some cases. Sometimes some minor changes like this do. But as you can see here, we've actually, like, improved it by a whole whopping two milliseconds.

01:41:04.700 --> 01:41:22.105
Right? We made whatever change we did that made this a little bit slower has now been fixed and we're a little bit faster, then it's just keeping each of these. So, you know, these things will go down very, very, very slightly. They'll increase very, very slightly. But, you know, if you let it go for enough loops, then eventually, can get to the point where you're legitimately making pretty large improvements to the least contentful paint,

01:41:22.585 --> 01:41:30.345
you know, first contentful paint and and so on and so on and so forth. And just know that we can discard any runs that don't actually do anything. So, you know, in my case, my

01:41:31.290 --> 01:41:47.815
uh, like, the one requirement I had for my left click perf auto research, uh, run was that you can't visually change the website at all. So you should take a screenshot and it should be pixel perfect compared to the initial one, which is why it's, not adjusting the font or whatever. But I can make more or less any other change aside from that, and it's it is doing so, which is pretty neat.

01:41:48.215 --> 01:42:01.010
Okay. So now you're probably wondering, Nick, so how the hell do I actually use auto research for my own business aside from the demo that I just showed you? And like, what else could I apply it to? And my rule for auto research is that in order for you to meaningfully make any changes, you need to have three things.

01:42:01.650 --> 01:42:12.555
The first is you need to have a metric that you want to optimize for. So in my example, what is the metric that I am optimizing for? Well, I'm off obviously optimizing for my lighthouse score.

01:42:12.875 --> 01:42:25.035
And so it's a very standardized metric. It's really simple and it's very objective. There's no real negotiations about what a lighthouse score is. Google invented it. It is what it is. That's what I'm looking basically to to to assess.

01:42:25.870 --> 01:42:35.630
The second thing that you need is you need a way to change that metric. So you need a way you can influence an outcome that modifies the metric itself.

01:42:35.870 --> 01:42:37.390
So if you think about it in terms of Lighthouse

01:42:37.795 --> 01:42:38.675
page score,

01:42:39.075 --> 01:42:51.795
the direct way to modify your Lighthouse score is just to change your website. And the direct way to do that is just like alter the code a little bit. So in my case, not only do I have the metric, which is a Lighthouse score, I have a direct way I can immediately change the metric.

01:42:52.500 --> 01:42:59.380
And then the third thing that you need on top of that is not only do you need a metric, and then you need a way to change the metric, you also need a way to assess

01:42:59.540 --> 01:43:01.380
what it is that you just did.

01:43:01.700 --> 01:43:03.060
And so because

01:43:03.380 --> 01:43:24.960
it's kind of like in the name, right, this is sort of a contrived example. But like the Lighthouse score has a Lighthouse test, and the Lighthouse test just tells you what your Lighthouse score is. So I have, like, the thing I'm trying to improve, which is, you know, all the metrics I just showed you guys. I have a way to improve it, which is modifying the website, and then I have way to assess that, which is my Lighthouse page score, which I can run-in a loop basically immediately after the changes. It takes me just a few seconds.

01:43:25.280 --> 01:43:30.960
And so those are the three things that you need. If I were to formalize this, k, and I will because I just want everybody to know

01:43:31.360 --> 01:43:35.665
and and be able to visualize it. The three things you need in order to do auto research,

01:43:35.905 --> 01:43:37.505
k, are

01:43:37.505 --> 01:43:38.625
number one,

01:43:39.665 --> 01:43:40.705
a metric.

01:43:41.265 --> 01:43:43.425
Number two, a way to influence

01:43:43.745 --> 01:43:44.385
or the,

01:43:45.040 --> 01:43:51.600
I don't know, change method, let's call it, which allows you to influence the metric. And then three, some sort of assessment.

01:43:52.160 --> 01:44:07.315
And with the change method and the assessment, the most important thing, at least in in my view, is that you can do both of these things pretty fast. Like, if your change method takes a really long time to do, it takes like an hour or whatever, and then your assessment takes another hour. If you think about it, your your experiment will only be able to run as fast as basically once every two hours.

01:44:07.795 --> 01:44:16.550
And that's still like light years ahead of like a, you know, a human experimenter. But if you really wanna see like those crazy vertical lines in the graph as things just get better and better and better, sort of recursive self improvement,

01:44:16.870 --> 01:44:24.310
you know, you need to have a pretty short change method. So ideally, this would take, I don't know, let's say, like, thirty seconds or so. Why am I drawing like that? I could just do this.

01:44:24.950 --> 01:44:32.805
You know, maybe like thirty seconds or And ideally, the assessment would also take maybe thirty seconds or so as well. Because combined, what we have here is we have a loop that can run 60 times per hour.

01:44:33.125 --> 01:44:35.685
Or if you multiply that out, what's 24

01:44:35.685 --> 01:44:36.805
times 60?

01:44:37.285 --> 01:44:38.725
A lot. 1,440

01:44:38.725 --> 01:44:41.925
times a day. I mean, like, if you could run an experiment 1,044

01:44:41.925 --> 01:44:42.725
times a day,

01:44:43.490 --> 01:44:45.410
you know, even if, like, only

01:44:45.650 --> 01:44:52.930
2% of these are actually good, that's like, I don't know, about 30 changes that improve. And if every change improves things by 1%,

01:44:53.010 --> 01:45:01.015
what you've just done, to be clear, is you've gone 1.1 raised to 30 1.01 raised to 30, which is a 34%

01:45:01.015 --> 01:45:02.535
improvement per day,

01:45:03.335 --> 01:45:13.730
at least in the first day. If you had, I don't know, let's say 90 of these changes be good, then this math ends up mapping way better for you. It's 2.4 x. You had a 180

01:45:13.730 --> 01:45:31.025
of these changes, you'd be six x and so on and so forth. This is gonna go basically as high as you let it. And so going back to my anti gravity here, just seeing a couple of the changes. It looks like the biggest change that it has made that is actually and actively improved things, was this jump between forty five and six twenty seven.

01:45:31.425 --> 01:45:42.145
So it made some change here. Content visibility auto removes scroll behavior smooth that actually significantly improved the the load speed. And so that's what it did here. And we gone from six forty six at the top to

01:45:42.350 --> 01:45:45.790
a fast contentful pane here of at the lowest six nineteen.

01:45:46.030 --> 01:46:08.785
It looks like the least contentful pane did not change at all. Meaning, if this currently loads in, like, I think six hundred milliseconds or so, it's pretty dang good. Now, kind of a contrived example since I just had AI build me the simplest website ever. But, you know, you could see with a more complex website, one that I built for the most part, at least initially, and then one that AI didn't really have a lot of time to optimize for it, and it was a lot more complex as animations and stuff. We've actually improved that improved that by 20%.

01:46:09.260 --> 01:46:15.340
To give you guys some more context, there are some people out there that have applied this to projects that have improved metrics by, like, 50%.

01:46:15.420 --> 01:46:17.180
So Toby Litke pointed

01:46:17.820 --> 01:46:20.220
this autonomous AI research

01:46:20.620 --> 01:46:21.580
system

01:46:22.025 --> 01:46:42.580
over at by the way, this is the founder of Shopify. Right? Big guy or CEO of Shopify, I should say. He ran auto research on the entire Shopify liquid code base. Now that's responsible for, like, running more or less everything about Shopify. Like, it's it's their templating liquid syntax language thing. It's it's a lot of freaking code. And he found that after running this for however many times, he had 53%

01:46:42.580 --> 01:46:49.780
faster combined parse plus render time, which is his main metric. 61 fewer 61% fewer object allocations, another metric.

01:46:50.260 --> 01:47:10.270
And things are just freaking printing for him. I mean, the you know, what's that like? Twice as fast, essentially? To think that you could just point this at something and go twice as fast in, 20 I don't know, like, 30 runs or something like that is nuts to think about. I don't know how long this took. Maybe it was like an evening. Maybe he went to bed, woke up the next morning, and his freaking whole code library was twice as fast. I don't I don't know.

01:47:10.910 --> 01:47:17.870
But I mean, like, the fact that he he has done this and he can do this is obviously very impressive to anybody that has any sort of software that they wanna optimize.

01:47:18.430 --> 01:47:20.190
So what are, like, the practical takeaways?

01:47:20.350 --> 01:47:33.835
You can optimize basically anything you want. So in my case, optimizing website. How about you guys make a SaaS app? Well, you can actually optimize SaaS app. You can optimize not only the front end of the SaaS app, you could optimize the back end. You could say, hey. Hey. Here's your server. Here's the whole setup.

01:47:34.075 --> 01:48:08.770
I want you to make this load as fast as possible. I want, like, the request to come in instantly. Do whatever the heck it takes to do it. Here's a quick little test method. You know, we we time how long it takes for one request to come in when you click a button. You could just tell it that. Even if you just gave it literally the exact transcript that I just gave you a moment ago, it would probably do a pretty good job so long as you're the auto research framework. You could optimize random tiny things in your business. I mean, there are probably some, like, interfaces, random little modules, and stuff like that in your company that, like, you know, could be way faster and way better. You can actually optimize that. You could optimize things like customer support queries. You could, like, uh, I don't know, have, like, a prompt, let's say, that, like, an AI agent uses in order to handle handle customer support.

01:48:09.090 --> 01:48:58.915
And maybe you're running some big enterprise, or maybe you're plugged into a big enterprise and you have the ability to collect this data. We could actually just, like, test modifying the prompt and then, like, waiting, I don't know, like an hour and then seeing the changes. And, you know, it's an hour, which is kind kind of a loop, but it's still 24 changes a day. You could like meaningfully modify that and move that in the direction towards your goal. You could do cold email. That's personally what I'm using this for. Cold email is kind of a special case because again, you need a fair amount more time, but I'm still capable of doing something like six to 10 tests a day at like over 500 to a thousand emails per test, which is pretty dang good. You could optimize a bunch of other things as well. You could optimize like your ad creative. You could optimize your copy. You could optimize your conversion rate by making minor changes to a page. Could really have agents optimize whatever the heck you want as long as you have the volume of data necessary in order to, like, construct the test. So hopefully, I made it really clear how all this stuff works. All you really have to do is just head over to, you know, that carpenter auto research

01:48:59.235 --> 01:49:01.795
that carpenter auto research. Sorry. Not that one.

01:49:03.310 --> 01:49:05.230
Library or repo over here.

01:49:05.550 --> 01:49:25.175
K. And then just copy that puppy in, clone it inside of your repo, and then just do away on whatever task you have. The simplest and easiest one for you guys to see how things work are obviously the website ones. But, yeah, just know that like you can apply this to more or less anything. As long as you have those three points that I've mentioned, you need a metric to optimize, you need a change method or a way to influence that metric, and then ultimately you need an assessment.

01:49:26.455 --> 01:49:28.695
Next, I'd like to talk about automation,

01:49:28.855 --> 01:49:35.540
specifically automating things on the Internet. We're gonna start with HTTP requests, then we're gonna move up to browser automation.

01:49:35.700 --> 01:49:42.500
And then finally, we're gonna round it off with computer automation. And I'll talk about a bunch of different platforms you could use and ways to do more or less all of these things.

01:49:43.215 --> 01:49:44.415
So HTTP

01:49:44.415 --> 01:49:49.055
requests are probably the simplest and easiest form of, you know, Internet automation.

01:49:49.215 --> 01:49:50.735
And Cloud Code does this natively.

01:49:50.895 --> 01:49:55.055
In case you guys didn't know, HTTP stands for hypertext transfer protocol.

01:49:55.540 --> 01:50:02.260
And essentially, every time I send a request to a website, basically, every time I try and load one, what I'm doing is I'm sending a HTTP

01:50:02.260 --> 01:50:03.459
get request

01:50:04.020 --> 01:50:16.315
to the server upon which my website is located. And then my browser will take the response and then mark it up and make it look all pretty. So for instance, let's just like rerun that one more time. My browser, the client,

01:50:16.555 --> 01:50:28.380
decides it wants to access left click .ai on account of I just typed it into my freaking page. The second I press enter, what we're doing is we're actually sending a request over to their server, k, which is located at some IP address.

01:50:28.620 --> 01:50:37.980
And that server is configured to automatically respond to requests of that kind by just dumping the whole website and giving it to you. And so then my browser takes that whole website and then it like marks it up and now I could see it. Right?

01:50:38.635 --> 01:50:47.195
Now you might be wondering what exactly is it marking up? Well, if you view the source of the website, which is pretty easy to do. You can go to any website, just right click, press view page source, and you'll see all the HTML.

01:50:47.675 --> 01:51:01.770
You can see that what a website is actually sending and receiving is not like the pretty images and stuff like that. It's it's usually just sending references to those images. And this is actually the content of the website. My browser just has mechanisms inside of it that just know how to turn this into that.

01:51:02.505 --> 01:51:13.225
Okay. So case in point, um, the definitive AI growth partner for fast moving b two b companies. This didn't just, like, come out of nowhere. It's not like this is, like, an image. This is actual text on a page. Right? If I go the definitive,

01:51:13.305 --> 01:51:23.320
you could see that it's actually being represented on the kind of code of the page that is being sent from the server every time I make an HTTP get request. The definitive AI growth partner for fast moving b to b companies.

01:51:23.560 --> 01:51:26.760
Alright. So why is this relevant to us? Well, because the first aspect

01:51:27.000 --> 01:51:33.175
of any sort of browser automation, doing things on the Internet, I should say, not browser automation, but, like, automating network tasks,

01:51:33.815 --> 01:51:35.895
is this hypertext transfer protocol.

01:51:36.135 --> 01:51:40.215
Claude and other AI models now have the ability to use web tools

01:51:40.295 --> 01:51:50.370
to basically make HTTP requests of the kind that I just showed you. And that allows it to do a tremendous number of things, Not all things, but a tremendous number of things if you know how to use it right.

01:51:50.690 --> 01:52:04.455
So the simplest and easiest way for me to demonstrate that is you can actually just like scrape any website you want now with Claude or any other agent. Hopefully, it's pretty clear and obvious how. What we do is we just take the URL. We go back to our agent, which in my case is this auto research one. Then I'm just going to say,

01:52:05.495 --> 01:52:06.374
retrieve

01:52:06.375 --> 01:52:08.055
contents of this,

01:52:08.695 --> 01:52:09.815
just the text.

01:52:10.775 --> 01:52:17.670
What What this is going to do next is this obviously going to send the HTTP request using the web fetch tool over to HTTPS leftclick.ai.

01:52:18.230 --> 01:52:31.175
And now, what it what will have gotten back, k, is it will have gotten back exactly what I just showed you a moment ago, k, which is all of this. And because I said just the text, if I go back here, you could see that it is extracted,

01:52:31.175 --> 01:52:34.055
sort of stripped all of the code here,

01:52:34.295 --> 01:52:47.160
and it's returning basically just to the stuff that it could actually see. So what did it say? Navigation case studies about services reviews. Let's talk. Case studies about services reviews. Let's talk. The definitive AI growth founder, fast moving b to b companies. Say it right over here.

01:52:47.480 --> 01:52:52.440
You know, worked with Anthropic, Notion, Wix, Hagen, V, Lighttricks, Durable, and so on and so on and so forth. Right?

01:52:53.155 --> 01:52:56.355
So I guess what I'm trying to say is like, this is a simple way that I can get data.

01:52:56.675 --> 01:53:12.680
And so one of the first and most elementary uses of, you know, any sort of coding agent is just you can automate website scraping really easily. So I could give it a simple list of tasks and I could say, hey, I want you to scrape like 400 different websites. I could literally just give it a big array top to bottom. It it could go and it could do the scraping.

01:53:13.160 --> 01:53:30.225
Now the issue is a lot of the time, k, you wanna go further than just scraping, than just reading a website. You wanna do is you actually wanna dynamically interact with website and change things. So for instance, let's say, what I'm doing is I'm getting a big list of all of the agencies out there, the AI agencies like LeftClick, and I wanna send them all messages.

01:53:30.625 --> 01:53:30.865
Well,

01:53:31.530 --> 01:53:44.490
you know, I could just scrape every single website to see if there's an email address. Right? But in my case, maybe there's no email address. So what do I wanna do? I wanna take that next step. The way that I do so is usually through some sort of form or whatever. How do I automate the clicking of a specific button?

01:53:45.065 --> 01:53:51.145
It's kind of difficult to do. Right? I can't just automate the clicking of a specific button through an HTTP request because,

01:53:51.305 --> 01:53:55.065
you know, this is something more than HTTP. It's kind of JavaScript.

01:53:55.465 --> 01:54:00.320
I could try. In some websites, I'll be able to. So hacking this, Hey, extract

01:54:00.800 --> 01:54:02.160
the cal.com

01:54:02.160 --> 01:54:03.440
link for me

01:54:03.760 --> 01:54:05.040
and then open

01:54:06.000 --> 01:54:07.040
in Chrome.

01:54:07.120 --> 01:54:31.720
Now going one step further. Okay. We're gonna open this link in Chrome. So we actually have this link available. And there are some services out there where you can actually just send an HTTP request to actually, like, book a meeting on a page. But you might think, in order to do that, make sure you have to click on this button and then type this in and then enter a bunch of information and so on and so forth. Turns out I can actually just use HTTP request. So I'm just gonna say, book a meeting for 03:30PM tomorrow. First name, test, last name, test email, nick@test.com.

01:54:31.720 --> 01:54:34.440
And without any more information, what it's gonna do

01:54:34.760 --> 01:54:37.000
is it's gonna go and it's gonna find the API documentation.

01:54:37.925 --> 01:54:44.485
So I'm gonna check the availability using the API documentation, and then finally, it's going to ask to book. So I'm gonna say 03:30PM, March 30,

01:54:45.045 --> 01:54:59.810
then it's going to go and actually do the booking. But you notice how many issues there are and errors there are with us? This obviously isn't perfect. Now I could theoretically figure out the exact schema and format that I need to use in order to send requests like this every single time that I try and book like a cal.com.

01:54:59.970 --> 01:55:02.370
But the reality is, like, not everybody's gonna have a cal.com.

01:55:02.450 --> 01:55:07.665
What I'm doing here is I'm building a very particular solution that solves my one particular problem, the HTTP request.

01:55:07.745 --> 01:55:17.585
And even then, you know, there's just gonna be some back and forth. It's not gonna be it's not gonna be perfect. And this is taking forever. I mean, I've been sitting here for, ten, fifteen minutes. It's trying its best. It's booking with a variety of different

01:55:18.065 --> 01:55:23.510
means and I don't know. Who knows? Maybe it'll actually go and do the booking. Okay. There we go. We actually did end up doing the booking. Thank goodness.

01:55:23.910 --> 01:55:39.855
That said, that took forever and was obviously a very fragile solution that only works with, like, particular cal.com pages. Right? And so that's where we move to the next level of automation. That's where we go from simple HTTP request, which, you know, most services out there will have some sort of API application programming interface that you can actually communicate with.

01:55:40.175 --> 01:55:46.015
But, you know, they're super fragile. They require very particular formats. And as you could see, they they could take a really long time, and then they're very narrow.

01:55:46.415 --> 01:55:53.480
That's where we move from sort of like the first level of automation, made should be request, all the way to full scale browser automation,

01:55:53.560 --> 01:55:56.200
which is where, uh, Cloud actually fully controls your browser.

01:55:56.440 --> 01:56:06.345
And, you know, there there are a couple of built in tools with this now, but typically, the best way to do this is using one of two tools. Lisa's at the time of this recording, um, Chrome Dev Tools

01:56:07.065 --> 01:56:08.265
MCP,

01:56:08.905 --> 01:56:10.745
or there's also the browser

01:56:11.865 --> 01:56:13.065
use platform,

01:56:13.145 --> 01:56:28.140
which actually is pretty new, pretty recent, but it, uh, costs a fair amount of money. And so what this does is instead of just sending HTTP requests under the hood, what this does is it actually loads up a whole browser for you and then goes through the process of doing a booking. So you see how hard it was for me to do this, you know, sort of simple

01:56:28.220 --> 01:56:49.410
task of, like, booking a meeting on a calendar even though I gave it the exact time, the exact information, and so on and so forth. That might have taken a human being one second. It took me, like, something like five minutes of back and forth and probably, like, $40 of tokens. So meanwhile, I can open up a page that has Chrome DevTools MCP, and I could basically say, go here, book a thirty minute meeting for, uh, I don't know, March 30 at 3PM.

01:56:49.970 --> 01:56:53.090
Nick test, nick@test.com,

01:56:53.410 --> 01:56:56.930
answer a bunch of demo stuff for any booking queues.

01:56:57.330 --> 01:57:09.215
Can I I I just want you to look at what's going on? I was just using Chrome somewhere else, so it's just gonna kill the preexisting instance. But now it's actually gonna open up a new one. I want you to notice that, like, this is actually, like, opening up a freaking instance on my browser.

01:57:09.535 --> 01:57:26.300
And then it's scrolling through and it's clicking on buttons and navigating on the navigating through the page for me. It's literally doing this by modifying the JavaScript of the page and running brief little commands in order to, like, communicate and go through things. So it's filling up the phone number, what made you wanna contact Nick's team, what's the project budget,

01:57:26.795 --> 01:57:40.395
Do you share anything that'll help us prepare and so on and so forth? I think the project budget in this case might not actually be 5 or 10 I I don't even think that's an option because we don't go that cheap. As you can see here, it's finding the options for the budget, selecting 25 to 50 k, and then it actually goes through and it it does

01:57:40.940 --> 01:57:43.420
So what are we learning from this experience?

01:57:43.900 --> 01:57:51.580
This is much more general. K? It works way better for a much wider variety of use cases, but it's also a lot slower.

01:57:52.060 --> 01:58:09.480
Right? This is something that previously, could have just sent one HTTP request once I know the format, and then I would have, like, booked up for, like, point two seconds. Right? But now, you know, we're kinda going through the page one step at a time. Every single one of these actions realistically is kind of like a almost like the same amount of time that a single HTTP request would take. Now what it's doing is actually deleting,

01:58:09.720 --> 01:58:25.185
you know, my numbers and trying to reformulate numbers and stuff like that in order to, like, make it a valid phone number. And, you know, after a little bit of finagling, it it actually ended up finishing it, which is nice. So it actually went through. It confirmed it. It then went through the booking process and so on and so forth. And it actually took screenshots the whole way through of the process.

01:58:25.585 --> 01:58:28.225
So why am I showing you this now? Because

01:58:29.025 --> 01:58:29.905
basically,

01:58:29.905 --> 01:58:31.345
this is a gradient

01:58:31.585 --> 01:58:34.705
where it takes more setup time

01:58:35.170 --> 01:58:39.170
to do browser or any sort of automation via HTTP request,

01:58:39.490 --> 01:58:40.690
but it's faster

01:58:41.890 --> 01:58:43.330
and usually cheaper.

01:58:44.370 --> 01:58:50.415
And there's a spectrum where we go from more setup time, faster and cheaper, to basically always works,

01:58:50.495 --> 01:58:53.055
but more expensive and slower,

01:58:53.935 --> 01:58:55.374
assuming that you you don't.

01:58:55.695 --> 01:59:10.860
And so what does that mean? That means for any sort of, like, prototyping business application on a browser. I typically use browser automation or even computer automation, which I'll talk about. And then once I've sorted out that it works, I'll actually go and I'll see, hey. Can we do this via an HTTP request? Because if so, it'll be way cheaper, then we can just run a bunch of HTTP requests in the background.

01:59:11.340 --> 01:59:25.195
And it's important because, like, most of the time, like, the cool stuff that you can do with cloud is actually just, like, automation. Right? So understanding sort of this trade off between pure HTTP requests, which typically function off of, you know, hidden APIs or whatever. And then browser automation, full computer automation,

01:59:25.275 --> 01:59:35.630
will let you be able to control a lot of things much better. So it's just one example of browser automation. I could I could use browser automation for anything. Hey. I'm considering renting in Vancouver, BC, looking for $3,000

01:59:35.630 --> 01:59:50.124
a month max one bedroom rental somewhere in the downtown core. Are in buildings that have cool amenities like pools and stuff, and then the bottom two are sort of like our budget options. I could stick that puppy in there, and then it'll actually go through and, you know, navigate to some rentals.c a page.

01:59:50.685 --> 02:00:03.390
I couldn't do this via HTTP requests without spending a lot of time sorting all this stuff out. Even then, it would be very fragile because the way that these websites work is they actually, like, explicitly try and go anti automation. They make it, like, really, really difficult to do anything.

02:00:03.870 --> 02:00:10.190
But, um, you know, in this case, what can I do? I can actually just open it up. I can change a couple of filters, I can actually go and, like, zoom in on the page.

02:00:10.510 --> 02:00:32.260
It it can do whatever the heck. It can use the stuff on the right hand side. It could it could use stuff in the middle. It can thumb through things. It can get me like a big list of apartments and so on and so forth. And I mean, like, the trade off here is this is gonna take a fair amount of time. Right? Like, as you see, it's like one action every five seconds or so. But it's so general that I could just give it a task and we'll go and do it. You know, if I were to try and do this by saying, hey, go scrapetherentals.c

02:00:32.260 --> 02:00:34.580
a web page or whatever, that that would take

02:00:34.900 --> 02:00:48.345
so much time in order to build to the point where it doesn't just error out. And then most websites are also very anti brow anti HTTP request automation because it's the simplest and easiest one. You end up just getting like error, error, error, error. This actually, like, uses my browser, which is kinda neat. Right?

02:00:48.985 --> 02:00:56.585
Anyway, I'm just gonna let all this stuff go. And in the meantime, talk a little bit about browser use, which I think is probably like the the next level up.

02:00:58.080 --> 02:01:08.640
Just called browser use the way the AI uses the Internet. I don't know how long this is gonna end up being sort of like the the way to go. But basically, this is like the next level up from Chrome DevTools MCP,

02:01:08.960 --> 02:01:19.715
where you give it some very simple instructions and stuff like that, like fill up my loan application, and it'll actually go through the form using something very similar to what we did. Maybe uses Chrome m c Chrome DevTools MCP under that. I don't know.

02:01:20.275 --> 02:01:25.955
Um, and you do it for, you know, like a bulk one time payment of a $100 plus, like, pay as you go via credits.

02:01:26.510 --> 02:01:32.270
So in my case, I'm not, like, affiliated with this company at all, to be clear. So I'm not gonna touch on it too much, but obviously, it's a pretty cool product.

02:01:32.830 --> 02:01:48.195
The big draw, I would say, for most people here is just like HTTP requests can be blocked because of, you know, platforms and stuff like that just being scraped all the times. They try and stop you. Um, so too can Chrome DevTools MCP be blocked in, like, any sort of, like, instance browsers.

02:01:48.435 --> 02:01:49.395
This platform

02:01:49.635 --> 02:01:55.715
like, basically, the whole point, you know, just to kinda cut to the, you know, the pricing page and all that stuff. Like like, 99.9%

02:01:55.715 --> 02:01:58.730
of the reason you would wanna use this because it is completely undetectable.

02:01:58.970 --> 02:02:00.810
Um, you could make HTTP requests,

02:02:02.410 --> 02:02:11.690
sort of the old school way, and then try proxies and stuff, and maybe that'll work, and maybe it also won't. But if you go Chrome DevTools MCP and that doesn't work, this is what you do, and it's basically, like, 99.9%

02:02:11.690 --> 02:02:12.010
perfect.

02:02:12.515 --> 02:02:17.155
It does this because it fingerprints, aka, like, gives every one of your browser instances

02:02:17.395 --> 02:02:36.970
that are controlled by AI, like this hyper custom sort of profile. So it seems like it's, like, a request that's made from a real person. And then in that way, it, like, just, like, obfuscates it all. So for most purposes, like, I still use Chrome DevTools MCP, and this is, my main pick. But if I have anything that, like, I need to do in sort of a sneaky way and, uh, when I say sneaky way here, I mean, like, this is great for stuff like social media.

02:02:37.370 --> 02:02:48.275
So if you wanna do, like, Facebook scraping or Instagram scraping or if you actually wanna, like, interact with and leave posts and comments and stuff, pretty tough to do just right out the box sort of with, like, a a virgin Chrome DevTools MCP.

02:02:48.355 --> 02:02:52.115
But this is really, really good at, like, posting, sending DMs,

02:02:52.195 --> 02:02:52.915
x

02:02:53.000 --> 02:02:55.400
connect requests, what whatever the heck you wanna do.

02:02:55.960 --> 02:03:19.415
So, yeah, not affiliated with that company at all, but it is pretty sweet. And I think that that's they're probably gonna remain the market leader in there. But anyway, so just like HTTP requests had a lot of setup time, but they were faster and cheaper once you set them up. Browser automation is kinda like a good, like, middle ground where it's like, oh, you know, like, this actually has some some basic browser functionality built in and, like, it's pretty obvious how to, like, click a button or whatever. Computer automation is sort of like on the far end of the spectrum

02:03:20.100 --> 02:03:33.940
where basically no matter what you throw at it, it will always work. The downside is it's very expensive, takes a tremendous number of tokens at least right now, and it's very, very slow. And the way it does this is, you know, whereas HTTP requests manipulate like APIs

02:03:34.355 --> 02:03:40.355
and curl requests. Curl is actually lower case. Browser automation manipulates JavaScript

02:03:40.515 --> 02:03:41.075
and,

02:03:41.395 --> 02:03:44.195
I don't know, like page clicks, like button clicks.

02:03:44.595 --> 02:03:47.075
Computer automation literally controls your mouse

02:03:47.730 --> 02:03:49.170
and your keyboard.

02:03:50.130 --> 02:04:16.230
And because it controls your mouse and your keyboard, you can do more or less whatever the heck you want. Like, I could literally like, I could take my mouse, and then I could go all the way up here, and then I could close that tab. Can move this all the way at the left. It could close that tab. Like, basically, it it can do anything on the computer that I can do. Now the way you do this right now is you gotta use the Claude desktop app. So I'm gonna head over to Claude, and then I'm gonna open that up. And then I think it's currently available in both co word co work and code, but I'll just move over to the co work tab. And I'll say, have computer use,

02:04:17.270 --> 02:04:19.350
scan through my downloads,

02:04:20.230 --> 02:04:21.430
find the

02:04:23.190 --> 02:04:26.149
image called maker school 26

02:04:26.390 --> 02:04:27.350
or something,

02:04:28.550 --> 02:04:30.230
and then

02:04:29.865 --> 02:04:31.625
rename it to weekly

02:04:31.625 --> 02:04:32.425
community

02:04:33.065 --> 02:04:34.665
call picture.

02:04:36.025 --> 02:04:55.550
And the reason why I'm doing this is because every dang week, I have a weekly community call, and then I always just lose where the images that I use as the thumbnail. And what it's gonna do to start is it's actually gonna whip up like computer use. So it's gonna request access to my finder. And now, as you could see here, it's actually whipped up like a computer use thing. So now it's gonna go through and actually like type in my downloads folder or whatever.

02:04:56.675 --> 02:05:07.955
Navigate over there, and it's just gonna start typing a bunch of different things like maker school and maker school 26 and probably try multiple variations of like maker school, maker school underscore, and so on and so forth.

02:05:08.690 --> 02:05:12.850
Because it's using my mouse and my keyboard, you know, I can actually, like, scroll through and and do things.

02:05:13.330 --> 02:05:20.610
Now this is, local browser automation. It's actually literally exactly what I want, which is nice. I could have done this in, thirty seconds, but it's nice that it's figuring this out.

02:05:21.465 --> 02:05:25.705
It's using, like, a local browser sorry, local automation here

02:05:25.865 --> 02:05:32.825
to, like, click through, scroll down, and stuff like that. If at any point in time I wanna change it, I'll say, no. You had it. It's the

02:05:34.040 --> 02:05:36.359
the cover 26.

02:05:38.200 --> 02:05:54.645
I'll press that in just so that it knows what it's doing. Alright. Just went to grab a coffee and I got back and it has now found the Maker School icon 26, renamed it to exactly what I wanted. And, yeah, I guess I screwed up on the name, that that was what I wanted, which is pretty cool. So hopefully, you guys could see pretty straightforward here to use computer automation.

02:05:54.885 --> 02:06:07.760
Takes a lot longer. Also, consumes a lot more tokens because it is literally like controlling my mouse as it moves across the page, taking screenshots of everything as it does so, and the amount of like fidelity that it requires in order to do that is is pretty high.

02:06:08.080 --> 02:06:12.720
But yeah. I mean, like, eventually, okay, put on a loop. This sort of thing will work.

02:06:13.200 --> 02:06:23.675
It it might just take a tremendous amount of time. Just give it a task. Say, keep going until you solve it, and it will do it. It will just probably burn your a hole through your wallet while while it does so.

02:06:24.315 --> 02:06:37.890
Realistically, the probably core play that I repeatedly fall on as somebody that designs these systems for real businesses that earn hundreds of thousands to millions of dollars a month, is I will start with some form of browser automation for the most part since we're usually just doing this in browser.

02:06:38.610 --> 02:06:51.405
I'll usually try Chrome DevTools MCP first. If that doesn't work because it's like a stealth application or it's something that, you know, requires social media access, I'll do browser use. Once I have that flow down, you know, unless it's like a Facebook or something like that,

02:06:51.725 --> 02:07:29.385
because, uh, those are just notoriously difficult to, like, HTTP automate as well. Um, assuming that it's not, what I'll do is I'll look to have Cloud Code build like custom utility based off of the data that it gets from Chrome DevTools MCP because it'll have access to network requests and actually see the requests that are being sent and received. Once we have all that, then I now have, like, the API internally. I write a bunch of docs and have Cloud Code sort of, like, embed that within my workspace. And then the next time around, I can just use HTTP requests. Although, you know, keep in mind that when you do it this way, simply because of the volume that you're able to hit and the fact that HTTP is, like, typically a lot more regulated than browser automation, you know, there are some there are some risks to that as well. You could get rate limited. You could get throttle. You could also get shadowbanned.

02:07:29.800 --> 02:07:50.955
Okay. So that's the three levels of automating economically valuable knowledge work through Claude. It's really just HTTP request, browser automation, or computer automation. Whatever way you decide, just know that doing that sort of automation is against the terms of services of a lot of platforms that you work with. So I'm not condoning this. I can't really explicitly recommend it. Just making sure that you guys understand sort of what's available and what other people are doing as well.

02:07:52.315 --> 02:07:56.715
Next up, I wanna talk about Claude code performance fluctuations

02:07:56.010 --> 02:07:58.890
and what to do if and when this ends up happening.

02:07:59.610 --> 02:08:02.810
I don't know if you guys have ever watched that movie Interstellar,

02:08:02.810 --> 02:08:05.930
the one with Matthew McConaughey. It's one of my favorite movies ever.

02:08:06.170 --> 02:08:15.124
And in it, there is a major problem that has plagued the world that has, you know, sort of settled the events of the movie in motion. And that's basically this idea

02:08:15.205 --> 02:08:16.645
of the blight.

02:08:17.445 --> 02:08:23.285
Now, what the blight is, is it's some disease that started affecting a bunch of plants.

02:08:23.880 --> 02:08:34.840
And as a result, something like ninety percent of all of the food in the world is now just corn, specific type of corn. That's why they got these big cornfields and stuff, and then, you know, the main character's family just does corn farming all day.

02:08:35.320 --> 02:08:40.834
So in history, this idea is referred to as monoculture

02:08:41.075 --> 02:09:02.690
harvesting, like monoco monoculture farming, essentially. And it's where, you know, one particular crop is just so damn good. It's just so freaking productive. Right? Has the highest yields and so on and so forth. Then over the generations, the farmers learn, well, this is the best crop ever. Why don't I just replace all my crops with this crop? Then I can make a bunch of crops, and then I'll just trade this crop for other crops as necessary.

02:09:04.105 --> 02:09:11.544
Every time that happens, usually, productivity or yields will go up, and they'll go up for sometimes a long period of time, sometimes like literally generations.

02:09:11.945 --> 02:09:22.600
And then all of a sudden what occurs is there ends up being a problem with that crop. The problem is either in the soil, the problem is maybe a bug that is developed that, like, really screws with that crop specifically or something else.

02:09:23.000 --> 02:09:32.885
And because all of the farmer's eggs were in that one basket with that one crop, what ends up happening is this this blight or this disease or this circumstance ends up destroying all of their crops at once.

02:09:33.285 --> 02:09:54.480
That's led to some of the biggest famines throughout history, I believe. And it's one of the reasons why, you know, farmers nowadays do a bunch of things, namely crop rotation. They have multiple different crops that occupy the same thing of land. They, you know, we usually don't do just one crop. They have multiple crops going, whatever types of crops they are. Just so that if a harvest, you know, one type fails, then, you know, they'll at least get something from something else.

02:09:55.120 --> 02:09:58.400
Well, the reason why I'm bringing up this analogy, and I think I've really hammered at home here,

02:09:59.155 --> 02:10:02.195
is because I think this applies to Cloud Code.

02:10:02.595 --> 02:10:26.300
Cloud Code's really good. I don't think there's a better coding harness out there. I don't think there really is anything better than Cloud Code, at least since the time it's recording, and I don't know if there ever will be. This is me just being honest with you guys. I think at a certain point with AI, you know, an agent's ability to program the next model, k, just gets better and better and better. And so the people that have the better agents, if if they apply their resources effectively, just end up with, like, this impossible advantage due to exponential growth.

02:10:27.005 --> 02:10:36.845
So what that logically means is that, you know, it's the best crop ever. Right? It gives you the biggest yields ever. Because it's so productive and because it makes you productive, you're probably just gonna wanna use it all the time.

02:10:37.165 --> 02:10:47.190
The downside to that is there are a lot of things here outside of our control in terms of Cloud Code performance. And sometimes, Cloud Code performance goes up and it goes down, and other times, it's just completely gone.

02:10:47.510 --> 02:11:02.485
So the reality is we're probably all gonna be using Cloud Code a lot because Cloud Code, as mentioned, is freaking awesome. But if you grow to rely on it to the point where Cloud Code is basically a monoculture crop, you end up with situations like this, which actually just happened yesterday.

02:11:02.485 --> 02:11:04.165
Just one of many occurrences.

02:11:04.565 --> 02:11:08.900
To make a long story short, Cloud went down. You know? There was a big issue with Opus 4.6,

02:11:08.980 --> 02:11:14.900
and I think it lasted, like, maybe an hour or so. And basically, 95% of developer productivity

02:11:15.140 --> 02:11:23.905
plummeted the second that Cloud was gone. The reason why is because, you know, Claude was everything. They stored all their files on, you know, the Claude desktop app with simple

02:11:24.545 --> 02:11:29.025
skills that were just made in like Claude's format and nobody or nothing else's.

02:11:29.425 --> 02:11:30.625
The second that Claude,

02:11:31.105 --> 02:11:52.715
you know, was down, then all their prompts that they had saved in specific points and stuff like that were very difficult to access, and they weren't good to use with other models. Whole code bases that have been designed by Claude were not interpretable at all. There was no commentings. They tried using other models and other agents, and, like, that didn't really work. And then ultimately, Claude is just the best. The intelligences of the these other agents just don't work the same. So, you know, just led to, a bunch of bunch of issues, essentially.

02:11:53.195 --> 02:12:17.150
This isn't the first time that this has happened. This has actually happened a number of times. You know, this is Adam from earlier today talking about, like, major outages with Claude and how different types of platforms are operational, whereas other ones aren't. There's also a bunch of Claude code performance degradations. You know, I just looked up an old post from I think it was Derek here, who's one of the lead guys on Claude code. He like drops Claude code updates and stuff all the time.

02:12:17.965 --> 02:12:19.485
Well, anyway, you know,

02:12:19.965 --> 02:12:23.325
there were degradations historically. This is 12/17/2025

02:12:23.325 --> 02:12:30.685
of Opus 4.5 in Claude Code, where basically, because of some runaway, either garbage collection or some sort of, like, memory issue,

02:12:31.440 --> 02:12:37.920
You know, Opus just got worse and worse and worse and worse every day for a certain period of time, which led to, like, you know, massive performance decreases.

02:12:38.560 --> 02:12:41.440
Literally, on planet Earth, at least in knowledge work.

02:12:41.840 --> 02:12:54.505
So okay. Hopefully, at least this point, I've convinced you guys why Claude is nowadays probably already pretty monocultry and likely as it continues to to dominate and likely to just become more and more and more monocultry over time.

02:12:55.225 --> 02:13:02.470
The question obviously is what the hell can we do about it? And so there are a couple of solutions, and most of them revolve around this idea of diversification.

02:13:02.710 --> 02:13:06.150
We're basically, you know, instead of just putting all of your eggs in the

02:13:06.470 --> 02:13:08.870
clawed basket, this is my cute little basket,

02:13:09.190 --> 02:13:11.590
sticking it chock full of, you know, nice clawed eggs.

02:13:12.175 --> 02:13:18.415
What we do is instead of putting on all 10 of our productivity eggs in this clawed basket, we put like seven,

02:13:18.415 --> 02:13:23.295
eight, or maybe nine in them. Okay? So maybe like seven out of 10 in clawed.

02:13:23.935 --> 02:13:30.680
And then what you do with your other three out of 10 is you just distribute them. You distribute them such that, you know, I don't know, one out of the 10

02:13:31.080 --> 02:13:32.680
are in codex.

02:13:33.480 --> 02:13:35.320
You know, another one out of 10

02:13:35.720 --> 02:13:41.355
my god. I'm gonna get really good at drawing these. Are in I don't know, like, anti gravities like Gemini. Right?

02:13:41.995 --> 02:13:42.955
And maybe

02:13:43.355 --> 02:13:44.475
one out of 10

02:13:45.035 --> 02:13:47.755
are in some other type of coding harness,

02:13:48.395 --> 02:14:39.435
like a pie or something that maybe also uses, like, some form of, like, local models or whatever. The point that I'm making is, obviously, we're being pragmatic here. Like, you should probably predominantly use the best model out there because, you know, it's not like a it's not a linear thing. If a model is, like, 1% better than another model, that 1%, once you get smart enough, is like the difference like a gulf. Right? Einstein is like 1% smarter than a a a normal human being or something like that, and he was able to come up with a theory of relativity or something along those lines. Obviously, don't take me at face value there. I'm sure his IQ is through the roof. But the point that I'm making is, like, when you get to this point with these weird galactic intelligence, even like a small little increase in the the the intelligence of the model may lead to, a big downsize difference. Right? So if you have the ability to use the best model, just use the best model. But don't put all your eggs in that basket because if that occurs, then what'll basically happen is, like, as the performance

02:14:39.595 --> 02:14:40.795
of Claude

02:14:41.355 --> 02:14:44.795
over time goes up, assuming Claude is orange.

02:14:46.075 --> 02:14:47.595
Your total productivity

02:14:47.890 --> 02:14:51.730
in blue here will also go up basically in lockstep.

02:14:51.810 --> 02:14:58.210
And so if the performance of Claude goes down, so too is your entire productivity. If the performance of Claude goes up, so too does your entire productivity.

02:14:58.530 --> 02:15:07.915
Instead, diversify. Okay? Instead of just this, like, yellow one, which is Claude, maybe you have like a green one here, which is Codex. And what occurs is, you know, Codex maybe is a little bit more like this.

02:15:08.395 --> 02:15:11.035
And so what ends up happening is the average performances

02:15:11.035 --> 02:15:27.065
of, you know, both of these sort of average out. And then instead of being super reliant on Claude, what you get to do is, know, this black thing, which is like you, ends up being a lot more stable. It's the same thing in investing. Have you guys ever invested in, like, I don't know, an ETF or some sort of

02:15:28.105 --> 02:15:35.065
index fund? You know, basically, the way that all of these stocks work is there'll be a stock that does this. There'll be another stock that does that.

02:15:35.865 --> 02:15:37.624
There'll be another stock that does this.

02:15:37.945 --> 02:15:51.010
There'll be another stock that does this. Do you see how volatile okay. That stock probably doesn't go back. Do you see how volatile all these different stocks are? Well, rather than tie your your literal life savings to all of, you know, any one of these stocks, you just tie them to all of them simultaneously.

02:15:51.090 --> 02:15:55.330
Such that, you know, over time, maybe your things slowly goes up and that's a lot more reliable and dependable.

02:15:56.025 --> 02:16:00.665
Okay. So the way that you do this in practice, the way that you diversify your models in practice

02:16:00.905 --> 02:16:01.865
is

02:16:01.865 --> 02:16:08.105
you use platforms built in that have the ability to orchestrate or juggle multiple different types of agents just inherently.

02:16:08.820 --> 02:16:22.020
Or you use things like MCP servers or whatever that allow you to do that sort of thing within Cloud Code or within, you know, some other, you know, coding agent. And so obviously, like, now, k, if I'm just being pragmatic with you, there's there's Cloud Code and that's sort of like the big boy.

02:16:22.785 --> 02:16:24.545
And he's they're fantastic.

02:16:24.625 --> 02:16:31.025
Then there's, you know, Codex. And some people will swear on their mother's life that Codex is way better than Claude Code, but I I don't really think so.

02:16:31.345 --> 02:16:32.625
And then there's like, you know,

02:16:33.186 --> 02:16:36.306
Gemini isn't really the the right term. It's sort of like antigravities,

02:16:36.465 --> 02:16:37.825
like agent chat

02:16:38.500 --> 02:16:39.301
within

02:16:41.301 --> 02:16:42.261
anti gravity.

02:16:42.580 --> 02:16:51.780
K. And this is sort of like my little personal tier list. But basically, you know, use other models in conjunction with harnesses and stuff like that that you might have set up in ClogCode for for best results.

02:16:52.396 --> 02:17:43.480
Okay. So yeah. Anyway, there are two main major ways of doing this right now. The first is using a platform like Conductor. If you've never seen a platform like Conductor, what this does right now is it allows you to create a bunch of parallel codex and Cloud Code agents inside of isolated workspaces on your computer. You can then, just like with anti gravity or, you know, Claude code desktop app order, you can just see how their performances and what they're doing sort of in real time. And because you are just the conductor up at the top, if, know, the Claude code chunk of these don't end up working, but then the codex ones do, and that's perfectly fine. It doesn't really change anything for you. You're just gonna like momentarily allocate most of your time and energy to the codex ones. It's on the exact same interface. It's very straightforward. You just do it all, you know, through this sort of like conductor interface. Super easy. And then, uh, you know, like, this is used by a lot of real big people all over the place to basically average out minor statistical fluctuations and models, and then allow

02:17:43.960 --> 02:17:44.841
for the

02:17:45.160 --> 02:18:20.405
taking advantage of different parts of different models that are slightly better, slightly worse than each other's things. Like, for instance, a lot of people think that Codex is actually, like, quite cracked at, you know, the sort of, like, deep contemplation required to make big back ends, and it's better than Cloud Code. I don't know if I entirely agree with that. And I think even if that were correct today, it probably would not be correct in, a few weeks because things change so quickly. But, you know, this allows them to take advantage of Codec's ability to build the most cracked back end ever and then have Cloud Code do some other thing that Cloud Code is great at. Okay. So Conductor's pretty sweet. I'm not gonna worry too much about like setting it all up. It's actually quite self explanatory, and I don't wanna just make like a seven hundred hour YouTube video that's me,

02:18:20.725 --> 02:18:36.080
you know, setting up a bunch of different platforms. There there's no real value to this. These guys set out the knowledge the documentation really, really plainly and really intelligently done here. You can just click that download button, set it up, and and you'll be good to go. Okay. So that's number one. Right? Number two is you can use something like

02:18:36.561 --> 02:18:37.440
MCP

02:18:37.440 --> 02:18:41.761
servers to distribute your load across multiple different models.

02:18:42.000 --> 02:18:50.715
So for instance, there's this Codex MCP server, which, know, technically lives in Cloud Code. So if Cloud Code does go down or something like that, you won't necessarily be able to use it. Keep that in mind.

02:18:51.274 --> 02:18:58.875
But, know, if it's just one of the Cloud models or whatever, it's a little bit different. Basically, you do is you download an MCP server that allows you to communicate back and forth with a Codex.

02:18:59.440 --> 02:19:10.640
And so that one's very straightforward and easy. There's there's a git repository right over here. It's very straightforward. All you do is you literally just like install the codex CLI, k, using n pmi-g@openAIslash

02:19:10.640 --> 02:19:29.300
codex. You just give it your open AI API key. Then you just add it to Cloud Code, then you can actually just, like, have a conversation with them. So for simplicity's sake, I'm actually just gonna do that because that's a lot faster. I'm just gonna go back to my anti gravity instance, which is just right over here. You can see I got a search back a little while ago from something that I was working on. I'm just gonna open this up and I'll say install this.

02:19:30.979 --> 02:19:33.859
I'll say keys in dot ENV.

02:19:33.859 --> 02:19:35.939
Don't share. This is a demo.

02:19:36.420 --> 02:19:38.500
Let me know when done so I can restart.

02:19:39.256 --> 02:19:46.136
And what it'll go through is it'll go and install the Codex MCP server. And then I can just go here and I could say, hey, ask Codex

02:19:46.136 --> 02:19:47.336
how it's

02:19:47.575 --> 02:19:48.216
going.

02:19:48.775 --> 02:19:57.300
So now what it's going to do is rather than just, you know, kind of operate in its own thread, it literally just run through like a a thing, pinging Codex and saying, hey, man, what's going on?

02:19:57.699 --> 02:19:59.940
It echoed back the message successfully.

02:20:00.340 --> 02:20:02.900
Okay. I want to chat with Codex.

02:20:02.979 --> 02:20:03.540
Yes.

02:20:03.859 --> 02:20:07.060
And let's just hear what it has to do what it has to say rather.

02:20:07.505 --> 02:20:14.465
So codex dash CLI codex. This is just a ping, I guess, to make sure that it's online. This one is now saying, hey, I'm running on codex

02:20:14.704 --> 02:20:36.326
on g b d five in your local coding workspace. I can do all this stuff. The file system's currently restricted and so on and so forth. So, I mean, this will work in the cases where you want Claude to, like, orchestrate a conversation with Codex so that actually have me go into Codex. And that can that can be quite good when, you know, you don't really wanna, like, upset your local workflow. You still wanna work within Claude code and do everything that you're normally doing. But then for whatever reason, Cloud Code performance has been degradated.

02:20:36.405 --> 02:20:37.365
Degradated.

02:20:37.686 --> 02:20:38.565
Degradated?

02:20:38.565 --> 02:20:49.445
Degradated. But I should note that, you know, if Cloud Code itself goes down, let's say there is some widespread anthropic outage, you know, your your next best bet is to literally go and download probably like the Codex

02:20:50.140 --> 02:20:51.101
desktop

02:20:51.260 --> 02:20:52.141
app here.

02:20:52.460 --> 02:21:07.726
Download it for Mac OS and either get a subscription or at least know how to get a subscription, know how to use the app such that if there are major issues with any one of these platforms, you know, at any point in time, you can just jump right back. So it's personally what I do. I actually have Codex up and running. I know how to use Codex. I'm very familiar with Codex.

02:21:08.045 --> 02:21:10.765
You know, the way that I set up my workflow

02:21:10.765 --> 02:21:28.450
is not only do I have, like, a dot cloud with the skills and and, you know, so on and so forth, but at any point in time, I just I can just duplicate this whole workspace such that it's like generally accessible by any agent. I can actually go over here and then say, hey, for whatever reason, Claude code is down, so I'd like you to duplicate this whole business workspace,

02:21:28.529 --> 02:21:34.296
change anything that is Claude specific, like the dot claude, the claude.mdetc

02:21:34.296 --> 02:21:36.056
to, um, the

02:21:36.296 --> 02:21:40.376
usual agent specification. You can find all that at agents.md.

02:21:40.775 --> 02:22:01.075
Um, and in general, just make sure all of this stuff works for codecs. Now what you can do is you can either run some sort of, like, synchronization flow, or you could just, like, manually do this every now and then. And then you can send that off to codex, however necessary. Cool. Now it's actually going through this process of syncing the workspace to the exact same type of folder slash business dash codex, then it's just changing my agents dot m d and stuff.

02:22:01.555 --> 02:22:17.181
What you could also do is inside of the same workspace, you could just, like, duplicate this, make this like dot agents or whatever. You could have this just all go cap agents. You just probably need some line in your Cloud and m d that says, hey. When you update your Cloud and m also update your agents.md, whatever the whole purpose of this workspace is to work with anything.

02:22:17.580 --> 02:22:29.740
In my case, you know, I this is just very Cloud specific. I'm making courses on Cloud, so I can't really just mess this up and I don't want the workspace to get any any messier than it already is. But hopefully, you guys see how easy it would be realistically to do some form of diversification.

02:22:30.455 --> 02:22:40.215
Okay. So just to make it super clear, there were three main forms that I was recommending here. Right? The first form was I recommend downloading and then installing a tool like Conductor.

02:22:40.535 --> 02:23:42.470
What Conductor does is allows you to run a team of different coding agents right out of the bat using like the native CLI for Codex and Cloud Code. And so you're actually having multiple agents just like operating in parallel. They're just doing so sort of in one workspace that is not like branded or tied to any individual type of model provider. The second one is using something like the Codex MCP server, which is great to use when like Claude code is up, but individual Claude models are degraded or there's some issue that are that is preventing it from operating the way that you want it to. In that way, you could still take advantage of whatever cloud model you do have access to. And also, like, your own cloud interface, let's say, in cloud codes desktop app or maybe like an anti gravity, um, cloud code extension setup like I have. And then the third is just operating in an entirely different agent platform entirely. Um, my recommendation at least as of right now is to use Codex because, uh, every test that I've ran with Gemini is nowhere near as good, um, at anything except for front end design. Perhaps their new model will come out and that'll be way better or something like that, but I'm not gonna hold my breath for that at the moment because as mentioned, I think Claude is really just the dominant.

02:23:42.790 --> 02:24:11.480
The the dominant playboy as of right now. K. And all of this is because we do not want the monoculture crop. We do not want all of our eggs in one basket. We can have most of our eggs in the Claude basket for sure. But if you put all of them in, then you're going to suffer the exact same situation this present guy did where, you know, the second that Claude went down, he just couldn't do anything. Okay? So hopefully, that makes sense. I personally am about 70% cloud code and maybe 30% spread across codex and then like a couple of open source models. And then I use agnostic,

02:24:11.800 --> 02:24:16.840
you know, coding harnesses like pie in conjunction with things like conductor in order to make sure that I'm good to go.

02:24:18.405 --> 02:24:41.470
Alright. Now let's chat workspace organization. I'm gonna show you guys the way that I personally organize my workspace. It's discussed a couple of alternative ways. And then also just talk about like the hierarchy of information and then how to maintain like a really root clean file space. So this is the structure that I basically have set up. And I'm gonna run and go through my actual anti gravity setup in a second. I actually just had AI generate me a bunch of diagrams for this, so that's pretty meta.

02:24:41.710 --> 02:24:46.830
But to make a long story short, I store all of my business stuff in a business workspace.

02:24:47.465 --> 02:24:53.705
K. Now, my business workspace includes a bunch of additional folders that you don't really need in order to have my structure. They're very

02:24:54.104 --> 02:24:56.825
specific to the platforms that I use and and whatnot.

02:24:57.064 --> 02:25:09.280
Really, the folders that you need, if I just cross out all the stuff that you probably don't actually need. K? And like, you probably don't need this either. Some people have virtual environments, some don't. But really, the stuff that you actually do need

02:25:09.680 --> 02:25:19.215
is going to be like a dot clod, which is where you're gonna store all of your, you know, clod specific files. So it's where you're store your skills. It's where you're gonna store your agents and ETC.

02:25:19.695 --> 02:25:25.215
An active or a temporary folder or whatever the heck you wanna call it. But this is basically just gonna score everything

02:25:25.215 --> 02:25:28.815
else. So all the generated files and so on and so forth.

02:25:29.710 --> 02:25:42.750
A dot ENV where you're gonna put your, obviously, ENV type keys. So any sort of like API keys, credentials, anything like that. And then finally, your local cloud dot m d, which is just like your local system prompt.

02:25:43.475 --> 02:25:47.155
And if you guys remember, we store the global system prompts in a

02:25:47.795 --> 02:25:50.436
kinda like a tilde slash dot

02:25:50.595 --> 02:25:51.875
claud folder

02:25:52.035 --> 02:26:14.596
where, you know, the rest of your your global stuff is. And this is like this is somewhere else. This is usually like your home folder, wherever that is. On a Mac, you know, in my case, it's like Nick's or I have. So if I go on my Nick's or I have folder and then I show hidden, I can actually see the dot cloud folder. I can click on it, and I can see it under your workspace. It was like a Windows or whatever. It's it's gonna be different. So you're gonna have to look for it. Okay. So mine obviously looks a little bit different from that, but I just want you to keep in mind those,

02:26:14.915 --> 02:26:22.755
you know, the dot claud, the active, the dot ENV, and then the cloud NMD. That sort of structure that I showed you a moment ago. That's the one that I'm gonna be assuming that you you're gonna be building.

02:26:23.410 --> 02:26:29.330
Okay. So I separate things into and I also have a personal version of this, but for now, we're just gonna stick with business.

02:26:29.570 --> 02:26:30.610
A business

02:26:30.850 --> 02:26:31.730
workspace.

02:26:31.890 --> 02:26:35.730
And so I literally have like a folder on my computer, you know, Nick's arrive,

02:26:36.690 --> 02:26:38.690
and then it goes slash business.

02:26:39.586 --> 02:26:44.146
And it's within this business folder that I currently exist that I do all of my work.

02:26:44.546 --> 02:26:47.186
So what do you have inside of business? You have your dot e n v.

02:26:47.506 --> 02:26:59.711
You have your Claude skills, which is sort of like the intellectual capital that you accumulate over time as you do various SOP able things. You have your claud dot m d. Then you also have, you know, like your active folder.

02:27:00.030 --> 02:27:03.070
And the way that I personally organize this as somebody that not only

02:27:03.391 --> 02:27:12.936
uses claud code and other agents in my day to day life, but also sells clients on the implementation of these sorts of things. And then is also responsible for using Cloud Code in order to fulfill the implementation,

02:27:13.415 --> 02:27:16.296
is I separate it such that my main business

02:27:16.455 --> 02:27:19.976
needs that contain all of, like, like, my stuff

02:27:20.615 --> 02:27:43.665
is in this business folder. And then anything that I do on behalf of my clients lives in specific client folders. So let's say I a client called client a. Well, client a actually has his own dot e n v with the client's API keys. They have a dot cloud slash skills with the project skills, skills that are highly specific to the needs of that particular project. You know, if I work with, like, some sort of digital marketing agency and I have a skill that I

02:27:43.744 --> 02:27:50.944
use on their behalf in order to, like, connect to some service that they use to print out a report. Like, I would put that skill inside of the client folder.

02:27:51.440 --> 02:27:53.521
Then I also have a claude.md

02:27:53.521 --> 02:28:12.375
on that essentially, you know, I just run with a slash in it, and that also just describes a little bit about the client. In the same way that I showed you guys earlier, I have my own claude.md that describes a bunch of stuff about me. So, oh, who am I? Nick Soraya, if you know, I'm 30 years old. I'm an n dash j. I currently live in X Y Z area, here are all my businesses, here are much money I make, here's all this like highly relevant contextual information.

02:28:12.854 --> 02:28:41.626
I also have similar contextual information for my clients and then for their businesses, as well as anybody on their team. So that, you know, if I say, hey, send a message over to Jane, let her know x y z. It's literally just like one message and then and then it's sent. K. So I I duplicate that across all my client base. So client a, client b, client c, however many clients you have, that's how many project folders I have. And the key here, and the reason why I think this is like this most solid organizational scheme I've stumbled on after several years of working with this stuff, is you can actually call client skills while still being in the business, um, workspace.

02:28:41.945 --> 02:28:53.561
You know, it's not the exact same because you're not technically loading them inside of the, um, if I just go slash context here. You're not technically loading them inside of the actual context. K? You only get the ones that are like sort of local here.

02:28:54.040 --> 02:29:00.360
But, uh, you can still call skills that are not local simply by putting in your CloudNMD a one line thing that says, hey,

02:29:00.920 --> 02:29:12.126
um, there's some skills that we reference that aren't all going to live inside the dot cloud slash skills folder. These are client specific skills. If you wanna reference those, then you actually have to go inside of the client folder that I'm referencing and then, you know, pull it out that way.

02:29:12.605 --> 02:29:14.766
And so in my case, you know, the business

02:29:15.405 --> 02:29:25.030
workspace is sort of like top level and the client workspace is sort of underneath. So what's up with this don't pollute root? Always store an active or subdirectories. You know, earlier I said I have an active folder.

02:29:25.271 --> 02:29:29.671
The reason why is because if you start polluting your root, it just ends up being like a total

02:29:30.391 --> 02:29:49.255
nuclear bomb waiting to happen. You just have so many files. Your files are stored all across one giant folder. Not only is it like visually insane to look at because it's like, this is always open essentially, and it just pushes all the way down to the bottom. But it's also a little disorganized for your agent as well. Better instead to store specific locations that you dump files to,

02:29:49.851 --> 02:29:53.051
Okay. Using the skill spec itself.

02:29:53.370 --> 02:29:56.730
So for instance, inside of model chat, if I go over to my skill,

02:29:56.971 --> 02:30:12.935
you'll see that it actually specifies where to put the actual model chat. It literally says dump it inside of active slash model dash chat and then name it in this particular way. So in that way, this model dash chat skill is like she hooked up over here to this model dash chat,

02:30:13.095 --> 02:30:17.734
you know, conversation thread. I can open that up and I can actually, like, see the conversations that we have been having.

02:30:18.631 --> 02:30:43.295
It's also much more organized for the skill because I'm not just dumping everything in the same place. It's super easy to do, and then I don't actually have to do any sort of, like, agentic search or agentic lookup, which I think is pretty valuable because agentic lookups are just more things that consume tokens. So what I'm trying to say is I just store everything inside of, a folder I can toggle called slash active, and then I store any specific information as to where these things will go inside of the actual skill themselves.

02:30:43.375 --> 02:31:09.796
So, you know, there's a bunch of leads of my own CRM. That's where they live. There's like some config config files for other things. This is where they live. If I do research, this is where they live and so on and so forth. I would never store random scripts directly in root. Neither would I do temp files or data files. If you want, like, temp files files that you know are only going to be used for, like, a short period of time or in the course of a a a process being executed, Personally, I actually store these as like active slash dot TMP instead of some hidden TMP folders. So they don't even mess up my active.

02:31:10.355 --> 02:31:33.850
And you're probably thinking like, well, won't I lose stuff if everything's super nested? No. You you won't lose anything nowadays. You're trading off the amount of time it would take you to like scroll through your root thing for the amount of time it would just take you to pump it into your agent to ask it, hey, can you find x y z? But you'll find that if you just like allow the agent to organize your workspace, it it tends to do so in a pretty consistent and then reliable way. So long as you expressly give them a structure where you're like, hey, make sure to always put stuff in active.

02:31:34.725 --> 02:31:53.641
And remember earlier, talked about diversifying away from just Cloud Code. Well, what's really cool is, you know, when you'd run a business workspace like this and then you have your client and and and so on and so forth workspace sort of underneath it. What you can really easily do is just duplicate your Cloud NMD into an agents and then a Gemini. MD. You can just have all of these in all of your workspaces simultaneously.

02:31:53.960 --> 02:32:03.160
Such that, if at any point in time you wanna use, I don't know, cursor for something, you wanna open it in anti gravity, you wanna do it directly in Cloud Code, like, you never really run out of the system prompt

02:32:03.561 --> 02:32:10.565
design pattern. Like, you know, if you have the same thing written in CloudNMD, same thing in AgentsMD, the same thing in GeminiMD. You basically just, have that on twenty four seven.

02:32:10.966 --> 02:32:36.695
Now I haven't needed to do that personally in quite a while, and I've actually been very lucky to have not been affected by some of the recent outages. But I remember back, I don't know, like a month and a half ago or whatever. I actually had like a specific line that said, hey, I want you to synchronize the CloudNMD with the HSNMD and the GeminiDynami all the time just in case, you know, we have an outage, need to drop this into a different coding platform. Now another thing that'll happen reasonably often is, you know, because we're not dumping stuff into our root, we're gonna end up dumping a lot of stuff into

02:32:36.855 --> 02:32:46.455
active. Right? And so I have like just a bunch of stuff here, dub video links, CA dentist, auto research, Hindi source, you know, when I was dubbing my stuff. Bunch of different screenshots

02:32:46.455 --> 02:32:47.415
and stuff like that.

02:32:48.055 --> 02:33:13.085
You wanna periodically clean up this workspace. So you periodically wanna say something along the lines of, hey, clean up my active slash folder. Anything inside of subfolders are fine, but anything that's just loosely in the in the folder, like any TXT files, PY files, JPEGs, and related. I want you to clean up by either deciding if it's necessary. It's just a temp file, just get rid of it. Otherwise, store it in a folder that makes sense. You're gonna wanna run something like this reasonably often.

02:33:13.244 --> 02:33:50.450
The reason why is because you just don't wanna have to, you know, scroll again through like a quadrillion different things. And you also wanna make sure that any future model that comes around just like very logically look at some sort of organizational hierarchy and then make decisions based off of that. So that's what's going on here with all these docs for iClosed. Right? It's deciding what to do here. It's gonna download them into different folders. It's actually going to get rid of a couple files here like, hey, this is a file. This is an incomplete download. This is a bunch of unnamed temp snapshots. Right? And and what you'll find is within, like, two seconds, it just does the whole thing. So now my active folder is much, much cleaner, and I don't have to worry about this sort of thing ever again, which is nice. And, you know, in my case, I also have a couple of these web design projects.

02:33:50.689 --> 02:33:53.090
Enumerate all the web design projects inactive.

02:33:53.170 --> 02:33:55.330
These are things like Volta,

02:33:55.890 --> 02:34:07.915
Aura, and so on and so forth. Find similar projects and then store all of them within a web dash design folder. And despite the fact that, you know, you might be like thinking, Nick, why the hell are spending time and energy doing this? If your workspace is clean,

02:34:08.155 --> 02:34:33.445
the work that you do within that workspace tends to be a lot cleaner as well. And so, I mean, in my case, I just found what? Like, one, two, three, four, five, six, seven, eight, nine, ten, eleven or something like that, different things. I've just sorted all these out now. Anything here that is more personal than business, let me know and I'll upload it into the personal workspace instead. I just let that go, but I don't obviously wanna show you because there are some personal things in there. And that takes me to the next point of workspace organization,

02:34:33.525 --> 02:34:57.980
which is everything that I just talked to you about, um, when it comes to, like, organizing with a business at the top level and then having various client folders in, you can do the exact same thing with personal. And so I don't actually just have a business sort of workspace setup. Claude has now gone beyond just my business partner. K? And it also assists me with a lot of personal stuff. And when I say personal stuff, I'm not referring to, like, I don't know, relationship troubles or whatever. I'm talking about, like, for the most part, my

02:34:59.255 --> 02:35:02.215
you know, things like my my my citizenship paperwork,

02:35:02.534 --> 02:35:05.175
you know, important documentation relating to my identity,

02:35:05.335 --> 02:35:09.095
personal projects that I have that are, I don't know, related to, like, learning piano,

02:35:09.734 --> 02:35:19.160
that sort of thing. And so, like, I have, like, a business one over here. K? But just because I want this to be really, really clean, I'm also gonna show you guys a personal

02:35:19.479 --> 02:35:20.680
version of this.

02:35:21.640 --> 02:35:26.920
K? Which is basically the exact same thing. And then instead of doing this via clients,

02:35:27.335 --> 02:35:35.335
which, know, I mean, like, obviously, if it's it's a personal project, it's not a client project anymore, and then you can't really do it that way. But instead of doing things based off of clients,

02:35:35.575 --> 02:35:39.095
I now recommend doing things based off of like domain and or,

02:35:39.415 --> 02:36:05.296
you know, like a particular field of your life. So I haven't found the best way to organize this yet. But for instance, I have one right now on citizenship because I'm currently proving my my citizenship to, you know, a particular country in Europe. And as a result, I'll be able to be a a an EU citizen. It's gonna be pretty fun. Likewise, I have a sub one called health. This contains a couple of skills that I use to, like, visualize my genetic libraries and stuff like that. And hopefully, you guys are seeing the point. What you do is you just sort of

02:36:05.695 --> 02:36:26.631
you enumerate the clients of your personal life, which tend to be projects like citizenship, you know, your health, uh, I don't know, your skincare and whatnot. And then you contact or or or list those underneath your personal workspace. Then you also have skills related to your personal workspace like, hey, you know, can you clear out all of my, I don't know, like personal emails for x y and z. In this way, you have a good separation,

02:36:26.790 --> 02:36:28.630
at least in my mind, between business

02:36:28.945 --> 02:36:44.305
life, your personal life, and then also just logical grouping of each of the different things that you can do within them. So I also have as mentioned, you know, that personal folder. I can open that personal folder only anytime I want. It was just right back up here. And that'll just contain, you know, specific personal conversations I've had with,

02:36:45.020 --> 02:37:16.350
you know, Claude and anti gravity to do things. And I'm happy to, like, pay token costs, stuff like that to absorb that because my personal life isn't, like, personal personal. It's just stuff that is not business. Right? If I can improve the productivity, that might as well. One more thing you'll notice is that when I open up this personal, the colors were a little bit different. I do that on purpose. I do that because, you know, if I am working on business stuff, I want it to be very clearly, like, accessible and visible to, like, my my my monkey brain. Like, I instantly wanna know I'm in my business folder. Whereas when I'm in my personal folder, that's different. And so what I've done is I've I've made the outline of this green.

02:37:16.430 --> 02:37:21.310
I do that by creating this dot Versus code settings folder, and then I just have sort of like this config

02:37:21.630 --> 02:38:00.955
that Versus code reads at the beginning of every run to like actually change the header bar. This isn't like a super big unlock or anything, but I do find just like having a a slightly different color. Well, I always just make my own be like, hey, this is my personal folder, so I have access to like personal information here, I can actually have a conversation about whatever. I don't need to re prompt it with a bunch of stuff. You'll also notice that, you know, this doesn't have like the Netlify or a bunch of those other sections because this personal folder only stores stuff that is like for me. It's not for Netlify. Okay. So hopefully that gave you some insight into at least how I organize my workspace, but this isn't by no means the only way to do so. There are a bunch of other ways to do it as well. One candidate way is instead of having, like, a business workspace, what you do is you just enumerate all the projects in your business.

02:38:01.275 --> 02:38:11.740
So I don't know. You might have a a project, for instance, that's like website overhaul. What you do is you have, like, a top level folder. K? Your top level folder might be business or it might be whatever the name of your company, Left Click Incorporated.

02:38:11.819 --> 02:38:16.940
Then inside, you have a projects folder. And underneath your projects folder, you have, like, website design. You have

02:38:17.340 --> 02:38:39.280
conversion rate optimization. You have lead generation and so on and so forth. If you're running a business, you can actually now have your CRM entirely within Cloud Code as like a dot JSON file. And then, uh, periodically in a daily basis, you can synchronize using some sort of cron job or something like that too. I don't know. Some events that are pulled in from a calendar, you could store stuff that way. I've seen people host everything on GitHub as well, do some sort of like daily,

02:38:39.520 --> 02:38:40.640
uh, download

02:38:40.640 --> 02:39:15.800
or clone of GitHub, and then some sort of like nightly push so that they always have all their information stored on the cloud. You can do that in conjunction with the previous system I told you about, or the business slash personal slash client one that I talked about initially. You can also just ask Claude to set it up according to however you like. If you guys don't like the way that I set up my workspace for whatever reason, despite the fact that I do think it was probably like top 10, you know, by all means, you can just ask Claude, hey. I wanna have information for this. I wanna have information for this. Can you build me like a strong naming scheme or or system that'll enable me to do that better? Okay. Hopefully, you guys like this and it made a lot of sense to you. If guys have any questions on that, let me know. But let's move on to the next module.

02:39:16.920 --> 02:39:21.320
Now on to a topic that I think a lot of people don't like, security.

02:39:21.825 --> 02:39:27.905
And bear with me, usually, most of the time, when people talk about security, it's sort of divided into two camps.

02:39:28.146 --> 02:39:47.891
On the left hand side, you have like the accelerationists that are like, cloud code for everything, baby. I just gave it my DNA and USB stick with all of my personal private information and passwords. Let's do this thing. Then on the other side, you have like grubby old folk that used to, you know, program computers by punch cards. And so obviously, there's some irreconcilable

02:39:47.891 --> 02:39:58.285
difference there. They're like, what the heck? Why would you even, I don't know, like make something web accessible, man? You should do everything on bare metal. And then other folk are like, well, you should just have Claude code do everything.

02:39:58.605 --> 02:40:03.726
Now, the reality like most things is nuanced and in my opinion, the best case is somewhere in between.

02:40:04.410 --> 02:40:08.570
So this module and the next are gonna be a lot of talking and a little bit of demoing.

02:40:08.890 --> 02:40:35.765
But it's important for you guys to understand as Cloud Code ends up becoming more of the predominant generator of productivity in your life. But there are a few small security differences or impacts that you can have on Cloud Code that solve like 90 ish percent of all of the possible downsides and there's basically no reason not to do them. Okay. So I have this Google Doc over here that I'm just gonna walk you guys through. And really, the first point I wanna make is that everything on planet Earth is hackable. It's always just a question of how hackable.

02:40:36.380 --> 02:40:38.461
You know, your front door is hackable.

02:40:38.940 --> 02:40:45.021
Technically speaking, the the Department of Defense is hackable. Everything is hackable. It's just what is the risk and reward

02:40:45.261 --> 02:40:50.061
involved in securing it to the point where you, you know, dispel 90 ish percent of attackers.

02:40:50.301 --> 02:40:54.075
So the way I see things, you should eighty twenty security,

02:40:54.315 --> 02:41:02.155
avoid most of the low hanging fruit, and then just accept that there's always gonna be some small percentage of people that are gonna hack you anyway or try to hack you anyway.

02:41:02.475 --> 02:41:17.979
And, you know, depending on how big your vibe coded app or agentically engineered flow ends up getting, obviously, your attack surface is going to increase one to one with that. You know, just for a reference, like, when I was first starting on YouTube, I had like one login attempt per month and it was always me.

02:41:18.596 --> 02:41:43.579
Well, now I get like probably 30 to 40 login attempts per day. It's just a bunch of people that are constantly trying to hack my ass. You know, back in the day, had nothing sort of to lose, wasn't a very big deal. Now, it's obviously a lot a lot bigger. And you find this as you kind of go up the chain. You know, if you become a public figure or whatever, obviously, you're more likely to get that. Can't imagine what Chris Hemsworth fricking open claw probably looks like, but that's aside from the point. Just know that everything is sort of relative. And in in your shoes, you should just cover the $80.20.

02:41:44.255 --> 02:42:21.785
Okay. So we're just gonna get to a point where our app or setup is less hackable than the amount of time and effort it would require to actually go through it. Anybody could theoretically break into your house right now. Most people don't because there's just a little bit more effort required to break into your house versus, you know, if you just unlocked your front door and somebody could walk right in. So what we're gonna do is we're gonna put the equivalent of a fence and a camera up, eliminate most of these and then we should be good to go. Okay? So let's just cover some low hanging fruit right off the bat. At the end, I'm actually gonna give you guys a simple security audit that you guys could use to copy and paste through any sort of app or system or or website or or web property that you have to basically minimize the probability of this occurring.

02:42:22.345 --> 02:42:48.625
The first thing to know, which I think most people don't, is that you actually leak API keys every time you chat through plain text with Cloud. Now, maybe they'll fix this at a future version, but right now, it's not. All Cloud Code conversations are actually stored in this folder right here in your computer. Tilde just stands for home folder slash and then dot is a hidden convention in both Mac, Windows, and Linux. Where if you have a dot in front of something, you know, you just can't see unless you specifically enable like the hidden folder view.

02:42:48.865 --> 02:43:13.165
So what that means is you probably have a a long running log of API tokens that are hard coded there outside of, you know, dot ENV or whatever. And just to show you, I'm gonna head over to my anti gravity instance. This one is the same auto research repo that we were doing other stuff on. And I'm just gonna say, hey, I want you to remember the word. Well, let's not even do that. I'm just gonna say, hey, what are your opinions on quetzacoedals?

02:43:13.165 --> 02:43:16.444
I don't know. There's some sort of animal I think called a quetzacoedal.

02:43:18.700 --> 02:43:23.501
That's outside my wheelhouse. I'm a coding assistant, so I don't really have opinions on Mesoamerican feathered serpents.

02:43:23.580 --> 02:43:24.541
Interesting.

02:43:24.780 --> 02:43:27.900
So hopefully, I didn't absolutely butcher this. Is it quetzalcoatl?

02:43:28.540 --> 02:43:37.216
Yeah. Okay. It's this right over here. Okay. So I'm just gonna insert this into a chat history. And the reason why is because I want to open this up and then I want to say,

02:43:37.615 --> 02:43:38.815
search through

02:43:40.096 --> 02:43:43.296
dot claud in the folder for any conversation

02:43:43.375 --> 02:43:44.256
mentioning

02:43:44.415 --> 02:43:45.375
Quetzalcoatlus.

02:43:46.110 --> 02:43:59.710
And what you'll see is there's actually a long running log of all conversations basically right here in this folder. In my case, it's slash user slash Nyxtrale. That's my that's my home folder. And now, it's going to actually pull up the conversation files and give it to me word for word.

02:44:01.005 --> 02:44:03.885
Give them to me line by line, whole convos.

02:44:04.845 --> 02:44:16.560
And so essentially, you know, if we actually dive into the output there, the way that this information is stored is they're stored in JSONL files, which are like JSON files that are line by line by line.

02:44:16.960 --> 02:44:19.280
And you can actually see how they're returned

02:44:19.760 --> 02:44:21.040
just by

02:44:21.359 --> 02:44:28.305
doing a search here. I mean, I can obviously open it up, but you know, I probably have API tokens and stuff like that in there. Don't really wanna You can see that they're organized into,

02:44:28.465 --> 02:44:36.225
like, a big JSON sort of structure. Right? And so you can actually see if it pulls it out, you now have the transcript which says user,

02:44:37.024 --> 02:44:37.745
title,

02:44:37.984 --> 02:44:38.944
assistant,

02:44:38.944 --> 02:45:10.625
user, assistant. This is the exact same chat that we just had back here. And so I'm sure you can imagine, like, you're gonna have a bunch of API keys that you paste in plain text also available here. And I mean, like, that's not the end of the world. Obviously, we need to store our API keys somewhere. But a very low hanging fruit in security is just minimizing the number of places that you have the same sensitive information spread out. Like, if you have the same sensitive information, aka an API key to, like, your Anthropic account or whatever, stored in five different places. The probability somebody stumbles across this at some point, if they're hacking you or if it's just some sort of routine data check or whatever,

02:45:11.431 --> 02:45:14.950
is is like not just five times higher. It's something like 500 times higher.

02:45:15.431 --> 02:45:21.591
And I think a lot of attackers now are realizing the attack surface and a good place to, like, look for this sort of thing in in the conversation history.

02:45:21.990 --> 02:45:28.896
So, you know, you can avoid having some API key stored around, but a really simple and easy way to avoid this is basically instead of inserting,

02:45:29.695 --> 02:45:35.936
you know, I'm just gonna make like a fake dot ENV here. And then instead, I think I'm going to make a new conversation.

02:45:36.495 --> 02:45:39.295
And instead of me just saying like, hey, axolotl.

02:45:39.295 --> 02:45:42.095
K. What I'm gonna do instead is I'm going to store this

02:45:42.739 --> 02:45:46.820
animal underscore name and then we'll do axolotl

02:45:47.140 --> 02:45:49.220
right over here. Let's say,

02:45:49.540 --> 02:45:55.620
hey, I just inserted an animal name in a dot e n v for a future task,

02:45:56.100 --> 02:45:56.420
you know,

02:45:57.715 --> 02:46:00.436
very important, we do not leak this name.

02:46:00.915 --> 02:46:01.556
K?

02:46:02.195 --> 02:46:08.195
Now, what it's gonna do is it's just gonna like clarify with me. It can use this in some sort of function or whatever the heck it wants.

02:46:08.436 --> 02:46:09.955
And then if I go through,

02:46:10.890 --> 02:46:17.850
see how it says never read or display the contents of an ENV file, never commit ENV files to Git. That's another pretty low hanging fruit. If you have

02:46:18.090 --> 02:46:32.785
API keys stored in places that are not your dot ENV, a lot of people will mistakenly push that to GitHub and like, you know, if you're pushing it to GitHub, Now, it's on now, it's on the Internet as well. Right? Which is even worse. But you know, now if I go over here and I say, hey, can you find me conversations

02:46:32.785 --> 02:46:33.345
about

02:46:34.865 --> 02:46:38.225
axolotl in my and then I'm just gonna go dot claud.

02:46:39.149 --> 02:46:49.949
It's gonna search all damn day long looking for this thing and it's not gonna be able to find it because we haven't actually like specifically said axolotl. And in fact, what's pretty interesting is the only conversation it found was where I specifically asked, hey, can you find me an axolotl?

02:46:50.885 --> 02:47:01.125
So it's gonna look and see whether or not I can find it in other directories. It's not gonna be able to, but hopefully, you guys get my point. Okay? Minimizing the attack surface in a really simple way. Just have all of your API keys in a dot ENV.

02:47:01.604 --> 02:47:12.570
So that's number one. Number two, low hanging fruit is that AI models often hallucinate package names. In case you guys didn't know, package names are just like dependencies that you have to pull in order for,

02:47:12.891 --> 02:47:28.005
you know, the usage of any project nowadays, you know, like libraries and stuff like that. And so, you know, there's like NPM, which is typically like the big package manager here. I'm And just going to make this a little bit more visible for you guys. That says for node package manager. But basically, like, if you just type NPM install.

02:47:28.085 --> 02:47:28.645
Okay.

02:47:30.165 --> 02:47:38.960
Geez. I don't even know. Like, what what are some popular libraries? Anthropic? Maybe I'll just do a n p m search Anthropic? Okay. I don't know. N p m install at composio

02:47:38.960 --> 02:47:39.921
Anthropic.

02:47:40.320 --> 02:48:02.465
Like, basically, occurs every time you launch a new project or you have AI, like, design something for you is you'll you'll go through this, online resource, this big package manager, and then it'll automatically install like all of the packages it thinks it needs. And like that's usually not that big of a problem. Right? Because NPM is like pretty well vetted. But, you know, it's a package manager and so it manages hundreds of thousands, millions of different packages. And every now and then, one of these packages gets sort of compromised.

02:48:02.960 --> 02:48:19.466
Now, the issue in the way that this increases the attack surface is that AI models often hallucinate a package name. They won't actually always get it right the first time. Let's say, you know, you want a specific dependency or a package called Acorn. Okay? Sometimes, Claude, just because the way that like the tokens

02:48:19.705 --> 02:48:29.705
were were sort of baked into it, its various encoding schemes and stuff like that, will actually invent a dependency with like an extra letter, Acorn s, like acorns or acorn with an e or something.

02:48:30.689 --> 02:48:58.135
And a lot of people that are sneaky and terrible and super evil and malicious have have sort of known about this for a while because of like various encoding issues and the statistical probability of adding additional letters and stuff. So what they've done is they've actually created new packages. K, with small little misspellings of the main package. And they've made those packages contain malware, things that literally say, hey, I want you to go through their dot ENV and then go through all of their, you know, tilde slash dot clog conversation logs and then send it over to me.

02:48:59.301 --> 02:49:04.741
And so the idea there is, you know, it'll obviously exfiltrate anything that is important to you, then it'll gain basically full control over your account.

02:49:05.061 --> 02:49:08.181
It's a form of like, I don't know, prompt injection almost. But,

02:49:08.980 --> 02:49:23.085
you know, if you're making any sort of live projects or ones that tie to API keys with any sort of unlimited usage, you know, there are gonna be some out there where, I don't know, you just turn the unlimited extra usage token, uh, thing on, and then you'll have access theoretically to, like, billing tens of thousands of dollars for a service.

02:49:23.404 --> 02:49:53.565
Be very careful with that. You should just audit your dependency list for any unfamiliar package. You should actually ask Claude, like, hey, are there any unfamiliar packages that you don't actually actively use all the time? Or, you know, hey, before you instantiate this the first time, I want you to take a look at all at the NPM run and ensure that the only packages here are, like, legitimate packages that have verified histories and are not, like, inserting malware, I'm kinda concerned. And I'll give you guys like a whole security audit you could use for stuff like that in a moment. But the point that I'm making is like, is another attack vector. Okay? A lot of people don't realize this, but in addition to leaking API keys and getting it all over the place, and models all also hallucinate package names.

02:49:54.170 --> 02:50:03.051
The third main thing has to do with databases, and this is gonna apply mostly to people that are creating full stack apps or apps that, you know, need to call some sort of external

02:50:03.210 --> 02:50:17.176
data store. A lot of the time nowadays, to be honest, I just store everything with JSON files directly on my computer. It's a lot easier and simpler for me because I'm not really developing full stack end to end apps as much these days. I'm for the most part, just designing flows for myself or internal tools for my team.

02:50:17.575 --> 02:50:21.095
But anyway, assuming that, know, you wanna go a little bit further than that, actually develop full stack software

02:50:21.870 --> 02:50:28.110
Essentially, the simplest and easiest way to ensure that, like, 90% of all noted

02:50:28.591 --> 02:50:35.230
database breaches do not occur on your app is you just use this one little button called row level security.

02:50:35.726 --> 02:50:37.006
It's very straightforward

02:50:37.245 --> 02:50:55.510
and basically nobody does it, which sucks. So Supabase, which most of you are probably gonna be using for any sort of vibe coded app function, does not enable RLS by default. They'll probably do so at some point. But for now, what that means is if somebody signs up to your app, you know, typically, they're given a key by which they can access their own database table.

02:50:56.310 --> 02:51:06.524
Well, if they have a public key on a database that does not have RLS enabled, they can read, write, delete every other row in your database. And so you have a lot of cases where, you know, there is some simple

02:51:07.165 --> 02:51:21.951
I don't know. There was a database for like Mold Book, which was like supposedly Facebook for agents. That was just a few months ago and, you know, everybody was like, god, this is revolutionary or whatever. And then, like, the most elementary of security audits done by some cybersecurity fella showed that, like, they did not have database or RLS,

02:51:22.110 --> 02:51:24.110
a real level security enabled on their database.

02:51:24.591 --> 02:51:49.540
So he just went in and then he, read literally every single AI agent that had ever been created on the platform in, like, two seconds. Then, because he also had write access, he created like a a 100,000 fake AI agent profiles in like two seconds. Funny enough, Meta, Facebook actually ended up buying them and hopefully, they understood that a big chunk of those profiles were fake, but who knows, maybe they didn't. The point that I'm trying to make is like very, very low hanging fruit. Takes like two seconds to do. And once you're done with that, you can you can kind of move on.

02:51:50.340 --> 02:51:54.580
Okay. Be wary anytime you're publicizing a system like OpenClaw,

02:51:54.580 --> 02:52:02.455
like your little OpenClaw package to the web. So let's say you have some open URL. Let's say this is my Openclaw. Okay. And it's nickhappyfuntime.com.

02:52:02.455 --> 02:52:05.415
I'm kinda curious if I click on this. Is there anybody at nickhappyfuntime.com?

02:52:05.415 --> 02:52:07.815
Okay. Thank God. There's nobody at nickhappyfuntime.com

02:52:07.815 --> 02:52:17.860
because I probably have to sanitize my eyes after that. Anyway, imagine you have your Claude bot or molt bot or whatever the heck it's called now on nick dash happy dash fun dash time dot com.

02:52:18.580 --> 02:52:27.535
Well, odds are if you have a URL, and it's like a short straightforward URL, and it's on an IP range that is like owned by, I don't know, some virtual private server hosting provider.

02:52:28.175 --> 02:53:08.615
You are gonna be queried constantly by people that are looking for vulnerabilities. They will be scanning, okay, all over the place for every single port that's currently open in your computer. There are huge bot farms, for instance, in China, in The Philippines, in some Indonesian countries, and obviously the West as well. I'm not just trying to point a finger over there. But, you know, that's predominantly where a lot of these attacks come from. And there are huge bot farms that people have set up a long time ago that literally that their whole job is they just send tens of thousands of requests per second to like every URL constantly, scanning to see like, hey, have they patched this one thing? Hey, do they have this security vulnerability? Hey, do they do this? And the second even one of those things is good, like, you know, allows them access. Now they have full access to your freaking machine and box, basically, and then they can do whatever the heck they want with it.

02:53:09.095 --> 02:53:18.910
So I want you to know, like, if you set up some sort of, like, public facing server using some sort of VPS based approach on, uh, you know, like hosting or whatever that like, of these, like, major hosting providers,

02:53:19.390 --> 02:53:22.030
know that it is constantly going to be tested.

02:53:22.510 --> 02:53:24.109
And if you are, like, wild,

02:53:24.575 --> 02:53:27.936
you're raw dogging this, you're wild westing this, you don't, like, understand

02:53:28.335 --> 02:53:41.891
some pretty foundational things about, like, firewalls and, you know, RLS and and and so on and so forth, like, people will find vulnerabilities. Your stuff will be hacked. And so the idea is just make sure to whatever you are putting in there is not like super extraordinarily sensitive.

02:53:42.130 --> 02:53:50.051
You know, don't give your open claw agent your social insurance number or like a picture of your passport or whatever. That to me is like way too accelerationist.

02:53:50.051 --> 02:53:58.976
And I'm not being the old grubby person yelling at clouds in the sky being like, back in my day, we used to punch card stuff. I'm just trying to be reasonable here. Right? Just no need to do stuff like that for the most part.

02:53:59.296 --> 02:54:14.831
You know, if you have like a a local Claude instance that's running, that's authenticated through Telegram and then you're using like, I don't know, the the the Claude channels feature or whatever, probability that a hack will occur there is much, much lower because you're just running it locally and you're not actually connecting through like an open thing. You're connecting through a vetted,

02:54:15.230 --> 02:54:21.311
you know, telegram kind of connector or plug in. But if you're just like Openclaw raw dogging it, yeah, be be very careful with that stuff.

02:54:21.965 --> 02:54:34.125
By the way, this isn't just me ragging on Openclaw for the four thousandth time. I'm trying to be reasonable about this. I think decentralized autonomous agents are obviously the future at some point. But, you know, most of what we've seen so far has literally just pissed away people's API keys and credit card information.

02:54:34.989 --> 02:54:41.550
Speaking of credit card information, never touch a credit card number. So if you guys are designing systems that interface with any sort of credit card whatsoever,

02:54:41.630 --> 02:55:00.755
don't actually like store that data. Don't actually read that data. If that data gets read at any point by like an AI agent, hell, even your AI agent, guess what's gonna happen? Well, same thing. You know, you're gonna leak those API keys. You're gonna stick them in your conversation history. And then any sort of hacker or you at any future point in time, if you misconfigure stuff, push stuff to GitHub or I don't know, like

02:55:01.030 --> 02:55:13.830
trading your computer or whatever, you'll now have like a big log of all of that information just in plain text, which is easily available. You know, a lot of people will just like RedX over your entire computer looking for things like, you know, credit cards that they can access. And then what's a credit card? Well, usually, it's like,

02:55:14.625 --> 02:55:18.705
was it 16 or 20 characters or something? I have to check my credit cards now, but it's like very,

02:55:18.864 --> 02:55:23.265
very stereotypical. Right? You find 16 or 20 characters all connected together,

02:55:23.664 --> 02:55:31.610
maybe like with a space in between, boom, you got yourself a freaking credit card. Maybe you don't even. They just look for that then they check to see whether or not it's like a Visa pattern. If it is, you're screwed.

02:55:32.170 --> 02:55:41.930
So, anyway, I guess what I'm trying to say is, like, don't put that liability on yourself by storing other people's credit cards if you're running, like, some sort of business thing, and then don't put that liability on your own card by storing your own card here.

02:55:42.506 --> 02:55:56.665
You know, use services like Stripe. They do everything for you. They are super compliant, PCI compliant, all this stuff. They they they teams that just like focus on making sure that stuff that is stored on their servers never gets screwed up, then you never actually have to deal with, the compliance regulatory aspect of touching credit.

02:55:57.860 --> 02:56:03.221
Alright. Now once you're done sort of understanding this, which should be now because hopefully nothing here is super complicated,

02:56:03.380 --> 02:56:05.381
although some of these concepts are advanced, I understand.

02:56:05.940 --> 02:56:14.456
All you need to do is just run anything public facing through some form of security audit for like maybe the other eighty twenty. And so this is a security breakdown that I created for

02:56:15.495 --> 02:56:22.295
a vibe coding course where I was showing people how to make full stack apps. Pretty cool using Gemini in case you guys are interested. I guess it's Gemini and ClotCode.

02:56:22.840 --> 02:56:26.200
You can find that on my channel if you want to type like next drive vibe coding or something.

02:56:26.680 --> 02:56:27.240
And

02:56:27.560 --> 02:56:47.246
essentially, down here at the bottom, what I have is I have a big security audit prompt where you can actually just feed this into Claude and then have it like point out all of the security issues with whatever your your your flow is. And so what I'm gonna do is I'm gonna go back here to anti gravity. And I mean, I sort of I don't really have like anything that's public facing here, I'm still gonna run it through auto research. Then I'm gonna just create a new one and I'll say,

02:56:47.966 --> 02:56:51.565
apply this to our auto research

02:56:52.570 --> 02:56:53.530
flow,

02:56:54.330 --> 02:56:56.410
the one optimizing left click.

02:56:57.210 --> 02:56:59.771
Once done the security audit, return me

02:57:00.410 --> 02:57:04.970
everything we need to fix. I know nothing is web accessible ATM.

02:57:05.636 --> 02:57:22.700
K. And so what this does is it's just some it it's just a big prompt that I developed in conjunction with a bunch of agents. I had to like read a bunch of security blogs and so on and so forth to like look for the the biggest low hanging fruit and the simplest minor configuration changes they could make. And, you know, what it's gonna do is just go top to bottom and then apply this.

02:57:23.100 --> 02:57:24.300
The reason why I'm

02:57:24.620 --> 02:57:29.420
spinning up a totally new conversation history is because I do not want any sort of conversation

02:57:29.660 --> 02:57:39.375
context to bias what's going on here. I don't want the same agent I used to develop my tool to actually also run the audit because odds are it's going to be biased and it's going to do some specific

02:57:39.615 --> 02:57:42.255
it's gonna make specific errors because it's gonna think that it's better.

02:57:42.575 --> 02:57:52.341
Do you see here how it's searching for s k underscore live, s k underscore test, s k dash bear, and so on and so forth. These are all API token headers, basically. These are like the titles of API tokens.

02:57:52.660 --> 02:58:02.021
What it just did there, other people are going to do at any point in time if they gain access to your system. Same thing here with like model weights and same thing here with like bash scripts and stuff like that. Okay?

02:58:03.135 --> 02:58:13.375
Anyhoo, so we're just gonna read this top to bottom architecture summary, gives me some brief details about what's going on. It's not a web app. It's a local single GPU ML training pipeline. It's easy. No hard coded secrets,

02:58:13.455 --> 02:58:26.201
but the git ignore does not include the .env.env local and so on and so forth. Okay? All the stuff that actually applies here is going to be filled in. So in this case, this is an actual failure, but in this case, it's not as not applicable because it's not an actual web app.

02:58:26.601 --> 02:58:33.160
Then you can see that there's also some sections where it fails. So finding number one, supply chain low popularity package. Right? Supply chain issue.

02:58:34.405 --> 02:58:41.205
Let's see. Over here, it's failed on some machine learning specific risks and it's sort of putting that out. It's funny that it's using the term vibing.

02:58:41.285 --> 02:58:48.245
I like that. Anyway, so I'm not really gonna go through everything with you, but basically what you do is you you you finish this and then you just say, okay, great.

02:58:48.900 --> 02:58:51.301
Fix according to your suggestions.

02:58:52.101 --> 02:59:00.660
K. And then once it's, you know, once it's done and whatever, I'm just gonna pretend it's it's done now even though it obviously isn't. This might take you like three or four minutes if you're running on something that isn't like,

02:59:01.525 --> 02:59:19.971
you know, fast mode like I typically run stuff on. What you do is you just go through and then you actually implement it. And just like I showed you a moment ago to use something that is not biased with the conversation history, you spin up another agent to take the recommendations and then actually go through and do it. Because you also don't want that implementer agent to be biased by the security audit kind of overly constrained nature of it.

02:59:20.370 --> 02:59:29.970
So in that case, you can use a sub agent or some other model itself like Codex, Gemini, or whatever. And then, you know, ultimately, you can have it reviewed by Claude because I think Claude is the best model. But in this way, you're basically like diversifying.

02:59:30.195 --> 02:59:48.510
Similarly, how we're diversifying by putting seven out of 10 of our eggs in the Claude basket, but three out of the 10, you know, spread across other models. You're diversifying against any sort of inherent risk or bias that Claude has to work that is generated by other Claude's versus, know, Codex or Gemini or whatnot. So the best solution would actually involve multiple runs through all of them.

02:59:49.069 --> 03:00:21.721
Okay. Hopefully, that makes sense. I mean, I didn't want this to be a big deal. Obviously, security, as mentioned, is only as big of a deal as you are willing to make it because of preexisting assets and what you have to risk and stuff like that. So if you just understood what I talked to you about right here, and then if you get, you know, a security prompt like what I showed you here, you you should be good. Just pass something like that through an agent after you've done a project, and it'll like cover most of low hanging fruit. And by the way, you want that security audit, then definitely check out that vibe coding full course. Really easy, just type mix drive vibe coding. I actually give you guys all that information for free there. You can also watch it if you wanna learn how to develop things with other models.

03:00:23.080 --> 03:00:26.681
Congratulations. You made it to essentially the end of the informational

03:00:26.920 --> 03:00:35.125
clawed technical content of the course. And now, I just wanted to reserve maybe ten or fifteen minutes to chat a little bit about what I consider to be the future of Claude.

03:00:35.365 --> 03:00:50.661
Not just the future of Claude code, but the future of Claude the model, as well as the future of just agentic engineering in general. And the reason why I talk about this is because it's a topic that's very close to my heart. I've been considering this for probably the last ten or so years. As a kid, they grew up on science fiction, you know,

03:00:51.540 --> 03:00:53.301
foundation from Asimov,

03:00:53.301 --> 03:01:17.230
tons of Arthur c Clark books and Heinlein and so on and so forth. I I've thought a lot about like what the far future would look like in an environment that is controlled by agents like Claude Code. And I've also thought about some of the intervening steps we need to get there. And now that it's sort of being thrust in my face, I think there's a lot that you could realistically learn from even just like fictional representations of this. That most people who probably haven't just stuck their head so far in the science fiction bubble. I think,

03:01:17.630 --> 03:01:43.690
you know, I think would find value in here. In addition, I obviously have a lot of exposure to both mid market and then enterprise here. Not to mention all the small businesses that I work with through LeftClick. And I think that gives me sort of an edge here to at least give you guys some sort of plausible future that has more of a 10% chance of probably being true. I mean, like, things are changing so quickly. I obviously can't be a 100% sure what is going to occur. But these are some things that are considered to be like pretty low risk bets that if you make, you'll probably have some form of alpha. Alpha.

03:01:44.170 --> 03:01:47.130
Okay. So the first main one is

03:01:47.450 --> 03:01:50.170
this trend of decreasing human involvement.

03:01:50.410 --> 03:02:00.625
Do you guys remember earlier when I showed you guys that diagram where it was like vibe coding to agentic engineering to basically, like, research based direction with auto research and and frameworks like that coming up.

03:02:01.025 --> 03:02:11.860
Well, this is still something like we are creating. Right? It's sort of like open sourced, not necessarily open sourced, but, um, it's something that, like, you know, the community is sort of working on. But all of these approaches are soon to be quite formalized.

03:02:12.100 --> 03:02:17.860
And it is very likely, in my opinion, that we are going to continue decreasing human involvement in tasks.

03:02:18.100 --> 03:02:22.979
This auto research thing is a great example of ways to, you know, democratize sort of like little improvements.

03:02:23.385 --> 03:02:57.125
I've kept this auto researcher running, by the way, if you guys have remembered from like a couple of modules ago. And we're now actually at like almost eight thousand millisecond load time from a baseline of 18 o two. Imagine if you had this running three thousand days in a row or whatever, or if you had this running at, like, inference capacities a 100 x this, right, which we are obviously getting to. You guys remember how slow GPT three was back in the day, if anybody here is an old head that used that? Well, GPT 5.4 fast or instant or whatever is way faster. And imagine if you had a model that's a 100 times that that fast with the same level of intelligence. You can make some major updates to basically anything.

03:02:57.604 --> 03:03:20.060
And so the idea is, you know, we're probably not going to increase the level of human involvement in, like, direct coding and stuff like that, which is fine. I'm not making like a value judgment or a normative judgment here. But I imagine you as a developer or a business person or whatever, will actually probably grow less involved in the day to day work of either your company, your research lab, your your your your your app, whatever the heck.

03:03:20.915 --> 03:03:30.355
And so my take is, in the future, we're gonna move towards this sort of thing that a lot of frameworks have tried to formalize, which is that we're each gonna be the CEO of sort of like our own

03:03:30.596 --> 03:03:46.811
company. Whether it's an actual company in practice or whether it's, you know, some sort of organization that's like a company. All of us will basically be just like the the chief executive officer running teams or fleets of agents that are constantly doing things on our behalf and that have some sort of formalized framework that also, like, helps them optimize and and make better.

03:03:47.535 --> 03:03:55.936
And so sort of the the way that this works, I imagine, is we would go from, you know, like the old school Wright brothers flying the plane ourselves

03:03:56.015 --> 03:03:58.815
to sort of like modern aircraft engineers,

03:03:58.895 --> 03:04:22.135
where there's somebody in the cockpit. But for the most part, you know, an autopilot is taking over the vast majority of the work. Even in, you know, like takeoffs and landings now, there are obviously like so much so many SOPs and so much like a a process and framework that, you know, you can imagine how a system that was much less developed than ours, much less capable of deep thinking and stuff could actually probably just execute it entirely at this That said, you know, will we ever get rid of a human in the loop to some capacity?

03:04:22.375 --> 03:04:34.830
There are just so many regulatory blocks, and I think like ethical issues with that, that we will probably always just have some person like manning a ship. It's just the number of ships that a person will man. The number of of discrete agents will just continue increasing.

03:04:35.150 --> 03:04:42.511
Until, know, rather than have a 100 people do a task in some specific company like we used to have, we might have one person do a 100 tasks. Leverage will go up.

03:04:43.070 --> 03:04:45.870
Now, a good example of this is Claude's recent

03:04:46.396 --> 03:05:00.155
auto mode. I don't know if you guys have seen I said auto mode, but I don't know if you guys have seen their recent development. Or basically, now have the ability to run some sort of autonomous mode instead of choosing, you know, switch permissions or sorry,

03:05:00.235 --> 03:05:01.435
execute

03:05:02.200 --> 03:05:19.336
bypass permissions down here or ask before edits or edit automatically and and so on and so forth. Well, now we basically have an additional one auto mode, which I just can't see here right now because I'm using a slightly older version of Clocker. I don't have that yet. But basically, you know, instead of you actually having to, like, go through this whole process of changing the security,

03:05:19.976 --> 03:05:28.695
changing the access that it has, you know, Cloud just does that for you. So, like, that's a pretty good example of something that used to require a person, and I was just like, well, Cloud's gonna get a 99.9% of the time. Screw it. I'll I'll give it to

03:05:29.420 --> 03:06:22.865
Okay. So that's a very small microcosm, but, like, imagine the rest of the loop. Like, planning loop right now, typically, you have Cloud develop a plan for you and then you implement on that plan. That whole thing is just like being internalized. Like, we're not actually doing most of the plan development now. We we will not continue to do most of the plan development in the future. Realistically, Cloud's gonna do both the planning and the implementation. Then the q and a, it's like right now, we're sort of in the loop. We're sort of like clicking in the buttons, running it. Well, they're developing automated testing procedures where Cloud actually also does the q and a for and then delivers you the whole thing. And so some people hate this because they're like, well, they're taking my jobs and whatnot. And I think there's I think there's a fair point to that. You know, human beings' leverage will continue to increase, but depends on like how much work is there really to do. How many software products are there really to develop? Do we actually are we even gonna have, like, the demand for that sort of thing? And I think that's like a reasonable conversation to have. And then, you know, unfortunately, I don't know the answer. My my take is, like, eventually, we're probably gonna have to move to some sort of different economic system because most of the world would be unemployed otherwise.

03:06:23.265 --> 03:06:24.545
But that's me getting all political.

03:06:25.279 --> 03:06:34.320
That's number one. Okay. So the trend of decreasing human involvement is very likely to continue with clot code. They're now at the point where they're developing this so rapidly that like AI is helping AI design products.

03:06:34.560 --> 03:06:42.075
And, you know, automotive is just the beginning of like, I think a massive suite of rollouts that will significantly improve your experience. But, you know, make you more hands off.

03:06:42.795 --> 03:06:44.875
My second one is more of like an economic

03:06:45.115 --> 03:06:46.075
consideration,

03:06:46.155 --> 03:06:51.275
which is that software products and tools, k, the the quality of the things that you build

03:06:51.675 --> 03:06:52.795
will no longer be remote.

03:06:53.341 --> 03:06:58.461
So in the in the past, in the good old days, back when I was on the come up, how good your software was?

03:06:58.780 --> 03:07:00.221
Think like Windows.

03:07:00.301 --> 03:07:02.461
Think like, you know, like Mac OS.

03:07:02.780 --> 03:07:06.860
How good that operating system was? Might have been the only thing that distinguished

03:07:07.146 --> 03:07:17.146
that operating system from another operating system. And if it was really, really good, then obviously it would be much more popular and then it would get, know, a bunch of like inherent interest and stuff like that because of the capabilities and you'd obviously use it.

03:07:17.705 --> 03:07:22.266
So the issue with that nowadays is you can make Netflix in five minutes.

03:07:23.029 --> 03:07:36.310
Netflix before was this innovative streaming model that, you know, was like, wow, you know, you could just load the thing and then the the the the video loads on you for for you on demand and it's incredible and like the streaming and latency and uptime and all that stuff. It's like super proprietary technology.

03:07:37.006 --> 03:07:43.246
Well, now it's like, I can code Netflix in five minutes with, like, you know, three or four agents on fast mode. So it's like, what is the value of Netflix?

03:07:43.726 --> 03:07:57.741
What is the moat that differentiates Netflix as sort of like this, like, old school medieval castle from all of the attackers that, you know, could actually take it down? Well, the moat now and this has been something for at least a couple of years. The moat now is no longer the software. It is the distribution.

03:07:57.980 --> 03:08:00.860
So in a world where everybody has basically like a

03:08:01.421 --> 03:08:03.261
I don't know, a nuclear weapon,

03:08:03.261 --> 03:08:22.530
is the differentiator like everybody has a nuclear weapon? No. The differentiator moves to other things like, I don't know, the political framework, like the wellness of the populace and stuff like that. What I'm trying to say is like that that skill, that software engineering ability is no longer going to be the moat. And instead, the moat is going to move to, you know, the connections that a company has to its consumers,

03:08:22.689 --> 03:08:26.050
the reputation that the company has in the market, the

03:08:26.449 --> 03:08:43.905
distribution that it has with a bunch of vendors that, you know, are hard won relationships and connections that they realistically built over the course of many years. You know, Netflix now has a bunch of patents and and rights and licenses and stuff like that to air specific shows. It's seen this coming and and so it's tried to diversify accordingly. But you're gonna see that in basically every software platform.

03:08:44.065 --> 03:08:55.870
The moat will, like, probably move more to the distribution and the legal and compliance aspects than necessarily like how good the software is. Which means you're gonna have like these cracked, probably like fourteen, fifteen year old kids designing like the most incredible amazing software ever.

03:08:56.189 --> 03:09:10.275
And then that software will be able to reproduce anything that like a major business would do in like a hundredth of the time. But, you know, because they don't have like the compliance or or whatever certifications or whatever, you know, it'll probably be more difficult for them to actually go to market with something like that despite it being like objectively superior.

03:09:11.155 --> 03:09:24.210
And, you know, the way that I see is we already have AI models that are at the limit of human reasoning capability. They can run hundreds of times faster than our brains, soon to be thousands of times faster than our brains on basic tasks. So even if they're not, like, better than us at the software design individually,

03:09:24.210 --> 03:09:25.570
if you run a thousand,

03:09:25.810 --> 03:09:45.555
you know, 90 IQ models, comparatively, like, one one hundred IQ human, Those will eventually figure out the things that that one one hundred IQ human would do. And not only will you develop more software like quality, you'll also develop more software quantity. And so software as a just a market thing, supply and demand, like, economically. The supply will be so damn high that the demand for any sort of, like, purchasable software gets a lot lower,

03:09:46.159 --> 03:10:05.345
Which means I personally don't think like a SaaS product is really the play here. I don't think there's gonna be any sort of life cycle for like subscription based products. I think you'll have a short window of time where you could actually just monetize like a one time buy product. And then most people will just say, well, should I spend a $199 on the product, or should I just spend $19 plus 30 minutes of my time on tokens? And they just design it for myself.

03:10:05.825 --> 03:10:09.505
And I think that's gonna change the way that we do, you know, like software more generally.

03:10:09.905 --> 03:10:30.510
So I'm not very bullish on like, know, developing software as a service apps and stuff like that. I have a lot of people be like, Nick, you know all this stuff? Know You know how to design all the software? Like, why aren't you making a software app and why why aren't you monetizing your community, let's say, through software? And I'm like, I'd only really be able to do that for a short period of time. And then even if I were to, like, where's the value in that if anybody could just make it? I'm saving them like twenty minutes and a couple bucks in tokens. Right? It's not that big of a deal.

03:10:31.274 --> 03:10:33.915
So, I mean, I would I would move accordingly, I guess.

03:10:34.475 --> 03:10:36.955
Because the third thing that I'm like 99.9%

03:10:36.955 --> 03:10:41.835
sure of is that the pace of change is not slowing down anytime soon.

03:10:41.995 --> 03:10:43.915
It will continue to accelerate.

03:10:44.234 --> 03:11:01.431
Just as technology has helped us increase the pace of change through our history with things like the printing press, with developments and, you know, communication with like the telegraph and so on and so forth. You know, these things don't just improve the quality of life of the average person, they improve the research

03:11:01.146 --> 03:11:09.146
and development arm of technologists who work on that exact thing. And so because of that, you know, the pace of change is is basically just going up.

03:11:09.386 --> 03:11:12.346
If I had to graph sort of where we are now,

03:11:13.400 --> 03:11:14.440
and I will

03:11:15.160 --> 03:11:17.400
because I freaking love graphs. Right?

03:11:18.120 --> 03:11:19.320
Just the best.

03:11:20.440 --> 03:11:23.320
And if I were to graph the intelligence,

03:11:23.400 --> 03:11:28.440
which is a very loose term here and obviously means different things to different people, but the intelligence of a model over time,

03:11:29.115 --> 03:11:31.915
you know, basically, I'd go like this.

03:11:32.635 --> 03:11:37.436
Okay? And so this back here was sort of like linear growth from like maybe like the nineteen

03:11:37.595 --> 03:11:39.835
seventies and stuff with like Minsky,

03:11:39.995 --> 03:11:53.721
you know, nineteen seventies and eighties and stuff. Minsky and like the the first few neural nets and stuff like that. Then this right over here is probably like, I don't know, 2010 when models started actually doing stuff. Right? Then this over here is like 2020.

03:11:54.681 --> 03:11:56.120
You know, this over here

03:11:57.295 --> 03:11:58.175
is like

03:11:58.814 --> 03:12:01.055
2025,

03:12:01.055 --> 03:12:01.695
and then

03:12:02.095 --> 03:12:05.455
this over here is 2026. Do you see how how, like, high

03:12:05.774 --> 03:12:09.854
this is going? How quickly? And then a point that I wanna make is basically like,

03:12:10.360 --> 03:12:14.280
this right here is the intelligence of maybe like a like a chimpanzee.

03:12:15.000 --> 03:12:15.721
K.

03:12:16.200 --> 03:12:19.000
This right here is the intelligence of like an average human.

03:12:19.320 --> 03:12:20.681
And then this right here

03:12:21.080 --> 03:12:23.320
is maybe the intelligence of like Einstein.

03:12:25.015 --> 03:12:25.655
And

03:12:26.055 --> 03:12:34.614
what we what we have now is, you know, we're we're like right over here, man. These models I I say as smart as a chimpanzee, not to didn't diminute or whatever chimpanzees.

03:12:34.614 --> 03:12:44.860
But, you know, their brains are extraordinarily advanced and developed. They have these cerebelli, these these sections of their brains that are responsible for calculating, like millions of of movements and and so on and so forth every minute.

03:12:45.260 --> 03:12:57.935
Like, it's a very complicated thing to like replicate the intelligence, the distributed intelligence of an organism. And you don't capture that all just by like, hey, can it write? Hey, can it, you know, reason and do math? Have you ever seen like a chimpanzee's like memory? Have you seen its like ability to like,

03:12:58.335 --> 03:13:00.335
you know, move around on a page and like

03:13:00.575 --> 03:13:18.520
figure out symbolism and then symbols, sorry, and then like counts numbers up in their motor neurons? Anyway, the point I'm making is not this is a course on chimpanzees, so I'll stop talking. God, that's my nerdy side, Shari. But that the gap between the intelligence of a chimpanzee, if you just count up all the neurons in its brain, intelligence of a human if you count up all the neurons in brain, intelligence of Einstein.

03:13:19.125 --> 03:13:44.530
Actually very close together. They're very clustered. And I'd say, like, we're basically right over here right now. So I guess what's gonna happen in, you know, the next few years. This is gonna go like up here. And we are going to it's gonna be like, wow. These things are so dumb. They're dumb. Oh, wow. Cute. They can do things that a chimpanzee can do. And then, like, six months, it's like, oh, okay. These things are now, like, you know, freaking galaxy brain intelligences that, you know, can do everything and anything for us. And imagine what happens when, you know, all of this is just humans working on stuff,

03:13:44.930 --> 03:13:55.225
and then eventually gets to the point where you can actually, like, use human level intelligence, which is right now, to, like, improve its rate of growth. This thing is just vertical. I mean, this thing would go so vertical it'd go through my roof in two seconds.

03:13:55.785 --> 03:13:58.105
So that's my take on it personally. I think,

03:13:58.505 --> 03:14:17.181
you know, I think we're getting really, really close to super fast paces of change. And if you guys have, like, been monitoring the the Claude, even Claude code x page recently, or, like, seeing YouTube, there's there's new updates coming out every day. This would have been unfathomable just, three or four years ago to make this level of development and this level of, like, small additions to a software product while also making sure they're testable and reliable.

03:14:17.836 --> 03:14:20.556
Just because intelligence is making intelligence

03:14:20.716 --> 03:14:21.996
more intelligent now.

03:14:22.955 --> 03:14:26.156
And then the last thing I'm gonna say is that the people that

03:14:26.476 --> 03:14:27.756
will control,

03:14:27.915 --> 03:14:56.205
not necessarily control, but have the most like power and ability over the course of next years are people that learn to use this technology now. You're part of a very, like, privileged minority, and I don't say that in, like, the political sense of the term because, yeah, I think that's all muddled up. But, like, you're part of a minority of people right now that, like, actually use this technology. Do you know how few people even understand what an agent harness is? We're talking like sub, like, 1% of the population of Earth. The percentage of people that know how to use an agent harness like you are doing right now, uh, is even less. It's a fraction, vanishingly small percentage.

03:14:56.685 --> 03:15:21.495
I don't know if everybody that watches this, uh, is old enough to remember, but there were, like, some protests back in the day on Wall Street. And the point is that they were like, we are the 99% or whatever. And they were protesting the massive wealth divide in specific parts of America between like, you know, really, really wealthy people that work on Wall Street and then like the populace, the rest of the people that like, I don't know, manage the service industry and hospitality and basically do everything else. And they're like, why do you guys get to have like thousands of times more money than us?

03:15:22.535 --> 03:15:24.535
You are the 1% right now.

03:15:24.855 --> 03:15:41.660
You are that group of people that I'm sure in the future other people will be raising their hands about and, you know, shaking their fist at. Because you have an enormous capability to use models like this for just cents on the dollar to do incredibly amazing economically viable things that would take that other group of 99%,

03:15:41.660 --> 03:15:55.444
like like months to do what you could realistically do in a day. It's insane. You know, I I think you could talk all day about, the wealth divide, you can also talk about, like, the productivity divide. And the wealth improves the likelihood that you will be in that product the positive chunk of the productivity divide.

03:15:56.540 --> 03:16:03.421
You right now, even if you don't have a lot of money, have access to insane technology and leverage simply because you're in it. So that's going to increase.

03:16:03.900 --> 03:16:24.230
Now, William Gibson, one of my favorite authors said it best, the future's here is just unevenly distributed. Meaning that like, we have access to insane technology. It's just like not all of us do at the same rate. There are small pockets of people like yourself that understand how to use these tools far better than others and in doing so, you have the ability to reap asymmetric rewards over a small chunk of time. And my take is as the economy shifts to accommodate smarter than human intelligences,

03:16:24.551 --> 03:16:39.635
the people that understand things like agent harnesses and coding harnesses, people that understand how to use the best models in the world like Claude, you know, Opus or or Mythos or whatever the heck we're at now. People that know how to turn these into economically valuable things are the ultimate people that are going to win this share of the future,

03:16:40.114 --> 03:16:44.755
whatever small percentage it is. Because given the massive unbounded upside here, like we're talking,

03:16:45.314 --> 03:16:57.780
you know, solar panels orbiting the freaking sun in a few year like, we're we have solar panels, but the point that I'm making is the massive potential upside of if everything goes right with this technology, things don't go super wrong.

03:16:58.100 --> 03:17:02.260
If you own even point 00000001%

03:17:02.260 --> 03:17:08.536
of that potential future because of some decisions that you made today to, know, upscale and start this productivity kickoff.

03:17:09.495 --> 03:17:13.336
You know, like the the the abundance of your own personal life would would be huge.

03:17:13.735 --> 03:17:14.695
Okay. So

03:17:15.176 --> 03:17:34.700
I guess that's it. We made it to the end of the course, and that's really all I have to say on that. Hopefully, you guys appreciated learning everything that I had to give on Claude code, and you guys have learned some advanced concepts here, whether it's about, you know, initial system prompts and and and Claude. Mds, or it's some of the more obscure things and esoteric things like security or the future like I just talked about.

03:17:35.635 --> 03:17:50.435
If you guys like this sort of thing, you'd be doing me a big solid to subscribe to the channel. For whatever reason, something like 70% of my regular viewers are not subscribed. I think it's just how YouTube works. Most people don't sub, but you can you can sub. That would really help me out. I wanna get this sort of message out to more people and obviously help them be in that small little chunk.

03:17:51.410 --> 03:18:22.985
If you'd do me a solid, leave a comment down below with a video idea or something that you want me to cover. I actually get most of my ideas directly from my audience now, so I'd really appreciate that. If there's anything that I didn't cover here, maybe didn't touch on that you would like me to touch on, or maybe anything that I personally made a mistake on, I'd I'd love to hear it because I'm trying to improve my ability to use these tools. Finally, I also help other companies implement this sort of thing in their own businesses, whether you are a small to mid sized business, mid market, or enterprise. Um, so if you wanna chat with my team, just, uh, check down below, uh, somewhere at the top of the description. There'll be a link. Thank you for making it all the way to the end of the video. I'll see you all soon. Bye.