WEBVTT

00:00:00.000 --> 00:00:18.375
I spent hours interviewing Claude Code on exactly how it works. I wanted to understand all the nuts and bolts because one of my many goals is to be a top 10% user of Claude Code across the board. And if you clicked on this video, then I assume that you have the exact same intention. So whether you're technical or nontechnical,

00:00:18.375 --> 00:00:24.055
I'm gonna walk you through every single thing that you need to know to understand Claude Code better than 99%

00:00:24.055 --> 00:00:52.165
of people out there. And here's the thing. When I was a kid, I used to buy computers, break them apart, and rebuild them just to understand how each and every part kind of came together. So after breaking apart and mapping each and every part of the Claude Code system, I not only have a deeper understanding, but I know exactly what's happening behind the curtain. If you watch this video till the very end, then you will too. Let's dive in. So like I said in the intro, we're gonna be diving deep into how each and every part of Claude Code works. And the goal is that you understand

00:00:52.245 --> 00:00:57.390
what's happening behind the curtain, but also the fact that the actual engineering is unbelievably

00:00:57.390 --> 00:01:01.950
simple. It's just the way that everything comes together where you have this harmony of orchestration

00:01:01.950 --> 00:01:44.145
that makes these agentic workflows possible. So we're gonna explore nine different elements of Claude Code, and I will do my absolute best to keep things in as plain English as I possibly can. There will be some software engineering concepts, and I'll do my best to break those into analogies so it fully lands. Before we even dive into the details, I wanted to show you how you could run your own interview with Claude Code, and this is what helped me get into the nuts and bolts and ask it for advice on how memory works, how sessions work, how context works. So if you go into Claude Code, most people don't know this, they've come up with a series of subagents. And if you've watched my last video, I talked about some easy ways to use subagents. So as a refresher, if you ask Claude straight up,

00:01:44.545 --> 00:02:01.865
hey, what agents do you come with out of the box, specifically subagents? And we send it over, I'll show you which one we care about right now. And you'll see in the response, the one I really care to show you is this Claude Code guide, which is literally designed to answer questions about how the Claude Code CLI, the Agent SDK,

00:02:01.865 --> 00:02:05.785
and the Claude API work. So all you have to do to invoke it is write at

00:02:05.945 --> 00:02:06.825
Claude

00:02:07.065 --> 00:02:07.705
code

00:02:07.945 --> 00:02:36.385
guide, and now you are basically tagging the coding agent. So if you ask it something like, can you explain to me like I'm in grade 10 how the memory function and context management of Claude works, and can you create some form of ASCII art so I can visually see how the system works? And you send this over, you'll now get a glimpse of how I started my deep dive into each and every nook and cranny of Claude Code. You'll see right there, it comes back with a breakdown of how the short term memory works,

00:02:36.865 --> 00:02:41.370
how it is structured in terms of different messages and how it summarizes messages,

00:02:41.690 --> 00:02:43.130
how long term memory works,

00:02:43.450 --> 00:02:49.050
and the full picture. So it breaks down each and every part of the system with a series of key takeaways.

00:02:49.210 --> 00:02:53.770
So this is your cheat code. So if you watch this video and you have even deeper questions or

00:02:54.065 --> 00:03:10.960
you have a particular use case and you wanna see if you can harness deeper power in Claude Code that you might be putting on the shelf, then this would be the best way to do that. Now back to our learning journey. This is the TLDR of the Claude Code system. So you have the Claude Code CLI, the interface itself, where you have the terminal,

00:03:11.200 --> 00:03:14.160
you have the session manager, you have the tool executor,

00:03:14.160 --> 00:03:50.810
and you have what's called the permission layer. Now if you're a complete novice and noob to Claude Code, the permission layer is essentially when it's asking you, hey, can I edit this file? Can I get your approval for this? Can I install this package? And if you want, you can go into, what I usually do, YOLO mode or bypass permissions mode to allow it to do whatever it wants. Now as you'll see when we walk through the components of the CLI, a lot of this stuff exists in the wild, and it's even open source. The real magic is how it gets married to the Claude API and more importantly, the Claude models themselves, especially Opus 4.5. The rest of the system is really how it interacts with your local machine,

00:03:50.970 --> 00:03:58.945
which is why it's become so useful for personal applications like organizing files, creating folders, and doing a lot deeper analysis,

00:03:59.025 --> 00:04:42.400
looking for system memory where there might be leaks, doing anything you want on your local computer, as well as interacting with things like GitHub, which, if you're not familiar, is like version history for code, where you can always go back and pull the latest version that you had and then push the latest code that you wanna commit there. So if we were even to summarize and bring Claude Code to the highest layer, you have a user interface, and then you have what's called the orchestration layer. This layer is where the Claude Code brain lives. This is where the intelligence lives. And everything below it is how it executes tools, which tools it executes based on what scenario is popping up. And the last part, like I alluded to before, is the approval and security layer where this takes control to make sure that Claude Code is designed at least to be your coding companion,

00:04:42.640 --> 00:04:48.800
not some autonomous agent, even though people are obsessed with using it in that way. Now here's something that most people don't realize.

00:04:49.295 --> 00:04:54.175
Now there are a lot of words on screen here like React, Ink, TypeScript, Node.js.

00:04:54.415 --> 00:04:56.975
All of this stuff, all of this core infrastructure

00:04:57.135 --> 00:04:59.695
is open sourced and has been used for years.

00:05:00.015 --> 00:05:11.360
Anthropic didn't actually invent any one of these singularly. What Anthropic did in the Claude Code team is marry this, like I said before, with the Claude API. So these are things that can be abstracted,

00:05:11.360 --> 00:05:14.560
which is why you have things like OpenCode. You have Antigravity

00:05:14.560 --> 00:05:20.025
because a lot of these principles, like the way you search a code base, the way you traverse different files,

00:05:20.265 --> 00:05:24.345
have all existed before. It's all about putting them together modularly

00:05:24.345 --> 00:05:36.610
like a series of Lego blocks and combining them with actual intelligence. So Claude Code is a combination of smart engineering standing on the shoulders of open source giants. So another key concept that's made Claude Code viral

00:05:36.610 --> 00:05:47.490
is how simple it is. Most people, especially nontechnical people, if you're watching this and you hate the idea of a terminal, Claude Code made something like the terminal easy to use and much more powerful than ChatGPT.

00:05:47.885 --> 00:05:53.165
All it is is the same box. You can change the color of the IDE you're using. It could be Antigravity.

00:05:53.165 --> 00:06:42.965
It could be Cursor. It could be whatever you wanted. But behind the scenes, all of the thinking is happening from the model, orchestrating all the bits and pieces, the files, the options, and the tools. And if you wanted one single diagram to understand what Claude Code does behind the scenes, it's this. Step one, it gathers the information. It reads your prompt. It reads the CLAUDE.md, which is the command center of Claude Code that it looks at as the first thing when you initiate a brand new terminal session. And then it acts to make changes based on what it's read and understood and ideally has planned out and you've approved that plan. And then it verifies whether it did the right thing and did it correctly. If there's a test that it can run, it will run that micro test, and you as the user can encourage it to write even more tests and make sure that they're not fictitious tests that are passing according to some weird criteria,

00:06:42.965 --> 00:06:58.030
and that's how the loop keeps on going. So if you have those three words running in the back of your head, gather, act, verify, we can dive deeper to see how it does those three things. So when it comes to the life cycle, in the gather stage, like we said, it understands before acting.
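
NOTE
Editor's sketch of the gather, act, verify loop described above. This is a toy
illustration under stated assumptions, not Claude Code's actual implementation:
run_loop, its parameters, and the retry limit are all hypothetical names.
```python
from typing import Callable
def run_loop(gather: Callable[[], str], act: Callable[[str, int], str], verify: Callable[[str], bool], max_attempts: int = 3) -> tuple[bool, int]:
    """Hypothetical gather/act/verify loop with a capped number of retries."""
    context = gather()  # gather: read the prompt, files, and project notes
    for attempt in range(1, max_attempts + 1):
        result = act(context, attempt)  # act: make a change based on the context
        if verify(result):  # verify: run a check on the result
            return True, attempt
        context += f"\nattempt {attempt} failed: {result}"  # failures grow the context
    return False, max_attempts
# Toy usage: the pretend fix only succeeds on the second attempt.
ok, attempts = run_loop(
    gather=lambda: "login button does nothing",
    act=lambda ctx, n: "patched" if n >= 2 else "typo",
    verify=lambda result: result == "patched",
)
```
Each failed attempt is appended to the context, which is why a long retry loop
fills the context window.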

00:06:58.270 --> 00:06:59.310
So understanding

00:06:59.310 --> 00:07:02.510
means it reads files, it searches code,

00:07:02.590 --> 00:07:05.390
it explores the structure of a folder,

00:07:05.390 --> 00:07:24.590
a code base, whatever you give it, and then it asks you proactive questions, or at least it should ask you proactive questions. Especially in the later versions of Claude Code, you have this tool. It's called the ask user input tool, and it basically pops up this multiple choice question that could be multiple tabs long asking you for preferences,

00:07:24.830 --> 00:07:25.550
direction,

00:07:25.790 --> 00:07:44.155
especially if your prompts are on the vaguer side. When it comes to acting, this is where things are make or break. So acting could be physically editing files. It could be creating brand new files and folders, shifting one file to the next or one folder to the next, running commands through what's called bash. And for all intents and purposes, if you are nontechnical

00:07:44.155 --> 00:07:45.515
and you're a nondeveloper,

00:07:45.755 --> 00:07:51.940
bash just lets it do things on your computer that you can do through a terminal. And just to give you a very tangible example,

00:07:52.100 --> 00:07:56.500
let's say when you open a folder or a brand new browser session in Safari,

00:07:56.580 --> 00:08:02.500
Chrome, whatever, you are double clicking on some EXE file that basically executes

00:08:02.500 --> 00:08:14.595
the browser itself. And just to give you an analogy on this, imagine you are opening a brand new browser, let's say Google Chrome, and you double click on that icon. Behind the scenes, you could write some code that could also open that application

00:08:14.675 --> 00:08:21.300
from the backdoor, from the actual system itself. Bash allows you to do this, meaning it can take control of your computer,

00:08:21.540 --> 00:09:04.915
change settings, change permissions. It can do whatever you want, and that's where you have to be careful because you can nuke your entire laptop, you can nuke different services if you let it run wild without understanding exactly what it's doing. And the last thing it can do, which is super helpful, especially as someone who comes from the data scientist world where I had to live in Python quite a bit, is if you need a specific package. And for all intents and purposes, think of a package as either a program or a bridge of functionality that can take your application or your use case from a to b without you having to intervene. Packages are essentially compressed code that allow you to execute all kinds of functions without you having to figure it out from scratch. So instead of you having to figure out how to convert something like HTML

00:09:04.995 --> 00:09:56.085
to a PowerPoint (.pptx) file, there's a library that's existed and has already done this well, eighty twenty, that you can just take from the web, install on your computer, and run. And the verify stage is where Claude Code runs automated checks to see what it did and how well it did it, and to go down the retry path if it didn't do the right thing. So if everything goes well, then you won't have to go to the next step. But if it doesn't go well, so if it does not work, you'll see right here, it will try something, check the result. If it doesn't work, it'll keep looping, which is why if it goes down this rabbit hole and you haven't given it the best instructions and you haven't given it the best code base or the best CLAUDE.md to run and do what it needs to do, it will endlessly go in a loop. And if it fails, it will keep going, meaning your context window or as you'll see my analogy for it, the bucket will get full. The one key thing though, the ingenuity,

00:09:56.245 --> 00:10:32.445
is once it does figure it out, Claude can learn from each failed attempt. Meaning if it's tried to access some form of API, let's say, to make an image for you or a video using Gemini's Nano Banana or Veo API, and it keeps failing to either send the request to create the video or image or retrieve it and actually display and render it for you. Once it figures it out, it can commit its understanding of what to do and when to the CLAUDE.md file, or you can even make a markdown file called a playbook or commit it elsewhere as a skill. So if you wanted a real example of a Claude Code flow, let's say you're fixing a bug and you were vibe coding an app and you're adding some form of authentication

00:10:32.685 --> 00:10:37.085
and you create a login, and in the login when you click on a button it's not working.

00:10:37.770 --> 00:10:47.930
So you can tell it fix the login bug. Ideally, you tell it what that bug is. When I click it, for some reason, it doesn't reroute me to another page. It doesn't let me do Google authentication,

00:10:48.010 --> 00:10:51.610
whatever. It first gathers information, finds the authentication

00:10:51.610 --> 00:11:19.180
code, reads any error logs in the application itself, and then based on that, it can either create a plan or if you put it into YOLO mode or bypass permissions mode, it goes to act to edit the associated files. So instead of trying to YOLO and go through the whole code base and pick a file and randomly change it, once it zeroes in on what files might be affected and could be the culprit of this bug, it then goes to act on this and maybe fixes it if there's a typo.

00:11:19.260 --> 00:11:39.155
Once in a while, as soon as you understand how it changes code and augments code, it can forget certain syntax. That syntax could be the difference between a functioning button and a non functioning button. And once it verifies and now we live in a world where you have agent browsers where it can click into the browser, spin it up on what's called localhost, meaning you run the application

00:11:39.580 --> 00:11:59.695
locally on your computer, and it can click around and see did it resolve the issue, and then you set it off on its own new feedback loop. But devoid of that, you as the user can also verify that as well. And like I said, the core goal is that you're always in control. And I didn't mean for that to rhyme, but it does so happy days.

00:11:59.935 --> 00:12:28.605
But the goal is that you are a human in the loop. It wasn't designed to just go off on its own. Many developers are working with these things called Ralph Loops or Ralph Wiggum, which I think is a lot of hullabaloo and a waste of time and basically brute forcing Claude Code to do something that it wasn't designed to do, yet. We'll probably get to the point where it can go fully autonomous if you want it to, but it was designed to be annoying on purpose. So if we go back into Claude Code and we go into the terminal,

00:12:28.845 --> 00:12:41.360
by default, a lot of you will see the question mark mode. Or if I open up a brand new session, let's say I open one of these. Let's click on any one of these just so I can see my bar, and I click on this,

00:12:41.840 --> 00:13:10.790
and I open this up, and I close this. Right? So you usually will see something like ask before edits. The whole point of this is as you're starting out, especially if you're a noob and you have no clue what you're doing, then it will ask you before it does things. And as you give it permission, it will create a file, a settings.json file, that will store things like, okay, it looks like Mark is okay with me editing a file in this way. It looks like he's also okay with me installing this package. So if you wanna be able to bread crumb your way to competence,

00:13:11.110 --> 00:13:20.070
even though it's annoying, I understand, it's helpful to do this because it helps you be more aware of what's happening and gives you a chance to intervene if you need to.
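
NOTE
Editor's sketch of the "ask before acting unless already approved" idea above.
The rule strings and the needs_approval helper are invented for illustration and
are an assumption; they are not the real settings.json schema or Claude Code's
actual permission logic.
```python
from fnmatch import fnmatchcase
# Hypothetical remembered approvals, written as Tool(target-pattern) strings.
ALLOWED = ["Edit(src/*)", "Bash(npm install*)"]
def needs_approval(tool: str, target: str, allowed: list[str] = ALLOWED) -> bool:
    """Return True when no remembered rule covers this tool call."""
    request = f"{tool}({target})"
    return not any(fnmatchcase(request, rule) for rule in allowed)
```
With these example rules, editing src/login.ts or running npm install lodash
would go ahead without a prompt, while something like rm -rf / would still ask.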

00:13:20.470 --> 00:13:24.555
Expert tip, if you use something like this extension, which if you're not familiar,

00:13:24.715 --> 00:13:31.835
I like using Cursor because it's been around the block for a while. Some people like using Antigravity. Maybe one day we'll use it, but not for now.

00:13:32.155 --> 00:13:39.770
This is what's called Claude Code for VS Code. If you want to go on YOLO mode, you wanna graduate from there, you can go on to settings,

00:13:40.090 --> 00:13:41.450
and then you can go

00:13:41.770 --> 00:13:43.290
on to the bottom here,

00:13:43.690 --> 00:14:04.460
and you can make it so it always is on YOLO mode. But once again, I'll tell you, don't do this until you know what you're doing. Now Claude has different categories of tools, and you can think of them as specialized workers. So you have file reading tools for observing and checking your code before it decides to act and do something. You then have very importantly, we'll touch on this in-depth,

00:14:04.620 --> 00:14:42.380
search tools. This allows it to not only search the entire code base, but also look within a file and not necessarily bloat your limited context window by loading the entire code at one shot. Now search is unbelievably important not because it just lets you search your entire code base, but it also has the judgment powered by the intelligence to not necessarily take a whole file and ram the entire file of code, which could be tens of thousands of lines of code versus a snippet where it thinks the issue or the opportunity lies. And then like we said, we have execution tools that can run commands on your computer. You have web tools to search the web, especially do deep research.

00:14:42.540 --> 00:14:43.900
You have orchestration

00:14:43.900 --> 00:15:54.535
in general, which allows you to manage different workflows. And one example of a micro workflow is a SKILL.md file that I'll touch on later, and then you have extensions or plug-ins. So the TLDR of the process is Claude, in the cloud, decides what tools it needs to use, and then it checks whether or not it has permission to use said tools. If you are in bypass permissions mode, it will then just go and execute it and retrieve and analyze the results. If you wanna get into the mind of Claude Code, when it's assessing what it should do based on your request, it looks at the following different functions. So it tries to understand what does this person need? What does Mark need? Does he need to read something? Then great. We'll use the read tool. Does he need to find certain code? Then no problem. We'll use this thing called the grep tool. And if you see Claude Code running sometimes and you are nontechnical and you're seeing weird words like glob and grep, well, grep allows it to do that search. And I'll tell you how it does that search shortly. If it needs to change a file, then it will use the edit tool. If it needs to run a command, then like we said before, it will use the bash tool. And finally, if it needs to search the web, its plan A is using internal web search. But you can do things like add skills that use Perplexity.

00:15:54.790 --> 00:16:41.135
You can add different plugins from the marketplace. You can do whatever you want. But out of the box, this is what it has at its disposal. And once again, once it executes a task and it goes from Claude running a tool to getting the result of that tool, then that tool result creates brand new context, a brand new state, if you will, of understanding of reality. And then if it's something that it needs to remember, it'll commit that and it creates brand new persistent context. Now if we dive deeper into the tooling of Claude Code, let's take a look and see what the read, write, edit, and, most importantly, glob and ls tools actually do. And, again, if you're nontechnical, stick with me. This is not gonna be very painful. So these are essentially Claude's hands if you wanna think about them. So the read tool, pretty straightforward, but it's nuanced in the way it works sometimes. So let's say you have a document.

00:16:41.455 --> 00:17:09.015
It will then read that document, and it will see the content as numbered lines; if you give it a PDF, for example, it will see the numbered lines of that PDF. And PDFs, by the way, are a great example because if you tell it to read a 50 page PDF file, you'll be shocked at how quickly your context fills up because PDFs in their essence are full of noisy tokens behind the scenes that you don't see with the naked eye, but to create the visualizations,

00:17:09.095 --> 00:17:19.520
to create the file itself in the way it's smooth and buttery, there's a lot of tokens that get rammed into memory for no reason. So expert tip here is if you see Claude Code attempting to read a huge file,

00:17:19.760 --> 00:17:27.920
ideally, tell it to create a Python script where you use any of the cheapest APIs possible. Let's say the Gemini 2.5

00:17:27.920 --> 00:17:40.825
Flash API that has a million-token context window. Tell it to go use that API, create a Python script to read it, and then report back the TLDR, the summary, or the most salient points. This helps you offload

00:17:40.825 --> 00:17:42.105
the read tool,

00:17:42.345 --> 00:18:10.835
which would otherwise shove all of this unnecessary text into your context window and prevent you from taking the next step. Now the write tool is helpful because it creates everything from scratch, from your code to brand new markdown files. And for me, whenever I go through a session and we've had to traverse back and forth into errors and feedback loops, when we finally get a crystallized understanding of what has worked and what's worked well, I always tell it to either commit its understanding as a compressed TLDR in the CLAUDE.md file,

00:18:11.075 --> 00:19:13.135
or I tell it to create a playbook of exactly what we went through, what decisions were made, and what the final outcome was, and how we could go directly from A to the final outcome next time. So in a way, it's a glorified version of reverse meta prompting, which is something I used to talk about a lot on this channel back in the ChatGPT days. But now that we've leveled up, you can still adopt and migrate this concept over here. Now the edit tool is not mind blowing, but the one nuance here is ideally it tries to do pinpoint edits. So instead of refactoring and recreating the whole file, which would take many more output tokens, which are more expensive and take more usage from your account, it tries to find the exact old string. If you don't know what a string is, it's essentially text. You can think of it as text or anything that's converted into text format, and then it tries to directly replace it with a brand new string. Now we're getting to words you probably haven't heard of, but you see quite a bit in Claude Code. And one of them is glob. And if you look at the word glob, you probably looked at it and you're like, okay, whatever. But it's actually doing something really important.

00:19:13.640 --> 00:19:17.560
When it goes and looks for files, it's looking for file patterns.

00:19:17.560 --> 00:19:22.040
So if you say something like go and search all the files associated with JavaScript

00:19:22.040 --> 00:19:37.485
or whatever it is that's powering my front end of my user interface, it will go and search for, let's say, the ending. This asterisk is a wild card. This could be anything. It could be any name, but it's primarily looking for anything that ends in .ts, which stands for a language called TypeScript,

00:19:37.485 --> 00:19:56.340
which is used heavily in a lot of these vibe coded apps that you see nowadays. And if we zoom in just a bit, you'll see that we have asterisks here located in different parts. So if it knows exactly what folders or subfolders to look for, then it will try to also narrow its search to preserve the number of tokens being used just for exploration.
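
NOTE
Editor's sketch of the glob patterns described above, using Python's pathlib
instead of Claude Code's own glob tool. The throwaway project layout and the
find_files helper are assumptions made up for this example.
```python
import tempfile
from pathlib import Path
def find_files(root: Path, pattern: str) -> list[str]:
    """Return sorted paths (relative to root) that match a glob pattern."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))
# Build a tiny throwaway project so the example is self-contained.
project = Path(tempfile.mkdtemp())
for name in ["src/app.ts", "src/ui/button.ts", "src/ui/button.css", "README.md"]:
    file = project / name
    file.parent.mkdir(parents=True, exist_ok=True)
    file.write_text("// placeholder\n")
all_ts = find_files(project, "**/*.ts")     # any .ts file at any depth
ui_ts = find_files(project, "src/ui/*.ts")  # narrowed to one known folder
```
The second pattern is the "narrow the search" case: naming the folder means
fewer candidate files to look through afterwards.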

00:19:56.725 --> 00:20:12.485
Because ideally, the system is designed so it can focus on the building and satisfying your requirement more so than prepping on how to actually tackle it. So if you're still with me, this is typically the full flow of tool calling. You have glob go and find files ideally by their suffix,

00:20:12.910 --> 00:20:31.785
and then you have read to understand that code. You have edit if needed to make any changes ideally with surgical precision to said code, and then it reads it to verify the changes, and then we go back into that feedback loop. Now this is probably another word you haven't heard before, which is grep. So we have glob that finds patterns.

00:20:31.865 --> 00:21:06.145
We have grep that does actual search, kinda like you would search on a website or if you go on a retail site and search for socks and you wanted to do some form of elastic search there, that's where grep comes in. So if you say, I wanna go and edit how we do x y z thing on the profile part of my app. Let's say you want to make it so that someone can edit their profile picture, and it doesn't exist. You just have first name, last name, and email. So it will go and look for auth, and then it will see every single file that contains the word auth. If that doesn't work and it finds nothing, maybe it will brute force finding

00:21:06.320 --> 00:21:30.075
profile or first name as a variable because it knows, okay, if you have profile and there's only first name and you wrote that in your prompt, then there's likely some form of variable that someone has to fill out. Let me go and find that. So this is what it's doing behind the scenes to go and narrow things down, because that same thing can appear in multiple files, by the way, depending on what app it is, the dependencies, the use cases, etcetera.

00:21:30.650 --> 00:22:01.790
And behind the scenes, it's all powered by this thing called ripgrep. And the reason why this has been adopted in Claude Code is it is incredibly fast. So it can quickly, like I said, traverse your whole code base, do pinpoint searches, and then behind that, do surgical accuracy to edit it if needed. And the reason why ripgrep, which powers grep, is fast is it runs things in parallel. So if it wants to search a keyword, it won't search one file, then queue it up, then do the next file, and queue it up. Otherwise, your Claude Code sessions would be that much more painful.
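
NOTE
Editor's sketch of searching several files at once rather than one after another.
It uses a Python thread pool purely to illustrate the idea; it is not ripgrep
(a separate, much faster tool), and the helper names and toy files are assumptions.
```python
import re
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
def count_matches(path: Path, pattern: str) -> tuple[str, int]:
    """Count the lines in one file that match the pattern."""
    lines = path.read_text(errors="ignore").splitlines()
    return path.name, sum(1 for line in lines if re.search(pattern, line))
def parallel_search(paths: list[Path], pattern: str) -> dict[str, int]:
    """Search all files concurrently; keep only files with at least one hit."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda p: count_matches(p, pattern), paths))
    return {name: hits for name, hits in results if hits}
# Toy files: only login.ts mentions auth.
project = Path(tempfile.mkdtemp())
(project / "login.ts").write_text("import auth\nauth.check()\n")
(project / "profile.ts").write_text("const firstName = ''\n")
hits = parallel_search(sorted(project.iterdir()), "auth")
```
The result maps each matching file to a hit count, so the next step can read
only the files that matter instead of loading everything.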

00:22:02.030 --> 00:22:45.495
What it does is if it's isolated four or five files, it will go and search them all in parallel and come up with its own understanding of which one needs to be changed or whether or not it needs to edit multiple files. So regular search is not only slow, but it could take many tokens as well. So that's another added benefit, which is that it's memory efficient as well. So if you wanted to think in three different search modes, you have files with matches. So where is this specific type of file? You have content based matching, so where is this specific context or series of functions. And then you have count, so how many of x. Maybe it sees auth five times, and then it uses that to zero in and go to the next step. So the TLDR of the search-to-read workflow is you have grep going and finding the files,

00:22:45.735 --> 00:23:08.830
then it searches everything before it actually reads it to see what is worth reading. Here's where the nuance and the engineering ingenuity come in. And once it narrows down what's worth reading, then it reads it, and hopefully it's reading only the part that matters. Alright. So next up is bash. And for you nontechnical folks out there, this is the part where your eyes might start to glaze over. So I'll do my best to make it as painless as possible.

00:23:09.455 --> 00:23:15.695
The TLDR of what bash is once again is it allows Claude Code to run commands on your computer,

00:23:15.935 --> 00:23:20.655
create, run tests. If you wanna do something like committing your code to git,

00:23:20.895 --> 00:23:59.900
which again is a proxy for version history, it's literally version history on steroids, then this allows you to do all of that. So Claude can run npm install to install packages like we said. It can create tests for different things that it's created. It can go and organize files and test whether it did it correctly according to your requirements. It can do all kinds of things, and it can spin up a local server. This especially becomes very helpful when you're running things like localhost, but also apps that might need some more firepower. And instead of you deploying it right away on the cloud, like on Vercel or on Render, it will allow you to do what's called docker build, which will let you create this containerization

00:24:00.060 --> 00:24:32.580
of whatever you've put together and run it in a very isolated manner. So you can also limit the blast radius if things go wrong. So bash is fundamental to how Claude Code works and more importantly, why it's become so quote unquote famous, especially when it comes to local based tasks. And the way it's designed is that you always have control by default unless you forego that control. So Claude will quickly just check whether or not it has permission to take control of your computer in the way that it's looking to do so. If it doesn't have control, this is where it will ask you for approval,

00:24:32.980 --> 00:25:12.885
and then it will run on your machine, return an output. And like we said before, it will go through this feedback loop to test what it did to make sure it did it the right way and the best way. So there are two core scenarios that can happen with Bash depending on the tasks at hand. If there are a series of small and quick wins, you'll have to wait until it queues them all up and executes them. Otherwise, if it detects that there's a longer command that will take five, ten minutes to run, it will push that into the background and run different tasks in parallel. And this is fundamental because this allows you to run and continue all kinds of other work. So you can even open another terminal and have the other terminal take care of whatever it is it started on, and this bifurcation

00:25:12.885 --> 00:25:14.805
of tasks and task management

00:25:14.885 --> 00:25:35.065
makes Claude Code that much more potent. Now in terms of the common bash patterns that you might see on your screen even if you don't recognize them, when it comes to version control, you might see things like this. So git status, git commit, git push. And if you're still watching this video and you have no clue how to even get started with GitHub, the easiest thing honestly is going to github.com.

00:25:35.145 --> 00:25:46.720
You create a brand new account. Once you create an account, then you can use it for free. And then once you have it ready to go, you can go into Claude. And let's say you're just purely on the terminal. I'm gonna make that assumption.

00:25:47.040 --> 00:25:51.520
I will just do slash right here, and I'll do slash

00:25:52.080 --> 00:26:15.480
install. You can see right here. You could do Slack or GitHub. If you walk through this, you'll have a wizard that walks you through exactly what you need, the keys you need to grab. And once you go through this slightly painful process for fifteen, twenty minutes, you'll be good to go, and then you can use Git wherever you want. Meaning, you can use the words I want you to commit this or create a new repository for this to store all the code.

00:26:15.720 --> 00:26:29.805
Every time we make a change, I want you to commit it, and a commit is like a checkpoint in version history. This will allow you to do that that much more easily. So when we do something like git status, the crash course here is it will just show what's changed in your project on the branch you're currently on.
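
NOTE
A minimal sketch, in Python, of the checkpoint routine described above.
Only the git commands themselves (status, commit, push) come from the video;
the helper names checkpoint_commands and run_checkpoint are hypothetical, and
the extra "git add" step is an assumption about how changes get staged.
```python
import subprocess
#
def checkpoint_commands(message, push=False):
    """Return the git commands for one checkpoint, in the order they would run."""
    commands = [
        ["git", "status", "--short"],      # show what changed in the working folder
        ["git", "add", "--all"],           # stage every change for the checkpoint
        ["git", "commit", "-m", message],  # record the checkpoint in local history
    ]
    if push:
        commands.append(["git", "push"])   # upload local commits to the remote, e.g. GitHub
    return commands
#
def run_checkpoint(message, push=False):
    """Run each command in the current repository, stopping at the first failure."""
    for command in checkpoint_commands(message, push):
        subprocess.run(command, check=True)  # raises if git exits with an error
```
Keeping the push optional mirrors the distinction between saving a checkpoint
locally and publishing it: a commit alone never leaves your machine.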

00:26:30.125 --> 00:26:36.445
You can think of a Git repository as this tree. There's one part, the trunk, that's called the main branch, and then the offshoots are called branches.

00:26:37.000 --> 00:27:00.355
Branches let you build things without touching your main branch, meaning if you're building an app, to take that example, and you implement a feature and that feature's on a branch, you have the ability to audit whether you can safely merge it into your main branch or whether it will break everything. So it gives you that extra roadblock or that stop sign. Commit means literally saving whatever code you have and storing it as another checkpoint

00:27:00.435 --> 00:27:12.880
and then usually pushing it to your main branch, which often means it goes straight to production. If you've ever used any of the browser-based tools, let's say the Lovables, the Bolts of the world, usually, anytime you make a change and you click publish,

00:27:12.960 --> 00:27:15.120
it takes effect. There's no intermediary.

00:27:15.120 --> 00:27:16.160
There's no sandbox.

00:27:16.735 --> 00:27:31.790
And when it comes to versioning, you can add more and more layers the way you would as a developer where you can create a sandbox where you go from committing small changes to having them on branches to then either merging them or testing those branches separately in a sandbox environment,

00:27:31.870 --> 00:27:39.550
but this allows you to have more flexibility to build responsibly. And when it comes to building, any of these commands here will allow it to do what's called compiling.

00:27:39.550 --> 00:28:33.690
So let's say you're building a React based app. If you don't know what React is, 90% of vibe coding apps that you see nowadays are built using this framework called React. Then when it's doing a build, it's trying to compile all the code in a way where it can render it as the web page or the app that you end up seeing on your screen. Okay. So we have four more sections to go. So hold on if you're still with me. We will get to the promised land of milk and honey where you can say you understand how Claude Code works better than 99% of people. Now when it comes to context management, this is arguably one of the most important sections for you to understand what your limitations are in dealing with Claude Code. So you can think of this context as a bucket. And in the bucket, as you have a longer conversation, that bucket fills up until it gets to the very top, and this is usually where the average person compacts the conversation and keeps going. Now there are implications for how compaction happens, and I'll touch on that shortly.

00:28:34.010 --> 00:28:45.835
But for now, out of the box, behind the scenes, if you ever open a brand new session, and if you wanna see this visualized, all we have to do is go into Claude Code and submit the command called slash context.

00:28:46.075 --> 00:29:01.370
This will show you your overall context window. So you could see right out of the gate, in this case, this project is very bloated. The CLAUDE.md is very bloated. So we start off at a huge disadvantage. We have the overall system prompt that tells Claude how to act in all conversations,

00:29:01.370 --> 00:29:04.970
and then we have our CLAUDE.md behind the scenes that is polluting

00:29:04.970 --> 00:29:08.410
our context window. So we're already at an 80% disadvantage.

00:29:09.215 --> 00:29:21.215
So if we go back, these are taking a lot of my personal bucket. As you push Claude to go do glob and grep, if you remember, glob is searching for files by name pattern, grep is searching for text within files,

00:29:21.740 --> 00:29:34.140
and you ram all kinds of context and make it read it, this bucket really fills up here. So the eighty twenty of filling it up is usually associated with reading files or reading code bases, especially larger ones.

00:29:34.865 --> 00:29:35.425
So

00:29:35.905 --> 00:30:04.610
on top of that, if you have MCP servers, especially if they're very interactive, let's say you're using the Supabase MCP. If you don't know what Supabase is, it's a database that has a back end that allows you to more easily create what are called edge functions and tables. If you allow the agents to run autonomously and build these tables and test them out, that feedback loop of the results of those tools, which is usually in JSON format, will take many tokens because JSON is very token heavy. And if the word token is still not resonating,

00:30:05.065 --> 00:30:06.425
as of this recording,

00:30:06.585 --> 00:30:10.185
Opus 4.5 has a 200,000 token window

00:30:10.345 --> 00:30:13.065
which is, let's call it, about 150,000

00:30:13.065 --> 00:30:20.505
words even though it's not one to one. So you can imagine how quickly things can escalate, especially if you throw something like a PDF.
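
NOTE
A rough sketch of the bucket arithmetic, in Python. The 200,000-token window
and the "call it 150,000 words" ratio come from the video; the baseline and
PDF sizes below are hypothetical numbers picked for illustration, and the
word-count estimate is a rule of thumb, not a real tokenizer.
```python
CONTEXT_WINDOW_TOKENS = 200_000
TOKENS_PER_WORD = 200_000 / 150_000  # about 1.33 tokens per word, a rough rule of thumb
#
def estimate_tokens(text):
    """Very rough token estimate from a whitespace word count."""
    return round(len(text.split()) * TOKENS_PER_WORD)
#
def percent_used(tokens_used, window=CONTEXT_WINDOW_TOKENS):
    """Share of the window already filled, capped at 100."""
    return min(100.0, 100.0 * tokens_used / window)
#
if __name__ == "__main__":
    baseline = 44_000    # hypothetical: system prompt plus CLAUDE.md at session start
    pdf_words = 130_000  # hypothetical: size of a large PDF once it is read in
    after_pdf = baseline + round(pdf_words * TOKENS_PER_WORD)
    print(f"start: {percent_used(baseline):.0f}% used")
    print(f"after PDF: {percent_used(after_pdf):.0f}% used")
```
The point of the cap is the same as the bucket picture: once a single read
overshoots the window, there is nothing left for the actual work.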

00:30:20.920 --> 00:30:23.640
So if I were to open a brand new session here,

00:30:24.040 --> 00:30:26.520
and I have this huge PDF just to give you a visual.

00:30:26.920 --> 00:30:28.440
Let's open this up.

00:30:30.520 --> 00:30:31.960
Reveal in Finder.

00:30:32.040 --> 00:30:35.400
Open this up. You could see this is a very large index report.

00:30:36.325 --> 00:30:55.890
And because it's full of characters, like I said, it's full of tokens, and you can see that right here. Look at all these lines. To the naked eye, you wouldn't know they exist in a PDF, but they're taking tons of tokens, yet they offer zero value to you as the user for Claude to know about. So if you were a complete noob and you said,

00:30:56.210 --> 00:30:59.730
read this, and you said economic index,

00:31:00.210 --> 00:31:03.890
watch what's gonna happen. It is going to fill that bucket

00:31:04.210 --> 00:31:07.915
immediately. We're gonna get to 100% because we're already at 22%.

00:31:07.915 --> 00:31:27.930
Remember, this will take us to 100% because it's so thick in tokens that it will completely max everything out, and you will be pushed from the beginning to compact the conversation against your will. And you can see right here, it did manage to quote, unquote read it, but look what happened to our context window. It's 100% used, and we have yet to do anything.

00:31:28.170 --> 00:31:34.815
Now there are many solutions around this. One of them is, like I said before, you can have a script go and read this PDF

00:31:34.975 --> 00:31:42.335
and then break it down, create a markdown file out of it. Markdown files won't be full of all that garbage you saw, all that garbage metadata,

00:31:42.335 --> 00:31:49.600
and then you can feed that in, or you can ask the script to use something like Gemini or Claude or OpenAI

00:31:49.680 --> 00:32:31.135
and then summarize it and then feed that to Claude, especially depending on whatever it is that your use case covers. So with that, you can quickly see that you can easily take up space for absolutely no reason. Now there are quick fixes on top of scripts, for example. You could have spun up a sub agent to have a fresh 200,000 token context window to go and read that PDF and report back on what it found or the summary of that PDF. Totally an option. So you have to be very smart and nimble on where you wanna use Claude's abilities. It's not worth it to just send it blindly to read a PDF and have it max out everything from the beginning. And one thing that I've observed and many others is as soon as you pass the 40 to 50% threshold,

00:32:31.295 --> 00:32:33.135
Claude Code is still usable,

00:32:33.375 --> 00:32:51.050
but it starts to really degrade in the quality of the answers, the quality of the code, and how lazy versus proactive it is. So at the beginning of a conversation is where you wanna preserve and be as frugal as possible with that context window. So you can even think of it as a physical budget where you have 200,000,

00:32:51.050 --> 00:32:54.730
and every single time you send a request, you're spending 10,000,

00:32:54.730 --> 00:33:29.995
and you keep working your way towards that overall window. If you take that mentality, then hopefully, it'll push you to be a lot more thoughtful with the next steps you take, especially if you're not on the top Max plan, if you're on the $20 plan or the $100 Max plan. It's smarter to plan out everything. Use plan mode, especially since they've recently upgraded plan mode to create the plan, explore the code base, then you have the option to clear your context window and then start a brand new session to execute said plan. Now this part is super important, and this is when it comes to compaction. So if we go back here, I have to naturally do

00:33:30.315 --> 00:33:31.115
compaction.

00:33:31.595 --> 00:33:45.890
This will go look through our conversation, which really isn't a conversation at this point, and summarize what it thinks are the most salient points. And you can see right here, this is the version of its compaction. So it's created the session summary.

00:33:46.290 --> 00:33:50.290
It's gone through and walked through the different tiers of requests,

00:33:50.290 --> 00:33:51.490
the actions conducted,

00:33:51.945 --> 00:33:58.585
the response, which honestly is kinda useless. I don't need an understanding of this. We need to carry over the actual context.

00:33:59.225 --> 00:34:01.625
So we have some minimal information,

00:34:01.705 --> 00:34:09.610
but you could see it's really devoid of a lot of the details that we'd need to truly carry this conversation forward. So if we go back into

00:34:09.930 --> 00:34:14.570
this diagram right here, you don't know what's happening at every point of compaction.

00:34:14.570 --> 00:34:25.055
Especially if you're five or six compactions deep, you can have a part of the old conversation, then a part of the recent conversation, and then a part of the results of some tools that were executed.

00:34:25.215 --> 00:34:36.655
But you as the user, especially if you're nontechnical, are less likely to really audit what's happening. So you want to design your sessions so that you have the lowest likelihood that you run into overly compacting your conversation,

00:34:37.140 --> 00:35:10.660
which is why spinning up multiple terminals can make sense, and I even recorded a whole video on how I like to create and spin up different terminals depending on the mutually exclusive tasks that I can identify. So I'm gonna show that up on screen here. You can click on that if you wanna learn more. But the TLDR is you can always use compaction, but one thing I really like to do is ask Claude to write a plan for itself before I compact, covering everything that should be summarized from the conversation,

00:35:10.820 --> 00:35:16.020
all the most salient points, and tell it exactly what to worry about. So then when I do slash

00:35:16.020 --> 00:35:28.745
compact, it will take that most recent summary more heavily into account. It'll weight it more when it creates its own compaction. But ideally, I can avoid it as much as possible. And if you're running into issues where Claude forgets nonstop,

00:35:28.745 --> 00:35:30.985
then in session one, let's say you discussed x.

00:35:31.390 --> 00:35:43.310
And in that session, Claude knows everything. And then when you compact it, especially multiple times, that context is gone. So when you go to session number two and you have a fresh start, Claude basically has amnesia.

00:35:43.310 --> 00:35:49.755
Well, again, if it's something that needs to be understood and it's something that you can compress in a couple sentences of understanding,

00:35:49.835 --> 00:35:56.155
this is where it makes sense to update your CLAUDE.md because this will be your persistent comprehensive command center

00:35:56.235 --> 00:36:15.565
brain that will persist across sessions, which is why you wanna be so careful when it comes to taking care of your CLAUDE.md, because it sits at the very beginning of your session where, again, it's the most suggestible and the most helpful, and at the same time it might save you from having to compact over and over again. Now when it comes to session management,

00:36:15.645 --> 00:36:29.420
this is essentially where you open multiple terminals at the same time. Behind the scenes, you can see it all loading up. And what's happening while it loads up is it starts and it runs Claude behind the scenes, and then this is where the conversation would happen.

00:36:29.660 --> 00:36:36.940
And then at the end, you either quit or close it. But the cool thing is that even when you close it, it persists the conversation.

00:36:37.260 --> 00:36:57.230
So most people don't know that all the conversations that you have with Claude Code in your terminal are actually stored in the dot Claude folder in your home directory. And all you have to do to retrieve them is ask Claude to go to that folder and pull together a markdown file of all your last conversations. So if you need to, you can ask Claude to search and traverse through different conversations

00:36:57.310 --> 00:37:03.950
to pull any gold nuggets where you wanna persist that in your memory or your CLAUDE.md file. So when you finish a particular session,

00:37:04.455 --> 00:37:09.175
you lose the context window at that point in time. You lose the file snapshots.

00:37:09.335 --> 00:37:17.575
You lose any in-memory state, meaning if it's read something, let's say that PDF I showed you on screen not too long ago, it will forget the existence

00:37:17.575 --> 00:37:24.260
of that file. And that's because these are what's called stateless sessions. And what stateless means is that outside of the CLAUDE.md

00:37:24.260 --> 00:37:35.155
and your system prompt behind the scenes, every single session is a blank slate of context. And what persists are the files you've created, like I said, the conversations, if you dig for them, your CLAUDE.md,

00:37:35.155 --> 00:37:37.635
any Git commits, and installed packages.

00:37:37.715 --> 00:37:46.035
And one thing I wanna mention on Git commits is once you get more at ease with GitHub in general, you can create what are called GitHub

00:37:46.035 --> 00:38:13.485
issues. And a GitHub issue is essentially a to-do list task that can live in GitHub, meaning it lives in the cloud, so you can always refer to it at the beginning of a session as well. So if you don't want to bloat your CLAUDE.md file, but you're looking for a way to have a to-do list that's also not a markdown file in your Claude Code session, you can level up to using GitHub issues to act as that to-do list. Now when it comes to editing your files during the session, it takes what are called snapshots.

00:38:13.650 --> 00:38:15.330
And the whole point of the snapshot

00:38:15.410 --> 00:38:25.090
is it knows exactly what the file looked like before. So let's say you change a series of things, and this series of things isn't deleting your whole drive. You're changing a file,

00:38:25.570 --> 00:38:30.245
and it changes it in the wrong way. If you tell it to go back, it has enough context,

00:38:30.245 --> 00:38:38.725
and it can retrieve that in memory snapshot to go back to the original factory default settings of that particular file. So to break this down more tactically,

00:38:38.725 --> 00:38:45.640
before it even edits the file, it looks at the current file state, what it looks like at the moment, then it takes a snapshot,

00:38:45.720 --> 00:39:05.965
then that snapshot is saved in memory, and then Claude edits the file. And if something goes wrong, it goes back to restore the original snapshot from here. And then assuming everything is good to go, it updates the new file state. So it does these temporary file saves so that you don't have to. Now like I said at the beginning, Claude Code was designed to be hackable,
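
NOTE
The snapshot flow described in this cue can be sketched in Python as follows.
This is an illustration of the snapshot-then-edit-then-restore idea only, not
Claude Code's actual implementation; edit_with_snapshot and its edit and
is_valid parameters are hypothetical names.
```python
import tempfile
from pathlib import Path
#
def edit_with_snapshot(path, edit, is_valid):
    """Apply edit(text) to the file; restore the original text if the result is not valid."""
    snapshot = path.read_text()      # 1. look at the current file state, keep a copy in memory
    try:
        new_text = edit(snapshot)    # 2. compute the edited contents
        path.write_text(new_text)    # 3. write the edit to disk
        if not is_valid(new_text):
            raise ValueError("edit rejected")
    except Exception:
        path.write_text(snapshot)    # 4. something went wrong: put the snapshot back
        return False
    return True                      # 5. the edited file is now the current state
#
if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        target = Path(folder) / "notes.txt"
        target.write_text("hello\n")
        # An edit that wipes the file is treated as invalid, so the original comes back.
        ok = edit_with_snapshot(target, lambda text: "", is_valid=lambda text: text.strip() != "")
        print(ok, repr(target.read_text()))
```
Because the snapshot lives only in memory, it disappears with the process,
which matches the earlier point that snapshots do not survive the session.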

00:39:05.965 --> 00:39:12.960
and the five core things that you can use and customize to make your own, like we said at length, are the CLAUDE.md,

00:39:13.120 --> 00:39:21.680
but also things like skills. If you're less familiar with what skills are, they're essentially a series of metadata where it's a long form explanation

00:39:21.885 --> 00:39:26.125
to Claude on how to use and invoke different Python functions.

00:39:26.285 --> 00:39:28.125
So it's not fully deterministic,

00:39:28.125 --> 00:39:46.260
meaning it's not fully predictable and it won't work the same way every single time, but it's definitely a lot more reliable to execute a small workflow in a particular way. So you can think of skills as mini n8n workflows or Make.com or Zapier workflows where it's not 100% predictable,

00:39:46.500 --> 00:39:59.745
but for the most part, you know exactly the entire line of reasoning and the path that input a will take to get to output b. And another reason why I like them is they're injected just in time, and just in time means

00:40:00.145 --> 00:40:10.500
MCPs usually, when you load a brand new session, they're auto injected in your context. So if you have a huge MCP server or an MCP server with tons of micro tools,

00:40:10.660 --> 00:40:17.380
you can start off at a 50% context window with you doing nothing, which absolutely sucks from a user experience.

00:40:17.700 --> 00:40:36.315
Skills are only invoked when Claude feels that they're needed. So as long as you very well delineate when those skills are needed, then you should be good to go. Now with MCP servers, they used to be the absolute hottest thing last year until people started poking holes in the fact that, one, many MCP servers are built once and never maintained,

00:40:36.820 --> 00:40:39.700
or they have security issues, many of them have,

00:40:40.100 --> 00:41:06.380
and some of them need to have more malleable ways of ignoring certain tools versus others. Because let's say you wanna use one tool out of 100, that loads the entire 100 all at once. So I used to use like 10 MCP servers and then over time, especially as they started bloating my context window, now I use one to three. On average, I'll use ones that make deployment of MVPs easy and creation of MVPs easy. So the Supabase MCP,

00:41:06.380 --> 00:41:07.660
Vercel MCP,

00:41:07.740 --> 00:41:25.785
or the Amazon Web Services MCP. Now when it comes to hooks, you can think of them as mini automations that are tethered to different actions that Claude Code takes. So if Claude Code reads a file, you can tether a hook to that particular event. And anything we mentioned in passing, whether it's editing, whether it's writing,

00:41:26.025 --> 00:41:30.585
anything where there's a specific action or a tool call that's documented,

00:41:30.745 --> 00:41:36.980
you can attach some form of hook to. And hooks are useful because you can also change the way that Claude Code behaves.
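
NOTE
A toy sketch, in Python, of what tethering a hook to an action means: small
callbacks registered against a tool name that fire before or after the tool
runs. The HookRunner class and its methods are hypothetical and only model
the idea; this is not Claude Code's real hook configuration format.
```python
from collections import defaultdict
#
class HookRunner:
    """Toy event-to-callback registry; hypothetical, not Claude Code's actual hook format."""
    def __init__(self):
        self.hooks = defaultdict(list)  # maps (timing, tool name) to a list of callbacks
    def on(self, timing, tool, callback):
        self.hooks[(timing, tool)].append(callback)
    def run_tool(self, tool, action, argument):
        for callback in self.hooks[("before", tool)]:
            callback(argument)          # e.g. log or validate before the tool runs
        result = action(argument)
        for callback in self.hooks[("after", tool)]:
            callback(result)            # e.g. notify or tidy up after the tool runs
        return result
#
if __name__ == "__main__":
    log = []
    runner = HookRunner()
    runner.on("before", "read", lambda path: log.append(f"about to read {path}"))
    runner.on("after", "read", lambda text: log.append(f"read {len(text)} characters"))
    runner.run_tool("read", lambda path: "file contents", "notes.md")
    print(log)
```
The tool itself stays unchanged; the behavior around it is what the hooks
customize, which is why one bad hook can affect every call of that tool.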

00:41:37.060 --> 00:42:00.965
You can literally have a hook that pops up on screen whenever your context window is passing a certain limit as a warning, and all you'd have to do is tell it to enable it in your terminal, make sure that your terminal can show notifications on your laptop, and that could be one example of infinite hooks that you can use. And when it comes to sub agents, this is something that I've recently gone over in a video that I'll show on screen now. And the TLDR

00:42:01.100 --> 00:42:12.540
is sub agents allow you to have digital pseudo employees that have their own prompt, that can have their own tools, that can work in tandem, in parallel, and most importantly, if you design it correctly,

00:42:12.700 --> 00:42:25.475
you can design it so that they don't step on each other's toes and don't have what's called agent collisions. An agent collision is essentially when two different agents, let's say, a UI validation or UI improvement agent

00:42:25.475 --> 00:42:35.180
clashes with a back end agent because they both have to change the same file, the file that maybe powers both the front end and the back end. So if you're nontechnical,

00:42:35.180 --> 00:42:46.715
you wanna avoid that as much as possible, which is why typically my gateway drug for you is to use sub agents for nontechnical tasks. Now CLAUDE.md, I've already spoken about it at length, but in terms of specificity,

00:42:46.715 --> 00:42:52.235
you wanna think about this as your instruction manual, your project manual for your Claude Code repo.

00:42:52.395 --> 00:42:54.395
Can you have multiple CLAUDE.md files?

00:42:54.395 --> 00:42:58.670
Yes. You can have one per folder. If you have one repo full of folders,

00:42:58.910 --> 00:43:02.190
I personally like to create one folder, one repo,

00:43:02.270 --> 00:43:03.390
one CLAUDE.md,

00:43:03.390 --> 00:43:23.685
and I separate them all out. It makes it easier for me. So I have a social media command center. I have a YouTube command center. I have a financial command center where I manage all the finances of my business and my agency and the community. So I like to isolate each task so I can really tailor and groom the CLAUDE.md for that particular ecosystem.

00:43:23.685 --> 00:43:27.605
If you work in an environment where it makes sense to share a singular CLAUDE.md,

00:43:27.605 --> 00:43:35.240
then what you wanna pay attention to is this. So you wanna be as hyper specific as possible. Your CLAUDE.md should be as compressed

00:43:35.240 --> 00:43:52.855
and concise as possible. You don't wanna overdo examples. This isn't some huge prompt that you're throwing into, like, Gemini where you have a million token context window. You wanna use this as your quick onboarding cheat sheet for each and every session. So one thing I like to do is let's say I create a playbook for how to do x.

00:43:53.015 --> 00:43:55.015
Instead of bloating my CLAUDE.md

00:43:55.015 --> 00:44:03.410
by teaching it how to do x, I just tell it that whenever I, let's say, wanna write a LinkedIn post, I want you to go look at the LinkedIn playbook

00:44:03.410 --> 00:44:22.925
markdown file in this folder. So I'll create that one line in the CLAUDE.md. So I use it as a routing source where if it needs to look at this, it knows where it needs to onboard itself more in detail on that particular area. And the other thing you can do is freestyle commands, and freestyle commands is saying, you know what? Every single time I say reverse,

00:44:23.350 --> 00:44:25.430
go through our entire conversation

00:44:25.590 --> 00:44:27.270
and update your CLAUDE.md

00:44:27.270 --> 00:44:45.105
to learn from anything that might have changed in terms of the way we structure x, y, and z. So you can change its behavior. You can have trigger words. You can do whatever you want, which is why CLAUDE.md is a blessing and a curse, and it's only a curse if you don't know what you're doing. Now some people ask me, should I do a global CLAUDE.md across all projects?

00:44:45.425 --> 00:44:48.945
What I would say to that is if you're just starting out or you're even intermediate,

00:44:49.105 --> 00:44:59.010
if you have CLAUDE.md files that are pretty different across all your projects, I would stick to one CLAUDE.md per project. But if you find a way to unify everything,

00:44:59.170 --> 00:45:02.290
then it can make sense to have a universal CLAUDE.md.

00:45:02.290 --> 00:45:08.525
So maybe you have playbooks where some of the values of those playbooks deserve to graduate to a global CLAUDE.md.

00:45:08.765 --> 00:45:16.205
Me, personally, I'm very careful with anything global because you never know when you forget it, and then you're starting a project. And for whatever reason,

00:45:16.605 --> 00:45:33.445
something isn't behaving as expected, and you end up troubleshooting for hours. You could tell I've been through this before, and it turns out that it's just a global CLAUDE.md setting that clashes with one of your prompts. So I know there are many opinions on how to structure a CLAUDE.md, and what I'll say is there's no predefined

00:45:33.445 --> 00:45:42.645
absolute right way, but there are definitely wrong ways. And the wrong way is, again, to overload it and have a 20, 30, 40,000 token

00:45:42.645 --> 00:45:43.605
CLAUDE.md

00:45:43.605 --> 00:46:09.025
that's loaded each and every session. So skills, like I said, are on demand workflows, and these workflows can do anything from creating PDFs to creating docx files to reviewing code to committing code. It can be whatever you want. And the beauty of this is you can have technically a 100 skills, but because they're only injected when they're needed, as long as Claude knows when and where to use those skills, then you're good to go. And it helps you save on space,

00:46:09.265 --> 00:46:28.600
and one thing that I'm finding is you can convert a lot of MCP servers that can bloat your context window into skills. And you'll notice some companies are moving from maintaining their MCPs and updating them to creating skills that are way more powerful at using their API. Because an MCP server, especially for those of you that are nontechnical,

00:46:28.760 --> 00:46:30.040
is a layer of abstraction.

00:46:30.395 --> 00:46:33.435
And what that means is it's an extra unnecessary

00:46:33.435 --> 00:46:37.915
layer on top of what's called an API, an application programming interface,

00:46:38.075 --> 00:46:51.290
which means a way that you can interact with the back end of any service. And because it's not needed and it was a layer that was thrown on top of the AI stack to make it easier for people to connect agents to different services and have two way communication,

00:46:51.290 --> 00:46:55.770
we don't technically need it, but skills basically capture the functionality

00:46:55.850 --> 00:46:57.370
of the API service

00:46:57.450 --> 00:47:16.100
with some metadata explaining how to use it and when to use it. So in a way, if you're not always using the same MCP server each and every time, you don't need it to work all the time. You can also have it on demand as a skill. Now in the future, I see that skills will roll up naturally into agents. I think they'll have swarms

00:47:16.100 --> 00:47:26.740
where they come prebuilt with a series of skills. Now that they're all out there, I can imagine a world where you not only have a predefined marketplace where agents can shop skills on demand,

00:47:26.900 --> 00:47:29.140
and they are skills that you might not have to create yourself,

00:47:29.300 --> 00:47:41.465
but they will maybe learn a skill you teach it over time. So this concept, I expect to evolve, and I'm excited for it. Now one additional note I wanted to make on hooks is you can make hooks fire before

00:47:41.545 --> 00:47:50.650
or after a tool runs. So before we write something, you can make a hook occur with that predefined mini automated workflow,

00:47:50.810 --> 00:48:16.670
or after something happens, you can have it commit and do some form of action additionally. So hooks can happen before and after. One thing I would say as a point of warning is make sure that once you implement a hook, you decide whether it should be on the project level or the global level. So let's say you wanna experiment with something. The other day, I wanted to build something with hooks where I could stop Claude from making me compact conversations,

00:48:16.670 --> 00:48:28.175
and I wanted it to become self aware where it could count the number of tokens being exerted through the session and auto compact without me actually physically doing it myself. So I went down a rabbit hole for hours,

00:48:28.495 --> 00:48:34.975
but I made it project based. I eventually broke Claude Code. I stopped Claude Code from being able to create

00:48:34.975 --> 00:48:39.615
the next thought or the next action by adding so many micronuances

00:48:39.615 --> 00:48:57.725
to the hooks that I was thankful that I did it at the project level. Because had I done it at the global level, I'd be pretty annoyed and I have to find some way to undo it. The good thing with a project level hook is if it's not working, if it breaks things in that project, I can just blow away that project. It will blow away any associated

00:48:57.725 --> 00:49:12.580
local settings that go with it, and then we're good to go. The last part for this section is a double take on sub agents. So I see these as increasingly more important in the future as context windows explode. So once we have 5,000,000,

00:49:12.580 --> 00:49:14.180
10,000,000 token context windows,

00:49:14.340 --> 00:49:26.405
this will be infinitely more powerful, especially as you add skills to each one of them because you don't just have to attach an MCP server. You could have a set of skills that each sub agent is associated with.

00:49:26.645 --> 00:50:01.085
Once you have that, it's beautiful because you go and you set off an agent, let's say an explore codebase agent. You can launch three or four agents at the same time and say, you go look at the front end, you go look at the back end, and you look at any overlap between them. Then you have preserved your context window. You've used their focus on one core task in that part of the code base, and they bring back the TLDR to the main agent or the main session. And it's beautiful because you can continue this flow state and this lucid train of thought while you bifurcate

00:50:01.085 --> 00:50:02.365
and you delegate

00:50:02.365 --> 00:50:19.900
all of the additional pieces that you don't want to initially have to create a brand new terminal session and a brand new blank slate with. So one last concept that's important to understand is extensions, and extensions might vary depending on whether you're a solo user, you're part of an agency or a company, or you're an individual contributor in a larger organization.

00:50:20.345 --> 00:50:34.105
So if you are an org, then you might have company wide policies on permissions. Maybe you don't let any of your developers go on bypass permissions mode. Maybe you have a very specific set of settings because you can share settings.json

00:50:34.185 --> 00:50:44.990
with all of your team, and settings.json would have a list of different Bash commands that are whitelisted. So you could live in a world where you don't have to live in either approval

00:50:44.990 --> 00:51:06.890
with edits or bypass permissions. You can have a very specific set that you make uniform across the board. As an individual user, you might have some personal settings, and then if you are really isolating a project where you wanna be able to do whatever you want, you can make all those rules and permissions specific to that project again because you'll have something called settings.local.json,

00:51:07.050 --> 00:51:21.445
which is different from the global settings.json that would make permissions global across every single project that you pursue subsequently. And in terms of how this flows, you would make some form of tool request, either in passing or implicitly through your prompt,

00:51:21.605 --> 00:51:54.285
and it will then check the allow list. If it's allowed, then it's green. You're good to go. It executes it. If it's unsure or if it's never happened before, it asks. That's why if you get started with Claude Code and you go into the standard mode, it'll ask you permission a million times before you finally get to the point where you're doing something. Because it's trying to ask, okay, I'm trying to run this bash command. Now I'm trying to do this bash command. Now I'm trying to install this library. So the first time it sees it, it has to get your permission for each and every micro action, which is better to have by default than not.

00:51:54.685 --> 00:51:56.525
Now if you say no,

00:51:56.765 --> 00:52:05.060
then it also updates the settings.local.json to say, looks like Mark is never okay with doing x. So moving forward,

00:52:05.300 --> 00:52:34.520
every single time that this is about to happen, we either change course or we ask if we can enable it or tell him that we can't do this thing unless he whitelists this action. So examples of where allowed, blocked, and ask would make sense: allowed would be auto execution, which is giving constant permission to Claude to be able to read whatever file you want. Now in terms of blocked, it makes a lot of sense, especially if you're doing something more on the organization scale, to deny things like deleting system files, especially if that's a part of a workflow.
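
NOTE
The allowed, blocked, and ask outcomes described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the RULES dictionary, its patterns, and the decide helper are hypothetical and are not Claude Code's real settings schema or implementation.
```python
from fnmatch import fnmatch
# Hypothetical rule lists; patterns and names are illustrative only.
RULES = {
    "deny": ["rm *", "sudo *"],
    "allow": ["ls*", "cat *", "git status"],
}
def decide(command, rules=RULES):
    """Return 'deny', 'allow', or 'ask' for a shell command string."""
    # Deny rules are checked first so a block always wins over an allow.
    if any(fnmatch(command, pattern) for pattern in rules["deny"]):
        return "deny"
    if any(fnmatch(command, pattern) for pattern in rules["allow"]):
        return "allow"
    # Anything that matches no rule falls through to a permission prompt.
    return "ask"
print(decide("git status"))            # allow
print(decide("rm -rf build"))          # deny
print(decide("npm install left-pad"))  # ask
```
Checking deny before allow is a design choice in this sketch, so that a broad allow pattern can never unblock something an organization has explicitly blocked.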

00:52:34.680 --> 00:52:37.160
This will make sure that you would have to manually

00:52:37.160 --> 00:52:54.665
go and override this to delete a file, or you would go delete the file yourself as the user slash developer. And when it comes to ask, it makes a lot of sense to keep ask whenever you work in an organization where security is first and foremost, and you have something live in production with actual customers and users.

00:52:54.825 --> 00:52:59.940
Because if you install a random library that Claude Code thinks is amazing as of 2024

00:52:59.940 --> 00:53:01.940
or wherever its last training was.

00:53:02.260 --> 00:53:03.700
But for whatever reason,

00:53:04.020 --> 00:53:06.580
that library or that NPM install

00:53:06.580 --> 00:53:12.260
is written with injections or SQL injections or bugs or whatever, viruses for all intents and purposes,

00:53:12.955 --> 00:53:17.915
you wanna be able to always intervene and see what is getting delivered into your ecosystem.

00:53:18.075 --> 00:53:32.570
So the yellow ask here is helpful in case you wanna be able to micromanage and rightfully so each and every package that's entering the rest of your stack. So TLDR of the TLDR is reading and editing code files usually makes sense to keep that on YOLO.

00:53:32.730 --> 00:53:34.810
Deleting files makes sense to block,

00:53:35.130 --> 00:53:49.625
and installing things, as well as pushing things to production directly without putting it in some form of a branch or a pull request, makes sense to keep on ask. If you don't know GitHub, then close your ears for what I just said. And then when it comes to removing things entirely from your system,

00:53:49.945 --> 00:53:54.905
from files, also usually too dangerous, especially if you, yourself, your organization,

00:53:55.330 --> 00:54:02.050
or your team is just getting started with Claude Code. And if you made it this far, you deserve to pat yourself on the back. This is a big accomplishment.

00:54:02.130 --> 00:54:13.455
You understand more than 99% of people about Claude Code even as a nontechnical person. So before I leave you and I give you some resources as post homework reading to get this really synthesized

00:54:13.455 --> 00:54:37.505
in your brain, let's just execute one basic action. And now that we can identify everything, it's a nice and beautiful thing to just watch Claude Code and identify and understand each and every thing that's happening. So if you remember our original conversation, all I wanted to do was read this PDF file without nuking my entire context window. So let's open a terminal, and I'm doing this versus the extension

00:54:37.665 --> 00:54:39.745
just so that we see more verbosity,

00:54:39.745 --> 00:54:53.650
some more breakdown of what's happening. And let's just ask it to create a Python script that uses a very cheap language model, let's say, Gemini 2.5 Flash, that has a million-token context window, and it reads the entire file, loads it into context,

00:54:53.650 --> 00:54:55.170
and gives us a TLDR.

00:54:55.170 --> 00:55:00.210
That's all we wanna do. So let's go, and I'll just go to Gemini

00:55:00.805 --> 00:55:02.005
2.5

00:55:02.005 --> 00:55:04.485
flash API documentation.

00:55:05.285 --> 00:55:08.725
This should pull that up. Let me zoom into

00:55:08.885 --> 00:55:20.130
the right page, then we'll come back. So I've located the right model on the right part of the website, and I'll take this link right here. We'll go into Claude, and I'll say, go read this page and understand

00:55:20.690 --> 00:55:22.370
how to implement

00:55:22.530 --> 00:55:24.210
and use this model.

00:55:24.370 --> 00:55:30.130
So I'll just give it the URL. It can go and search this, and it'll come back with a TLDR

00:55:30.795 --> 00:56:02.715
understanding of how to use it. So now that we better understand how Claude Code works, you'll see first it executed a fetch tool to go and look at the website. It realized that it wanted to go a little bit deeper, so it fetched another part. It basically pulled on the thread. It got the model overview over here. It told us exactly how it's going to work, and now we can tell it exactly what needs to happen for us to use it. So let's say the following. Yeah. So I wanna be able to create maybe, like, a function we can invoke, maybe even a skill if it makes sense. Let's call it read large doc.

00:56:03.195 --> 00:56:09.595
And then what happens is when we invoke this skill, I want you to use the Gemini 2.5 Flash API.

00:56:09.835 --> 00:56:27.220
Obviously, create some form of environment file, a .env file, make it available in the main folder so I can put my Gemini API key. And anytime I tell you that I have a large document and I invoke this skill, you use this to take the entire document in memory. So make sure that we have available context window,

00:56:27.625 --> 00:56:31.225
especially input tokens that are around 500,000

00:56:31.225 --> 00:56:32.585
or 400,000

00:56:32.585 --> 00:56:39.225
large so we can actually give it a pretty large file. And the goal is the system prompt of the request

00:56:39.305 --> 00:56:46.950
to this Gemini API should be to summarize and synthesize this file so we don't have to ram this in your context window.
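
NOTE
The request dictated above can be sketched as a budget check followed by a call to a separate summarizer. This is a rough sketch under stated assumptions, not the script Claude Code generated in the video: the four-characters-per-token estimate is a crude heuristic, and summarize_large_doc, estimate_tokens, and fake_summarize are hypothetical names. The real model call is replaced by a stand-in so the example runs offline.
```python
# Rough assumption: about 4 characters per token; real tokenizers differ.
CHARS_PER_TOKEN = 4
def estimate_tokens(text):
    """Crude token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN
def summarize_large_doc(text, summarize, max_input_tokens=500_000):
    """Pass text to a summarizer callable only if it fits the input budget."""
    tokens = estimate_tokens(text)
    if tokens > max_input_tokens:
        raise ValueError(f"about {tokens} tokens, over the {max_input_tokens} budget")
    instruction = "Summarize and synthesize this file."
    return summarize(instruction, text)
# Stand-in for the real model call, so the sketch needs no API key.
def fake_summarize(instruction, text):
    return f"summary of {len(text)} characters"
print(summarize_large_doc("hello world " * 100, fake_summarize))
```
Only the short summary comes back to the main session, which is the point of the skill: the large file is loaded by the cheaper model instead of the main context window.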

00:56:47.430 --> 00:56:49.270
So a bit of a

00:56:49.510 --> 00:56:50.310
mouthful,

00:56:50.630 --> 00:57:14.750
but this should now start its thinking process to come up with all the micro parts it needs to do this. So, again, let's see here. So now it searches for a pattern. It's looking for the root Claude folder. You can see here we have some wildcards, and then it's looking for a specific pattern for environment. It's looking to see whether an environment file to put your API key in already exists. Now it's reading the Claude skills. It's reading 253

00:57:14.750 --> 00:57:24.270
lines. So, you know, as you're stacking the lines, you're stacking your context window, that bucket is filling up. It's now using the right tools to create a new file,

00:57:24.885 --> 00:57:26.965
then create the Python file,

00:57:27.285 --> 00:57:31.525
then create a skill associated with it called read large doc,

00:57:31.685 --> 00:57:38.645
and that skill should be located in the skills folder right here. Read large doc. If you take a peek,

00:57:38.885 --> 00:57:40.085
here's the document put together.

00:57:40.830 --> 00:57:45.070
Let me see. Does it include Gemini? There we go. Yeah. There we go.

00:57:45.310 --> 00:57:48.510
That should be good to go. If I go to the very bottom,

00:57:49.230 --> 00:57:50.910
it's using Bash now,

00:57:51.150 --> 00:58:01.055
which is taking control of our terminal to see if it can basically send a test request to Google, and it's setting up what's called a virtual environment. This allows it to execute requests,

00:58:01.855 --> 00:58:02.415
and

00:58:02.735 --> 00:58:05.055
it realizes it needs to update its software.

00:58:06.495 --> 00:58:20.230
There we go. So it's updating the SKILL.md file. I'm only doing the play by play here, not because you can't read or you can't understand the concept. But now that we understand this lens, we can better audit exactly what's happening, and more importantly,

00:58:20.390 --> 00:58:29.245
stop it in its tracks if we absolutely have to. Alright. So it says everything's set up, and here are the files that were created. So an environment file, a Python file, a .gitignore,

00:58:29.485 --> 00:58:37.600
a virtual environment. We have a Gemini key, and I already pasted mine in, and now it even created a slash command.

00:58:37.680 --> 00:58:38.240
So

00:58:38.480 --> 00:58:42.160
let's take it for a spin. I'll open it maybe in a new terminal.

00:58:42.400 --> 00:58:43.360
Let's do that.

00:58:43.760 --> 00:58:47.680
I'll spin this up, and hopefully, it should work. So let's do slash,

00:58:48.395 --> 00:58:53.755
and I already forgot what I named the skill. That's a skill issue, pun intended.

00:58:53.835 --> 00:58:55.275
We'll do read

00:58:55.435 --> 00:58:56.235
large.

00:58:56.555 --> 00:58:59.435
Okay. There it is. Read large doc.

00:58:59.515 --> 00:59:02.795
Now I'm not sure if the skill is smart enough to ask me for where the doc is.

00:59:03.970 --> 00:59:06.050
It's checking the environment file.

00:59:06.530 --> 00:59:11.090
I think it's just onboarding itself. Okay. Cool. Now if I say use this

00:59:11.090 --> 00:59:11.650
file,

00:59:12.210 --> 00:59:14.610
make sure you don't read it,

00:59:14.930 --> 00:59:16.770
but make the skill

00:59:16.770 --> 00:59:20.725
read it. Now I probably would have done this anyway, but I'm just being careful

00:59:21.045 --> 00:59:22.325
so I don't nuke

00:59:22.405 --> 00:59:23.365
my session.

00:59:24.085 --> 00:59:25.285
So hopefully,

00:59:25.285 --> 00:59:28.485
the next action should be to read said skill.

00:59:29.925 --> 00:59:31.285
Okay. It's using bash

00:59:31.900 --> 00:59:43.740
to execute the function right away. If I saw read, I actually should be worried because it would read the file itself, and then we'll come back to the result. Well, well, well. So it's so large that it's 1.8 million

00:59:43.740 --> 01:00:05.910
tokens. So what I'll do is I'm gonna ask it. Cool. Can we add to this skill the ability to take a PDF, break it down to a markdown file, and remove all the unnecessary characters that exist in a PDF so it's hopefully less than this many tokens? We'll just see if this works. Alright. And that did the trick. We were able to adjust the skill. I'll show you the steps that it took to get here.

01:00:06.230 --> 01:00:11.750
You'll see it did a lot of changes. So first it reads the Python file to see what's missing,

01:00:11.990 --> 01:00:18.815
then it executes bash to see whether or not any of these libraries exist, any of these packages are installed.

01:00:18.975 --> 01:00:21.855
It realizes none of them are installed in this project,

01:00:22.175 --> 01:00:28.230
so it installs them. And the point of those is to break down a PDF and convert it into what's called a markdown file.

01:00:28.710 --> 01:00:38.630
And then the reason why I love the terminal is I can at least audit, especially as a technical person, what's happening on the code side, whether it's over-removing things or over-adding things.

01:00:38.870 --> 01:00:44.605
Then it updates the skill. It reads the current skill first, then it updates, using the right tool,

01:00:45.005 --> 01:00:46.045
the current skill.

01:00:46.685 --> 01:00:56.925
And once we get to this point, it then tells me the Eureka moment. So the proper PDF extraction made a huge difference. The raw PDF was showing 1,800,000

01:00:56.925 --> 01:01:05.640
tokens because it was reading the binary PDF data. That's all the junk I showed you earlier in this video. But proper extraction gives us only 22,000

01:01:05.640 --> 01:01:08.680
tokens. So we went down by 98%

01:01:08.680 --> 01:01:18.805
just by making this one change, and now it's solidified in a skill. So the next time I have a huge PDF and I wanna bring it in, I'm not gonna make Claude code read 1,800,000

01:01:18.805 --> 01:01:21.205
tokens when it only has 200,000.
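
NOTE
The numbers quoted above can be checked with a few lines of arithmetic. This sketch only uses the figures stated in the video (1,800,000 raw tokens, 22,000 extracted tokens, a 200,000-token context window); the variable names are illustrative.
```python
raw_tokens = 1_800_000       # raw binary PDF, as reported in the video
extracted_tokens = 22_000    # after proper text extraction
context_window = 200_000     # context window size mentioned in the video
reduction = 1 - extracted_tokens / raw_tokens
raw_fits = raw_tokens <= context_window
share = extracted_tokens / context_window
print(f"reduction: {reduction:.1%}")            # reduction: 98.8%
print(f"raw file fits in window: {raw_fits}")   # raw file fits in window: False
print(f"extracted share of window: {share:.0%}")  # extracted share of window: 11%
```
So the raw file is about nine times larger than the window, while the extracted text would take roughly a tenth of it even before summarizing.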

01:01:21.365 --> 01:01:23.045
I will use a skill,

01:01:23.205 --> 01:01:31.210
invoke that, keep my beautiful bucket in my context as is, and just bring in the summary which we have right here. So if we go to summary,

01:01:31.210 --> 01:02:41.105
if I close this out, it walks through everything that Claude would really need to know about this file for us to actually do something with it. Alright. So that small example hopefully shows you a glimpse of what Claude Code could look like through the new lens that you now have: the tools it uses, how it manages its context, and how it functions overall. And if you've been on the fence because you think you're not technical enough and you made it this far, trust me, you are better off than the majority of people, so this shouldn't stop you anymore from hopping in, getting your hands dirty, and building whatever you want in a terminal. Now if you want access to the diagrams I showed and a full guide walking through everything I explained in plain English, I'll make that available to all of you in the second link in the description below. But if this video really unlocked things for you and you wanna take things to the next level and upskill yourself, then I would strongly recommend you check out the first link in the description below for my early AI adopters community. I personally manage it every single day, and we have a brand new beginner to intermediate course coming out in the next couple weeks along with the existing systems that I've made available and the new ones we're coming out with soon. And on top of that, we hire all kinds of coaches, whether it's for n8n or Claude Code or cybersecurity,

01:02:41.105 --> 01:02:56.803
to help you go to the next level. And for the rest of you, I would truly appreciate if you could just leave a comment and a like on the video. If it was helpful, share it with someone who's learning Claude code. It would really help me, the video, and the channel, and stuff like this really takes hours to put together, so I genuinely appreciate it. I'll see you in the next one.
