WEBVTT

00:00:00.000 --> 00:00:07.120
Anthropic just released something huge, which is a brand new certification program called the Claude certified architect.

00:00:07.120 --> 00:00:34.540
It's a real exam, pass or fail, and one of the most important things is that it's based on five core domains. Now whether you're trying to get certified or you just want to get better at using Claude code and becoming a master of it, then this entire syllabus that they've put together will act as the best resource for you to learn exactly how to go from zero to hero. So here's the deal. I went through the entire exam guide myself, not through an LLM. I read through every single page. So I'm going to synthesize

00:00:34.540 --> 00:01:22.515
this entire exam guide into this video and break down each and every concept for you. And as a bonus, I'll share some resources that should supercharge your learning journey. So this is the very guide that I was discussing before. It is 40 pages and it walks through everything from the target candidate description of what it's like to be a master of Cloud Code in Anthropics. Each and every paradigm you should master and be prepared for. There are some exam content response types. There's some preamble around what the exam entails, what the format is, and this is the most important part, which is the content outline. And this is where it breaks down the distribution of each and every part. And beyond that, in each area, it walks through a series of scenarios, and you'll see that it's very specific. It's not just about knowing MCP tools, it's about understanding exactly how to conceptualize

00:01:22.515 --> 00:01:44.185
each and every part of said tool. If you scan through all of this, you'll see exactly how detailed it is and it can be overwhelming to many, which is why I wanna break down each and every concept and go through every domain with a fine tooth comb. So like I said, there are five different domains that Anthropic thinks that you need to know before you can call yourself a Claude code master. So the biggest one, which is 27%

00:01:44.185 --> 00:01:46.825
of the exam, is agent architecture.

00:01:47.065 --> 00:01:57.660
So this is basically how Claude thinks step by step, how it coordinates with other agents, and most importantly, how you enforce rules that just telling Claude in a prompt can't guarantee.

00:01:57.900 --> 00:02:03.900
If you only study one domain, this is the one. Now tool and MCP integration is at 18%.

00:02:04.295 --> 00:02:08.775
So this is how Claude connects to the outside world, your databases, your APIs,

00:02:08.935 --> 00:02:18.695
your file systems. And the number one reason why agents call the wrong tool is actually embarrassingly simple, and we'll get to that shortly. So Claude code configuration sits at 20%.

00:02:19.120 --> 00:02:25.920
This is your Claude MD files, your skills, your commands, and things like plan mode. And most people dump everything

00:02:25.920 --> 00:02:37.335
into one giant file, but the exam guide teaches you how to split that into three layers so Claude isn't loading irrelevant tools every single time you ask us something. Ultimately,

00:02:37.335 --> 00:02:39.495
we have the prompt engineering section,

00:02:39.655 --> 00:02:42.055
and this isn't just about writing better prompts.

00:02:42.135 --> 00:02:55.670
The certification is actually very specific here. If you want Claude to give you consistent output, show it two or three real examples of exactly what you want. That works better than writing a whole paragraph of detailed instructions

00:02:55.670 --> 00:02:59.510
every single time. And last but not least is context management and reliability.

00:02:59.945 --> 00:03:20.180
So here's the thing. Claude reads the beginning and the end of what you give it really well, but the stuff in the middle sometimes can get fuzzy, and that's called lost in the middle logic. So now that we've set the tone for what's important, let's go through every single one, and we'll naturally start with the largest and most important, which is the agentic architecture and orchestration.

00:03:20.420 --> 00:03:29.155
Let's start with the very engine that powers each and every Claude agent, which is the agentic loop. So whether you're using Claude code, the Anthropic SDK,

00:03:29.315 --> 00:03:36.755
or any agentic framework that is built on top of Claude, this is what's happening each and every single time you run an agentic workflow.

00:03:36.995 --> 00:03:42.200
So your code first sends a request to Claude, and then Claude naturally responds.

00:03:42.280 --> 00:03:46.200
The most important thing to keep aware of is this stop reason right here.

00:03:46.520 --> 00:04:19.560
You wanna check this all the time. If it says tool use, that means Claude wants to go use a tool, like reading a file or running a command. You can execute said tool, feeds the result back, and then it goes again into this endless loop. But in this case, it's not actually endless. It goes up until a very certain point. If it says something like end turn, then that's essentially Claude saying that I'm done. That's pretty much the entire engine over and over again. Now the exam guide points three different areas where people make mistakes in understanding the agentic loop, and these are basically the anti patterns. So first,

00:04:20.045 --> 00:04:24.525
reading Claude's text looking for phrases like I'm done or task complete.

00:04:24.765 --> 00:04:31.165
That's unreliable and it breaks all the time. Second, you don't wanna set limits like stopped after 10 loops.

00:04:31.405 --> 00:04:40.350
You don't know the level of depth that Claude code needs to do yet to accomplish a specific task. So you might be cutting off work that genuinely needs 11 steps.

00:04:40.670 --> 00:04:50.945
And third, you don't wanna look at what Claude said to figure out if it's finished. There's a very specific field, like I said, stop underscore reason that exists for exactly this purpose.

00:04:51.345 --> 00:04:55.665
It's the only thing that you should be checking. Now if you're using Cloud Code in a terminal,

00:04:55.745 --> 00:05:08.590
then sometimes you won't see this stop underscore reason. Every time Cloud Code reads a file, executes a tool, runs a command, or spins a sub agent, this is exactly the process and the patterns that drive that forward.

00:05:08.830 --> 00:05:33.415
Understanding this will really help conceptualize everything else we're about to cover. So when Claude needs to do complex, like research a topic from multiple angles or process a really large project, You don't need to send one agent to do everything. You basically have one agent that sits in the center, and this is the main agent. It breaks the task down, hands off pieces to specialized agents, and then combines the result at the very end. So these would be examples

00:05:33.830 --> 00:05:58.355
of the other sub agents. So one synthesis agent that uses tools to verify and write, another search agent that uses tools like search and fetch URL, and then you have the analysis agent that uses tools like read doc and extraction. Now the exam guide mentions this specific concept. So each one of these agents has its own separate window, its own separate set of tasks, and technically its own world. So there's no communication

00:05:58.435 --> 00:06:45.920
in between different sub agents. That's actually what the newer agent teams feature was designed to do, which is enable that communication through providing the equivalent of an email inbox to each agent so they can email each other, see who's blocking who, and execute the task in unison. So it's important to understand that sub agents don't maintain track of memories of other sub agents at the same time. So sub agent a will have no idea what sub agent b did. It will all kind of come together at the very end once the main agent takes the TLDR of each's outputs and or findings and then disseminates that back to you. Now there's one major mistake that many people make when it comes to understanding sub agents, and it's the following. So let's zoom in here. Even though you have a main agent, it's completely possible that the coordinator

00:06:46.155 --> 00:07:02.680
could break down tasks too narrowly, and this is something you have to look out for. So in practice, it could look like this. You say research AI in creative industries, and the coordinator only creates subtasks about visual art. So digital art, graphic design, or photography,

00:07:03.000 --> 00:07:05.800
but it completely misses music,

00:07:05.800 --> 00:07:10.440
writing, film, and game design. So the sub agents did their job perfectly,

00:07:10.680 --> 00:07:16.635
but the coordinator just scoped it wrong. It's basically like having a bad manager for a great team.

00:07:16.875 --> 00:07:19.515
So the fix is to give broad goals,

00:07:19.755 --> 00:07:50.835
not narrow checklists to this main coordinator agent. You wanna let the sub agents figure out how to break down all the sub tasks based on their narrow goals that are defined by this broader goal. So instead of me just breaking this down conceptually, let's go down into the terminal and see an example of this. So if we go into warp here, I'm going to copy paste this prompt and we'll send it over. And while we send it over, I'll read it through. So it says, I want you to research the impact of AI on content creation by spawning three sub agents in parallel.

00:07:50.995 --> 00:08:01.850
The first sub agent is to research how AI is changing video content creation. Creation. The second one is to research how AI is changing a written content creation. And the last one is how it's changing audio.

00:08:02.170 --> 00:08:10.225
And then we say each sub agent should search the web and return a three bullet summary. So it's very broad in terms of what they're looking for. We're not micromanaging

00:08:10.225 --> 00:08:17.345
exactly how they're going to do said thing, but we're giving them the overall assignment. So you've taken the equivalent of an employee,

00:08:17.505 --> 00:08:29.020
you've onboarded them, and then after onboarding them, you trust them enough to give them a very well situated task and allow them to execute independently. And after completes, you could see right here all three agents are finished.

00:08:29.260 --> 00:08:32.620
Each one use its own set of tools, its own set of tokens,

00:08:32.780 --> 00:08:38.675
and then we have the results from each one, and then we have the overall synthesis. So this is the main agent compiling

00:08:38.675 --> 00:08:54.220
all the results of the sub agents, and this is how this coordinator sub agent pattern works. So this might be the most important part of the entire exam guide where they differentiate between prompts and hooks and when and where to use each. So prompts are what I call best effort.

00:08:54.460 --> 00:09:02.715
You can tell Claude something like always verify customer before processing a refund, and most of the time it works, but sometimes it doesn't.

00:09:02.875 --> 00:09:14.555
If we even hop into their exact scenario, they have this question where they say production data shows that in 12% of use cases, your agent skips this invocation of a function get customer entirely

00:09:14.720 --> 00:09:22.720
and just go straight to look up an order based off of the stated name occasionally leading to misidentified accounts. So from a business perspective,

00:09:22.960 --> 00:09:24.960
this is not okay. 12%

00:09:24.960 --> 00:09:26.160
of accidentally

00:09:26.160 --> 00:09:46.930
provided refunds to the wrong person or to people trying to take advantage of this becomes a big issue. Now hooks are completely different. A hook is basically a small script that runs automatically before or after Claude tries to do something, and it can literally block Claude from taking an action unless a specific condition has been met. So it's not 99%.

00:09:46.930 --> 00:09:48.770
It's not 99.9%.

00:09:48.770 --> 00:09:56.770
It has to be a 100%. And the action physically can't happen if the hook says no. So you can think of prompts as suggestions

00:09:56.850 --> 00:09:58.210
and hooks as laws.

00:09:59.015 --> 00:10:09.655
So the exam guide draws a very clear line on when and where to use prompts versus hooks. And when it comes to what it's good for, it's primarily around style, tone, and formatting.

00:10:09.895 --> 00:10:19.270
These are things that you can execute well 90% of the time, and it won't land you necessarily in an area of harm or a land of hurt. Hooks are optimized for compliance,

00:10:19.270 --> 00:10:33.755
financial stuff, and security. So anywhere where one single point of failure can cause some real issues. And this overall concept is where a lot of people go wrong because they just think that if something failed 90% of the time, they can just tweak the prompt to perfection.

00:10:33.755 --> 00:10:44.960
And as running a company called prompt advisers where initially, all I would do for companies is help them prompt engineer every system prompt in a system, in a production use case for content creation,

00:10:45.120 --> 00:10:59.305
there is a level where a prompt is just not good enough over a thousand iterations or 5,000 iterations. If you're not as familiar with hooks and you want a little bit of a debrief, then you have two options. If you pop into a terminal, you could always do slash

00:10:59.545 --> 00:11:04.185
hooks, and then this will show you each and every way that you can invoke a different tool.

00:11:04.505 --> 00:11:22.055
And this list goes on and on and on. And if you click on one and you click enter, it'll show you exactly what it would do. And option two is, I showed this in a prior video, you can use my favorite function in Claude code, one of the most underrated, which is the Claude code guide agent, and then you can ask it what

00:11:22.295 --> 00:11:25.335
are the best hooks for x

00:11:25.335 --> 00:11:40.670
use case. And then it will go through with full knowledge of what hooks it has at its capacity and which one is optimized for your use case and whether or not you should be using a hook or a prompt to begin with. So whereas the last concept, the enforcement piece is probably the most important,

00:11:40.830 --> 00:12:07.340
this is the highest leverage, which is getting tool descriptions correctly, which is basically giving Claude code, which you provide it with whatever tools you want at its arsenal as well as its native tools, the right tool at the right time with the right description for the right use case. So tools are basically how Claude decides which tool to use when it has multiple options, And that's typically not a small feat because you can have two tools that have vague overlapping descriptions,

00:12:07.580 --> 00:12:09.740
like one that retrieves customer information

00:12:09.820 --> 00:12:12.860
and another that retrieves exactly what the order entails.

00:12:13.100 --> 00:12:17.825
So that could lead to some form of communication issues. So Claude has to essentially guess,

00:12:17.905 --> 00:12:20.305
and the exam guide covers this very clearly.

00:12:20.545 --> 00:12:23.825
Ambiguous descriptions cause frequent misrouting.

00:12:23.825 --> 00:12:33.150
So Claude ends up calling the wrong tool way more often than you'd expect. And one really important thing to note is that when you invoke these tools, sometimes

00:12:33.310 --> 00:12:36.030
you see the final result being executed properly.

00:12:36.190 --> 00:12:42.350
But back when everyone was using no code tools like n eight n, you would have the agent in that platform

00:12:42.510 --> 00:13:05.770
work and execute the workflow, and you would see it fire the right result. So you could get the exact result you're looking for in Cloud Code, but you have no idea that it actually did the wrong thing three or four times to eventually do the right thing. So it's not just about the outcome, but also the efficiency in getting to that outcome because as it tries through all the ways it doesn't work, it spends your tokens and you wanna be as token efficient as possible.

00:13:06.170 --> 00:13:10.490
So to give you something more tangible, let's say this is one of your functions, get customer,

00:13:10.650 --> 00:13:18.675
you would basically say that you wanna use this tool whenever you need customer ID and profile data, and you want to use the lookup order instead

00:13:18.755 --> 00:13:47.835
when you have an order number and need a shipping status. So you essentially want to be intentional in saying do not use this tool when this happens versus just saying when it should use that tool. And this is pretty much the highest leverage tip from the entire guide, which is the description of the tool is really the interface of tooling and fixing the descriptions to make sure it knows the optimal path, the critical path the most critical thing that you can do in your workflow. Now while MCP servers are increasingly falling out of favor for different use cases,

00:13:48.075 --> 00:14:05.710
there are times where they make sense. So the exam guide does cover where to use different levels of scope for your MSP servers. You have project level and you have user level, and I'll walk you through the difference of when and where to use both. And this essentially allows Claude to connect to external tools like GitHub,

00:14:05.790 --> 00:14:18.925
Slack, Outlook, whatever it is. It's one of the vectors that you can use. And if you've ever used Claude Chat or Claude Cowork and use their connectors feature, it's essentially using an MCP under the hood. So project level MCP

00:14:18.925 --> 00:14:22.430
lives in a file called the dot m c p dot JSON

00:14:22.510 --> 00:14:24.190
at the root of a project.

00:14:24.510 --> 00:14:28.910
So any passwords or API keys go in what are called environment variables,

00:14:29.150 --> 00:14:34.270
where they're denoted as dot n for dot environment, and they never directly end up in the core file.

00:14:34.935 --> 00:14:54.480
So for example, if you had an MCP server for GitHub, which is essentially code version history, you would have an environment variable. It would be set to the token of your GitHub, and this would be written to an environment file. So every single time that an agent would try to use an MCP server, it would then be auto authenticated through this file, then go and invoke this specific service.

00:14:54.960 --> 00:15:02.225
User level MCP lives in a file in your home directory. So this is basically your personal sandbox. You have experimental tools,

00:15:02.385 --> 00:15:12.705
personal API keys, things you're testing before rolling them out to the rest of the team. Now the practical takeaway from the exam guide is that essentially you can use as many community based MCPs.

00:15:12.950 --> 00:15:23.830
These are not necessarily open source MCP servers, sometimes aren't too safe, but more so the native MCP servers from the platforms themselves. So if you look at the major providers like Salesforce,

00:15:23.830 --> 00:15:28.645
GitHub, etcetera, everyone has some form of instruction for using MCP servers,

00:15:28.885 --> 00:15:49.720
and only build custom servers when you absolutely need to. And it's important to remember that an MCP server is purely a function. So if you just need your functions executed in a slightly different way or different order, you might not need a custom MCP. Now real quick on the terminal side of things, all you'd have to do is go into your terminal, and you could do one of these two things. You could say, Claude

00:15:50.040 --> 00:15:51.480
MCP list.

00:15:51.720 --> 00:15:54.760
This would go and invoke if you have any MCP servers whatsoever.

00:15:55.175 --> 00:16:02.615
Now personally, I've migrated my entire ecosystem to skills, CLIs, etcetera. So you won't find any that are already authenticated.

00:16:02.615 --> 00:16:40.180
You'll find just the shell of the ones that I used to use. So the Gmail, Google Calendar, Canva, and Zapier, all of them I used to use, but now I've migrated all of them to use the skills primarily just for token efficiency, security, etcetera. But if you wanted to see which ones you had out of the box, that's the way you do it. If you're using MCP servers at the the project level, then you could just paste the command just like this, where you could say, show me the dot MCP dot JSON file in this project and explain the MCP server configuration. And then you get this response where in this case, I don't have in this particular project an MCP dot JSON file, and it walks through what needs authentication like we saw before,

00:16:40.420 --> 00:17:20.940
how to configure it, and there's that command that I showed you before, Claude m c p list. It basically invoked that. So whether you're asking for it through natural language or going straight to the source with this command, then you can have full visibility on what's happening with your MSP servers. The next principle in the exam guide is the tool overload problem, and this is essentially making better decisions by having less options. So you can think of it like this. Giving an agent 18 tools is like hiring a brand new employee and giving them access to every single system from day one. They're gonna use things that they shouldn't call tools outside their lane. You wanna keep each agent down to a maximum of four to five tools that are directly relevant to what they're doing.

00:17:21.615 --> 00:17:43.740
That constraint is really what makes them precise. And if you need a reminder, earlier, I showed you an example of spinning up three sub agents, and you'll notice that all of them used four or five tools at max. So this is really a paradigm that's built into Cloud Code, and that allows it to have a process, create SOPs. So this one would be search, fetch, extract, and save. So the goal is being precise,

00:17:43.740 --> 00:17:45.820
reliable, and always on task.

00:17:46.380 --> 00:18:04.890
Now there's also a setting called tool choice that controls how Claude picks his tools. There are three main modes. One of them is auto where basically Claude decides on its own whether to use a tool or not. And then you have another one called any, and this is essentially forcing Claude to use a tool, but it has to pick which one.

00:18:05.530 --> 00:18:07.290
And finally, have forced,

00:18:07.290 --> 00:18:13.050
meaning we are making it. Use this tool and there are no options. It's not just independence,

00:18:13.130 --> 00:18:15.610
it is forced dependence on a particular outcome.

00:18:16.275 --> 00:19:11.910
So the guide alludes to the fact that you can force a tool call to make sure that step one is always consistent and predictable, and then you can loosen that proverbial leash of Cloud Code to run freely and make more autonomous decisions as long as you know you've steered it in the right direction. So you're essentially putting guardrails on its first move or couple moves and then allowing it to run freely and really tap into that power of the agentic harness. Now next up is one of the most contentious topics in Claude code, which are Claude MDs, which are the heart and soul of your operating system, your air traffic control of your repo or project if you will. And pretty much it covers all the different layers, three different layers, the user level, the project level, and the path specific rules. Now most people dump everything they know into Cloud MD, they think that it's a proxy for a knowledge base or rag, but essentially it's not. People dump their preferences, their rules, their style, their tone all in one place, and then complain

00:19:11.990 --> 00:19:34.830
why there's so many tokens being wasted all the time. The big issue is that every single time you open a brand new session, Claude auto injects that straight into memory. So you're wasting time and you're wasting tokens. So the guide splits it into three different layers. One is the user level, the next is the project level, and the last one are path specific rules. So you can treat your top layer as your personal preferences file.

00:19:34.990 --> 00:19:40.510
This lives in your core home directory. So you have your editor settings, how you like your explanations formatted.

00:19:40.510 --> 00:19:45.150
So this one's just for you and not meant to be shared with anyone or through something like GitHub.

00:19:45.725 --> 00:19:48.765
So the middle layer is a project level CloudMD,

00:19:48.765 --> 00:19:52.125
and this is where you have things like team rules, coding conventions,

00:19:52.205 --> 00:20:37.710
architecture decisions, and this essentially allows you to share it with your team assuming you have one so that everyone's on the exact same page. So this is where having some version control makes a lot of sense. And finally, we have the bottom layer here, and this is really the golden nugget of the three levels. These are path specific rules. So you create a small rule file that lives in the dot claud rules folder, and at the top of each file, you put a pattern that says when to load it. So when something like only load this when I'm editing files is a very good example. So your testing rules only show up when you're writing tests, and your API rules only show up when you're in the API folder. And lastly, if you have something like React components, if you're a developer, then you know what that is. If not, then don't worry about it. The TLDR

00:20:37.710 --> 00:21:01.320
is this is huge because Cloud Code can get focused. So you can have a lean and mean Claude MD and rely on rules to cure the path forward for any nuances that need to be taken account for a specific use case. So I know I'm throwing a lot at you right now, but the next section tries to bring everything together into cohesion. So it's really about when to use what because we haven't even started speaking about things like skills,

00:21:01.320 --> 00:21:02.200
like commands,

00:21:02.200 --> 00:21:04.280
plan mode versus direct execution,

00:21:04.520 --> 00:21:31.140
when to use each. So commands are basically reusable prompts. You save them once and you can trigger them with a slash command. So you can have slash review PR slash generate tests slash morning if you wanna execute a walk through of what your day looks like based on your calendar, your Gmail, anything you've hooked up maybe using the Google CLI. But one thing to note is that team wide commands go in a commands folder in your project so everyone can use them via something like git,

00:21:31.380 --> 00:21:48.205
whereas personal ones will end in your root folder, and these are your personal flash commands. So these are specific to you and tailored to exactly what you wanna do day to day. Now we've gone through skills at length in this channel, but just in case you and I are meeting each other for the first time, we'll go through that as well. So skill is a step above a command.

00:21:48.445 --> 00:22:05.200
A skill has its own file that defines what it can do, what tools it's allowed to use, and it runs in its own separate context. So you can think of it like this. A skill can do messy exploratory work like research files, do pretty much anything you want, and none of that clutter ends up in your main conversation.

00:22:05.625 --> 00:22:37.835
It's like sending someone to go do research in another room, and you're just bringing the summary back to main conversation. Now moving on to another existential question that many people ask and the guide goes through, to use or not to use plan mode. So if the task touches multiple files, it's ambiguous, or it could go in a few different directions, then using plan mode is the way to go. Claude explores, reads, and proposes changes without actually modifying anything. Just review it, approve it, or tell it to go in a different direction. But if it's a very obvious and straightforward

00:22:37.995 --> 00:22:45.995
single file fix, then you can just let Claude execute it directly. So you don't have to over plan in this case the same way many people will over engineer things.

00:22:46.475 --> 00:22:49.835
Now this next part is fairly advanced. So if you're nontechnical,

00:22:50.150 --> 00:22:57.910
this part might leave you squinting a little bit, but I'll try to explain it as best I can. So this is about using Claude code in what's called a CICD

00:22:57.910 --> 00:23:01.030
pipeline. What this stands for is continuous integration

00:23:01.110 --> 00:23:09.205
and continuous development. So if were to break down this concept into one sentence to make it as accessible as possible to everybody, it would be the CICD

00:23:09.205 --> 00:23:29.660
pipeline is an automated conveyor belt where a developer will push code, that code will be reviewed, and then it will be shipped and pushed to the end user, all without any form of buttons being pressed along the way. So the guide really focuses on this step three right here, but we'll get to that in a second. Step one is, like I said, you have a developer that pushes some code.

00:23:29.900 --> 00:23:33.340
Then this triggers the CI, the continuous integration

00:23:33.340 --> 00:23:33.820
pipeline

00:23:34.205 --> 00:23:52.640
to go and check it. Step three is where the magic happens, and this is what's called Claude dash p. Claude dash p is not a very straightforward concept. So, again, I'll try to break it down. The dash p essentially allows Claude code to run without asking you for anything. So no prompts, no confirmation.

00:23:52.880 --> 00:23:54.240
It's essentially bypassed

00:23:54.240 --> 00:24:13.655
permissions mode in a way, and it just runs the task you give and it gives you the result back. And then you have a flag that gives you a clean structured output that other tools can read. It's actually called the dash dash output format JSON flag. When you put these together, these two flags turn Claude code from something you can chat with solely

00:24:13.655 --> 00:24:21.540
into something that you can use to automate different parts your process. So the main learning here is that you can trigger this from any CICD pipeline,

00:24:21.620 --> 00:25:19.180
any system that essentially tests and deploys your code. Now it's hard to make that part less gibberishy, but this part will be the main takeaway from that section. And this is their important note on using separate clawed code sessions for reviewing code versus writing code because there is some level of pollution. When you write code, you essentially are biasing the language model to say, yeah. Yeah. I wrote amazing code. Because why would the language model write poor code on purpose? So you need a stateless session to go and review any form of code, anything that was produced in session one assuming you're doing something more on the technical end of things. So if you need a little anchor to remember, then you can remember that fresh eyes, even AI eyes can catch more. Two heads are better than one. In Claude code's case, five, ten, 15 heads sometimes are better than one at reviewing code as long as it's in a separate session. So for example, if I said claw dash p list all Python files in this project and summarize what each one does, output this format,

00:25:19.260 --> 00:25:46.780
then you will see it goes through every single Python function in my folder, which I won't get into in-depth, and then it comes back with the full key patterns here. All scripts use Gemini three pro preview for images. This is my thumbnails generation folder and dated output folders report lab for PDFs and one script per video topic design. So when it comes to making outputs reliable, this is a whole portion of the guide that's dedicated to dealing with inconsistency

00:25:46.780 --> 00:25:56.995
in Claude's responses. So your instinct when Claude gives you inconsistent outputs is to write more instructions. So for whatever reason, your instructions involve number crunching,

00:25:56.995 --> 00:25:58.915
something like handling different currencies,

00:25:59.075 --> 00:26:00.435
different decimal places,

00:26:00.515 --> 00:26:05.315
you try to shove all of that in there. But Claude interprets it differently each time.

00:26:06.040 --> 00:26:14.440
One response can give you one number versus another depending on the day, the model you choose, etcetera. So you can have the same set of instructions,

00:26:14.680 --> 00:26:18.760
but three different results because people keep forgetting that this isn't magic,

00:26:18.840 --> 00:26:29.135
these are language models. Now to fix this, Anthropic recommends going to few shot examples. If you're not familiar with what few shot are, these are from the beginning of prompt engineering time,

00:26:29.375 --> 00:26:37.460
one of the best ways to get consistent outputs. So you give an example. In this case, the input could be Acme Corp reported 4,200,000

00:26:37.460 --> 00:26:39.220
in revenue for 2024,

00:26:39.460 --> 00:26:51.935
and this is the output you want exactly. So you give it exactly the parameters. In this case, we're putting it in JSON. This could be in whatever format you want, and same thing for example two. So multi shot gives it enough of a hint to generalize

00:26:51.935 --> 00:27:13.950
and better understand which direction you're going for. And the interesting thing here is that Cloth doesn't just copy paste your examples. It learns the underlying patterns behind them. That's why two to three examples will beat a full page of instructions each and every single time. Now in the same vein of consistent and reliable outputs, this also generalizes to JSON, which stands for JavaScript Object Notation.

00:27:14.190 --> 00:27:35.090
Very common when you're dealing with agents, with toolings, and tool calls. So this is also more of an intermediate to advanced use case, but it's important to know because it's covered in the guide. So I'll move from left to right and, again, try to make this as accessible as possible. So step one is you define a tool, which basically acts as your template, providing the exact structure you need. So every field,

00:27:35.490 --> 00:27:36.690
every data type,

00:27:36.930 --> 00:27:38.370
and whether or not it's required.

00:27:38.850 --> 00:27:49.035
Leaving something as optional is beneficial for Claude because otherwise if you don't tell that it's optional, then Claude will make it up. So making it optional allows Claude to say,

00:27:49.275 --> 00:28:34.540
I don't know in a very legal way. So you're a legalized way of allowing it to say, I don't have this. I don't know what to do with this. In step two, building on what we referred to before, you can force Claude to use a specific tool. So there's no option to respond with plain text, no option to use a different tool. It has to fill your template as is. So just as a takeaway, this eliminates syntax errors. So anything like malformed JSON, this is just broken JSON or markdown wrapping, but it does not eliminate semantic errors. So anything with a wrong value in set field. So step one is you extract the data, and assuming it's correct then obviously this is all done. But if it's not correct, this is the part where you really need to dial in. You're not meant to just say try again. You're meant to actually send very specific feedback.

00:28:34.780 --> 00:28:39.420
So instead of saying retry, you would say the original document, the field extraction,

00:28:39.580 --> 00:28:45.455
and the specific error. And this is how you would frame the specific error. Revenue field says $0,

00:28:45.455 --> 00:28:48.015
but document clearly says 4,200,000.

00:28:48.095 --> 00:29:01.550
So now you're giving it multiple areas to zero in on and see what might be happening. But just like everything, there's nuance. You don't wanna just keep going in this endless loop. If the answer isn't there, if the information isn't in the source document,

00:29:01.710 --> 00:29:11.355
then retrying even with the best of instructions won't help. So it's not just knowing how to validate and test, but also knowing when to stop. Now as a segue to the next section,

00:29:11.595 --> 00:29:29.410
back when I used to drive a Honda, once in a while you'd get this notification which politely asks you to take a break and typically it's because they want you to have attention to the wheel especially for longer drives. Drives. You can take this exact same paradigm in mind for the next principle. So instead of worrying about driving sharply,

00:29:29.490 --> 00:29:49.785
this focuses on keeping Claude sharp throughout the lifetime of a context window. So when you give something to Claude code to read, it pays really close attention at the beginning. The first 40% of context window is really well primed. You have the system prompt. You have the first messages. You have your Claude MD injected at the beginning as well, and it really pays attention.

00:29:50.185 --> 00:29:55.410
And this is also true near the very end where you have recency bias towards the latest messages.

00:29:55.730 --> 00:30:03.330
But the context in the middle or the monkey in the middle starts to get a little bit fuzzy. So information buried in the middle starts to get compartmentalized

00:30:03.330 --> 00:30:05.730
in a way where it can't maintain that full fluency

00:30:06.035 --> 00:30:11.475
or flow of thought. Now the problem can get worse over time because every time Claude uses a tool,

00:30:11.715 --> 00:30:14.115
the result is added to this middle section.

00:30:14.355 --> 00:30:31.840
So a customer comes back with 40 fields when you only need five. Each one pushes the important stuff further into the fuzzy zone. So naturally, how do we fix this? Well, Anthropic comes up with three different ways to accomplish this. First, you can pull out the key facts and put them at the very top of a conversation,

00:30:32.160 --> 00:31:13.345
essentially pinning them in a place where Claude will always see them. So you can think of it as a key fact summary block. Another method you can employ is trimming verbose tool outputs. And what the word verbose means here is you get a series of data from a tool. A lot of this data could just be pure metadata that doesn't actually move anything forward, and you can get rid of it. And by trimming it progressively, you just keep the tool outputs that matter which will flood the context window less. And the third way is to delegate tasks to sub so they can maintain all of their messy output in their own individual context and box. It's all isolated and boxed off, and you just get a clean summary back. The guide actually mentions explicitly

00:31:13.345 --> 00:31:22.100
that it's infinitely better to start a brand new session with a summarized version of outputs from before versus pushing through a conversation

00:31:22.180 --> 00:31:27.140
even if you're at that million context window because you have all of this different set of information,

00:31:27.380 --> 00:31:39.995
tool calls, different trials, pivots in the conversation that pollute your context window. If you're ever curious at what is in your memory at a single point in time, you can always go into Cloud Code, do slash

00:31:40.075 --> 00:32:18.780
memory, then in here it will tell you that auto memory is on, what your project memory looks like, if you wanna check-in at your Cloud MD, the fact that you have some certain rules here, some level of user memory, and then you can also open your auto memory folder to see exactly what's in there. So you can click on enter right here. This will open it up in another window, then you'll be able to see a series of markdown files that denote exactly what it's remembering about your current session. And to close the loop on reliable outputs, there's a section related to human in the loop, which is basically when do you escalate a particular scenario to an agent. So if you have some form of chat agent in the wild and someone asks to speak to a human,

00:32:19.020 --> 00:32:46.040
then the goal should not be to try to fix the issue first using a language model and not to try one more thing. It should be to respect this request and execute it right away. And it's important to zero in on this because it explicitly says that this will probably trip up people because you'll try to get creative with how AI can answer something, but if someone asks for handoff, you give them handoff. Then the second scenario, the rules could be unclear and the agent could be unsure about what policy applies.

00:32:46.840 --> 00:33:18.280
The prescribed action here is to escalate, but escalate using what's called a full package. This full package includes what the customer information is, the ID, the root cause, what was attempted and tried, and what is the recommended action. So very similar to managing an actual customer system like Zendesk or similar, you would execute this in a very similar way. So the agent would technically come to the conclusion that it can't make any meaningful progress, and this is what it could look like in terms of a final package to hand off. And for the third scenario, if it's a straightforward issue,

00:33:18.360 --> 00:33:24.120
the policy is clear, which is to allow the agent to resolve it. But it comes with a big caveat.

00:33:24.280 --> 00:33:26.040
Even if it resolves it perfectly,

00:33:26.555 --> 00:33:38.155
it should still ask, would you prefer I transfer you to human agent? So you wouldn't want to give the agent itself a confidence score and escalate it when it's low. And one of the many reasons why it thinks that sentiment analysis

00:33:38.155 --> 00:33:41.260
can miss the mark is it can misread sarcasm,

00:33:41.260 --> 00:33:43.020
cultural differences, and tone.

00:33:43.260 --> 00:33:47.660
So I actually had to double check whether or this was legit. So I noticed that this is the question

00:33:48.060 --> 00:33:53.740
in the example guide here which says your agent achieves fifty five percent first contact resolution

00:33:53.820 --> 00:34:12.690
well below the 80% target. And you can see here it says that sentiment doesn't correlate with case complexity which is the actual issue. Alright. And we're getting near the end here, and this portion of the guide just covers error propagation. Basically, what to do when things go wrong. Now compare that to a detailed error that includes what went wrong, what was attempted,

00:34:12.770 --> 00:34:19.970
any partial results that came back, and what else could be tried. So now the main agent can actually make smart decisions,

00:34:20.050 --> 00:34:26.045
meaning trying a different search using data from a previous run, switching to a completely different source,

00:34:26.365 --> 00:34:34.125
or just basically noting that gap and moving on. So the TLDR of the TLDR is this just breaks down how to allow your agents to fail gracefully.

00:34:34.570 --> 00:34:40.170
Meaning failing in a way where you get meaningful errors, you can get meaningful outputs and meaningful retries.

00:34:40.170 --> 00:34:45.370
And just to bring everything together, because we've had all kinds of thoughts and examples and paradigms,

00:34:45.450 --> 00:34:54.415
what are the five rules that you can take away that will set you on the right path? Whether you're just learning Claude code or you're preparing to dive into the Claude code architecture exam.

00:34:54.735 --> 00:35:01.535
So rule numero uno is if it has to work a 100 of the time, whether it's money related, security, legal,

00:35:01.775 --> 00:35:08.520
don't rely on telling Claude in a prompt. Use a hook that physically blocks the action. So prompts are suggestions

00:35:08.680 --> 00:35:14.760
and hooks are the laws. Rule number two is when something breaks never return a generic error.

00:35:15.000 --> 00:35:22.765
Always include what broke, what tried, what partially worked, and what else could be done. Rule three is keep your agents focused.

00:35:22.765 --> 00:35:31.245
Four to five tools max per agent and an agent with 18 tools makes infinitely worse decisions than one with five that are directly

00:35:31.470 --> 00:35:37.550
So less choice and better decisions. Number four is review your code in a separate Claude session.

00:35:37.710 --> 00:35:46.350
The one that wrote the code is naturally biased toward thinking it's correct. A fresh session with no history catches what the first one will never.

00:35:47.915 --> 00:35:55.275
So two or three real examples of what you want produce more consistent results than a full page of written instructions.

00:35:55.515 --> 00:35:57.035
Claude learns the pattern,

00:35:57.275 --> 00:36:13.100
just not the format. And the real trick here is understanding that although these are five separate rules, they're all kind of the same concept, which is if you need to rely on building proper agentic systems in the wild, then you want to focus on the right tool that has the right level of determinism,

00:36:13.180 --> 00:36:16.380
which is its ability to execute something predictably

00:36:16.460 --> 00:36:29.335
every single time. And the main thing to take away from this is that although these are five different rules, they're essentially the exact same concept just showing up in different patterns. And the TLDR of it is to be structured,

00:36:29.335 --> 00:36:40.250
to be explicit, and to not have a what if or a probably will work with something like a prompt when you need the firepower of something like a hook. So if you nail understanding these core five principles,

00:36:40.330 --> 00:36:44.250
it'll give you the 80 of the eighty twenty. And more importantly, it'll give you the foundation

00:36:44.595 --> 00:36:53.875
to keep adding on additive knowledge. Now you might think that I'm gonna end off there. You might be even hoping for it, but I'm gonna leave you with one more thing before we depart for this video.

00:36:54.115 --> 00:37:04.370
So I found this really good guide by this user on x. I can't pronounce the username, but he came up with this article here that says, I want to become a Claude code architect.

00:37:04.610 --> 00:37:16.575
And in it, he came up with a series of prompts that break down each and every section of the official architecture guide. He's created these very bespoke prompts, I would imagine, using AI,

00:37:16.655 --> 00:37:20.975
and you can just pull this up and go into Cloud Code.

00:37:21.215 --> 00:37:30.200
You can paste it, and then it will ask you and interview you on your competence on a particular domain. So if we take this behemoth prompt for section one,

00:37:30.440 --> 00:37:35.480
this just says you are an expert instructor teaching domain one, architecture and orchestration

00:37:35.560 --> 00:37:53.065
of the Claude certified architects certification exam. And then at the bottom, it says welcome. It tells you the weighting of this particular section, and it asks you how familiar you are with AgenTic systems. If you say something like none, then it will create a custom learning path for you to start going back and forth through these concepts.

00:37:53.145 --> 00:37:56.825
So you see here, it breaks down what an AgenTic loop is, and at the very bottom,

00:37:57.400 --> 00:37:58.920
there's a concrete example.

00:37:59.640 --> 00:38:02.760
The critical field, we already alluded to this, the stop reason,

00:38:03.160 --> 00:38:36.410
the anti pattern, the correct pattern. It's gonna keep going telling you which part of the guide to reference, and this is awesome. So kudos to this individual. I'll leave the link for you with some other goodies that I'm about to tell you right now. Now as I recover from filming this video, I'm gonna leave you with a mega guide going through everything I walked through today with the actual visuals themselves, a breakdown of the concept, hopefully, in a better way than I was even able to articulate. And I'll make that available to you in the second link in the description below. And for those that wanna go infinitely deeper on Claude code and be in a whole ecosystem where you have coaches, myself,

00:38:36.490 --> 00:39:01.035
a brand new upcoming course, which is bound to blow your mind in terms of what you can do with Cloud Code, then you'll wanna check out the first link in the description below and maybe join me in my early AI adopters community. And for the rest of you, if you found this to be a helpful labor of love, one thing that you could do as a thank you is just leave a like and a comment on the video. If you like my stuff and you want me to go deeper on these kind of concepts, then subscribe to the channel and let me know. I'll see you in the next
