Mark Kashef · Youtube · 39:01

Anthropic's NEW Claude Architect Guide In 39 Minutes

A 39-minute walk-through of Anthropic's new Claude Certified Architect exam guide, translated from a 40-page PDF into five domains, three demos, and five rules.

Posted

March 22nd 2026

2 months ago

Duration

39:01

Format

Tutorial

educational

Channel

MK

Mark Kashef

§ 01 · The Hook

The bait, then the rug-pull.

Anthropic just shipped a real, pass-or-fail Claude Certified Architect exam, and the 40-page exam guide doubles as the best Claude Code syllabus in print. The host reads every page, then translates the five domains into plain English with diagrams and live demos so you do not have to.

§ · Stated Promise

What the video promised.

stated at 00:22 "I went through the entire exam guide myself, not through an LLM. I'm going to synthesize this entire exam guide into this video and break down each and every concept for you. And as a bonus, I'll share some resources that should supercharge your learning journey." delivered at 38:07

§ · Chapters

Where the time goes.

00:00 – 01:39

01 · Anthropic just dropped a real certification

Cold open + pitch: a 40-page exam guide is the best Claude Code syllabus available, and the host has read it cover to cover.

01:39 – 02:03

02 · The 5 domains explained

Pie-chart breakdown of the exam's five domains and their weighting.

02:03 – 03:25

03 · Domain 1: Agentic Architecture (27%)

Why agent architecture is the single most important domain - how Claude thinks, coordinates, and enforces rules.

03:25 – 04:07

04 · The Agentic Loop

Request -> response -> tool_use -> execute -> repeat. The only field that decides when the loop ends is stop_reason.

04:07 – 05:18

05 · Anti-patterns to avoid

Three mistakes: parsing text for 'I'm done', setting hard iteration caps, and ignoring stop_reason.

05:18 – 06:38

06 · Hub-and-Spoke: coordinator + subagents

One coordinator agent decomposes the task, subagents run in isolated contexts, results merge at the end.

06:38 – 07:34

07 · The narrow decomposition mistake

Coordinators that decompose too narrowly miss whole branches of the problem - fix by giving broad goals, not narrow checklists.

07:34 – 08:49

08 · Live demo: spawning 3 subagents in parallel

Terminal demo: a single prompt spawns video/written/audio research subagents in Claude Code, each using its own tools and tokens, with a final synthesis.

08:49 – 10:00

09 · Prompts vs Hooks (most important concept)

Prompts are best-effort suggestions; hooks are deterministic scripts that physically block actions. The exam guide's most contentious section.

10:00 – 10:55

10 · When to use hooks over prompts

Prompts for style/tone/formatting; hooks for compliance, financial, and security. When money is on the line, hooks every time.

10:55 – 11:42

11 · Live demo: /hooks in Claude Code

Walk-through of /hooks in the terminal and the claude-code-guide agent as a second way to discover the right hook.

11:42 – 12:07

12 · Domain 2: Tool Design & MCP (18%)

Tool descriptions are the single highest-leverage thing you can fix - they decide which tool fires when descriptions overlap.

12:07 – 13:10

13 · Why tool descriptions matter most

Bad descriptions cause 40% misrouting; good descriptions with explicit do-not-use clauses drop misrouting to under 2%.

13:10 – 13:54

14 · The tool overload problem

Project vs user-level MCP, where to put .mcp.json, how to keep API keys in environment files.

13:54 – 15:28

15 · MCP server scoping: project vs user level

How project-level and user-level MCP differ; community vs custom servers; checking what is wired up with claude mcp list.

15:28 – 16:57

16 · tool_choice: auto, any, forced

An agent with 18 tools makes worse decisions than one with 5. tool_choice can be auto, any, or forced - force the first move, then loosen the leash.

16:57 – 18:51

17 · Domain 3: Claude Code Configuration (20%)

Most people dump everything into one CLAUDE.md. The guide splits it into three layers: user, project, and path-specific.

18:51 – 20:04

18 · The 3-layer CLAUDE.md hierarchy

Layer 1 user-level (~/.claude/CLAUDE.md), Layer 2 project-level (.claude/CLAUDE.md, version-controlled), Layer 3 path-specific (.claude/rules/*.md).

20:04 – 21:06

19 · Path-specific rules (.claude/rules/)

Rule files load only when their glob matches - testing rules for test files, API rules for the API folder. Keeps CLAUDE.md lean.

21:06 – 22:25

20 · Commands vs skills vs plan mode

Commands are reusable slash prompts; skills are scoped, isolated mini-agents with their own tool allow-list; plan mode is for ambiguous multi-file work.

22:25 – 23:40

21 · Claude Code in CI/CD pipelines

Using claude -p with --output-format json to make Claude Code run non-interactively inside a GitHub Action.

23:40 – 24:31

22 · The -p flag and --output-format json

How the -p and --output-format json flags turn Claude Code from a chat tool into something a pipeline can drive.

24:31 – 25:45

23 · Why you need a separate review session

A session biased by writing the code is bad at reviewing it. Always spin a fresh, stateless session for review.

25:45 – 26:26

24 · Domain 4: Prompt Engineering (20%)

When Claude is inconsistent, the instinct is to write more rules. The exam guide says: show 2-3 examples instead.

26:26 – 27:15

25 · Few-shot examples vs instructions

Two or three concrete input/output examples beat a paragraph of detailed instructions every single time. Claude learns the underlying pattern.

27:15 – 28:06

26 · Guaranteed JSON with tool_use

Define a tool with a JSON schema, set tool_choice to forced, and extract from the tool_use response. Eliminates syntax errors, not semantic ones.

28:06 – 29:23

27 · The validation loop

When the model misreads data, retry with specific feedback - the original document, the extracted field, the literal mismatch. Not just 'try again'.

29:23 – 29:45

28 · Domain 5: Context Management (15%)

Why Claude pays attention at the beginning and the end of context but goes fuzzy in the middle, and what to do about it.

29:45 – 30:37

29 · The lost in the middle effect

The first 40% of context is well-primed, the end has recency bias, and the middle drifts. Tool outputs push the important stuff into the fuzzy zone.

30:37 – 31:46

30 · 3 ways to fix context bloat

Pin a key-facts block at the top, trim verbose tool outputs, delegate messy work to subagents whose context never pollutes yours.

31:46 – 32:24

31 · /memory in Claude Code

/memory shows what is actually in scope right now - project memory, user memory, auto memory - so you can audit before a fresh session.

32:24 – 33:41

32 · When to escalate to a human

Three escalation scenarios: customer asks for a human (escalate immediately), policy is ambiguous (escalate with a structured handoff), or issue is straightforward (resolve, but still offer a human).

33:41 – 35:00

33 · Error propagation done right

Never return a generic 'failed'. Include what broke, what was tried, what partially worked, and what alternatives exist.

35:00 – 37:12

34 · The 5 rules that apply across every domain

Closing recap: hooks for high-stakes work, structured errors, 4-5 tools per agent, separate session for review, few-shot examples beat instructions.

37:12 – 38:07

35 · Interactive study prompts (X post resource)

A community-built prompt set turns Claude Code into an interactive instructor that quizzes you on each domain.

38:07 – 39:01

36 · Free study guide + where to go next

Mark plugs his own mega-guide and Skool community as deeper next-step resources.

§ · Storyboard

Visual structure at a glance.

cold open

hook cold open 00:00

5 domains

promise 5 domains 01:35

agentic loop

value agentic loop 03:20

hub-and-spoke

value hub-and-spoke 05:32

subagent demo

value subagent demo 07:50

prompts vs hooks

value prompts vs hooks 09:05

tool descriptions

value tool descriptions 12:00

tool overload

value tool overload 16:55

claude.md hierarchy

value claude.md hierarchy 19:03

commands vs skills

value commands vs skills 21:20

ci/cd pipelines

value ci/cd pipelines 23:02

few-shot examples

value few-shot examples 27:00

json with tool_use

value json with tool_use 27:50

lost in the middle

value lost in the middle 30:10

human-in-the-loop

value human-in-the-loop 33:10

the 5 rules

value the 5 rules 35:10

study resources

cta study resources 37:10

sign-off

cta sign-off 38:55

§ · Frameworks

Named ideas worth stealing.

01:39 list

The Five Exam Domains

Agentic Architecture & Orchestration (27%)
Tool Design & MCP Integration (18%)
Claude Code Configuration & Workflows (20%)
Prompt Engineering & Structured Output (20%)
Context Management & Reliability (15%)

Anthropic's exam weighting tells you where to spend prep time - agent architecture alone is more than a quarter of the test.

Steal for any internal team training on Claude Code

03:20 model

The Agentic Loop

Code sends request
Claude responds with stop_reason
If stop_reason = tool_use, execute the tool
Feed result back, loop again
If stop_reason = end_turn, stop

Every Claude agent - SDK, Claude Code, custom framework - runs this loop. The exit signal is the stop_reason field, not text in the response.

Steal for explaining how agentic systems actually work to a non-technical stakeholder

05:18 model

Hub-and-Spoke (Coordinator-Subagent) Pattern

Coordinator decomposes the task
Specialised sub-agents run in parallel, each in its own context
Sub-agents do NOT communicate with each other
Coordinator merges results at the end

The canonical architecture for complex multi-step work. Each sub-agent keeps its own tools and tokens; the coordinator only sees summaries.

Steal for any research, scraping, or analysis pipeline that branches

08:50 concept

Prompts vs Hooks

Prompts are best-effort suggestions (probabilistic, ~88% reliable in Anthropic's refund example). Hooks are deterministic scripts that physically block actions. Style/tone -> prompt. Money/compliance/security -> hook.

Steal for any time you would write 'always verify X before Y' in a system prompt

12:07 concept

Tool Description Anti-Pattern

Vague overlapping descriptions cause ~40% misrouting between similar tools. Adding explicit 'use INSTEAD OF X when Y' clauses drops misrouting under 2%. The description IS the interface.

Steal for any MCP server you ship or any internal tool registry

18:51 model

The 3-Layer CLAUDE.md Hierarchy

Layer 1: user-level (~/.claude/CLAUDE.md) - personal preferences, never shared
Layer 2: project-level (.claude/CLAUDE.md) - team rules, version-controlled
Layer 3: path-specific (.claude/rules/*.md) - conditional, only loaded when matching files are open

Split CLAUDE.md across three layers so Claude only loads what is relevant to the current task. Most people dump everything into one giant file and waste tokens on every session.

Steal for any team CLAUDE.md that has grown past a single screen

35:00 list

The Five Cross-Cutting Rules

If it has to work 100% of the time, use a hook not a prompt
Always return structured errors - what failed, what was tried, what alternatives exist
Cap each agent at 4-5 tools max
Always review code in a separate Claude session from the one that wrote it
Two or three concrete examples beat a paragraph of instructions

The closing distillation - five rules that apply across every domain. The unifying principle is determinism: pick the tool with the right level of predictability for the job.

Steal for a one-page Claude Code style guide for any team

§ · Quotables

Lines you could clip.

16:58

"Giving an agent 18 tools is like hiring a brand new employee and giving them access to every single system from day one."

Vivid analogy that explains tool overload in one sentence. → TikTok hook

09:56

"Prompts are suggestions and hooks are laws."

The whole prompts-vs-hooks debate compressed into eight words. → IG reel cold open

24:26

"Fresh eyes, even AI eyes, catch more."

A clean one-liner for the separate-review-session rule. → newsletter pull-quote

28:03

"Claude doesn't just copy paste your examples. It learns the underlying patterns behind them."

Reframes few-shot prompting in a way most beginners miss. → TikTok hook

29:23

"Two to three examples will beat a full page of instructions each and every single time."

A defensible, controversial claim about prompting. → IG reel cold open

32:08

"It's infinitely better to start a brand new session with a summarized version of outputs from before versus pushing through a conversation even if you're at that million context window."

Counters the 'just use a longer context window' instinct. → newsletter pull-quote

12:40

"The description of the tool is really the interface of tooling."

Reframes tool descriptions as API design. → TikTok hook

§ · Resources Mentioned

Things they pointed at.

00:00productEarly AI Adopters (Skool community) ↗

38:07productClaude Architect Study Guide (Gumroad) ↗

00:22linkAnthropic Claude Certified Architect exam guide

16:58toolWarp (terminal used in live demo)

15:40toolclaude mcp list (Claude Code command)

31:38tool/memory (Claude Code command)

38:07linkX post: I want to become a Claude code architect (community prompt set)

§ · CTA Breakdown

How they asked for the click.

38:07 product

"Check out the first link in the description and maybe join me in my early AI adopters community."

Soft, value-led. Framed as a community of learners with a course coming, paired with a free mega-guide so the ask does not read as purely commercial.

§ 04 · The Script

Word for word.

HOOK opening / re-engagementCTA the pitch metaphor analogy

00:00HOOKAnthropic just released something huge, which is a brand new certification program called the Claude certified architect. It's a real exam, pass or fail, and one of the most important things is that it's based on five core domains. Now whether you're trying to get certified or you just want to get better at using Claude code and becoming a master of it, then this entire syllabus that they've put together will act as the best resource for you to learn exactly how to go from zero to hero.

00:26HOOKSo here's the deal. I went through the entire exam guide myself, not through an LLM. I read through every single page.

00:32HOOKSo I'm going to synthesize this entire exam guide into this video and break down each and every concept for you. And as a bonus, I'll share some resources that should supercharge your learning journey.

00:43HOOKSo this is the very guide that I was discussing before. It is 40 pages and it walks through everything from the target candidate description of what it's like to be a master of Cloud Code in Anthropics. Each and every paradigm you should master and be prepared for.

00:57HOOKThere are some exam content response types. There's some preamble around what the exam entails, what the format is, and this is the most important part, which is the content outline. And this is where it breaks down the distribution of each and every part.

01:10HOOKAnd beyond that, in each area, it walks through a series of scenarios, and you'll see that it's very specific. It's not just about knowing MCP tools, it's about understanding exactly how to conceptualize each and every part of said tool.

01:24HOOKIf you scan through all of this, you'll see exactly how detailed it is and it can be overwhelming to many, which is why I wanna break down each and every concept and go through every domain with a fine tooth comb. So like I said, there are five different domains that Anthropic thinks that you need to know before you can call yourself a Claude code master.

01:42So the biggest one, which is 27% of the exam, is agent architecture. So this is basically how Claude thinks step by step, how it coordinates with other agents, and most importantly, how you enforce rules that just telling Claude in a prompt can't guarantee.

01:57If you only study one domain, this is the one. Now tool and MCP integration is at 18%. So this is how Claude connects to the outside world, your databases, your APIs,

02:08your file systems. And the number one reason why agents call the wrong tool is actually embarrassingly simple, and we'll get to that shortly. So Claude code configuration sits at 20%.

02:19This is your Claude MD files, your skills, your commands, and things like plan mode. And most people dump everything into one giant file, but the exam guide teaches you how to split that into three layers so Claude isn't loading irrelevant tools every single time you ask us something.

02:36Ultimately, we have the prompt engineering section, and this isn't just about writing better prompts.

02:42The certification is actually very specific here. If you want Claude to give you consistent output, show it two or three real examples of exactly what you want. That works better than writing a whole paragraph of detailed instructions

02:55every single time. And last but not least is context management and reliability. So here's the thing.

03:01Claude reads the beginning and the end of what you give it really well, but the stuff in the middle sometimes can get fuzzy, and that's called lost in the middle logic. So now that we've set the tone for what's important, let's go through every single one, and we'll naturally start with the largest and most important, which is the agentic architecture and orchestration.

03:20Let's start with the very engine that powers each and every Claude agent, which is the agentic loop. So whether you're using Claude code, the Anthropic SDK, or any agentic framework that is built on top of Claude, this is what's happening each and every single time you run an agentic workflow.

03:36So your code first sends a request to Claude, and then Claude naturally responds. The most important thing to keep aware of is this stop reason right here. You wanna check this all the time.

03:48If it says tool use, that means Claude wants to go use a tool, like reading a file or running a command. You can execute said tool, feeds the result back, and then it goes again into this endless loop. But in this case, it's not actually endless.

04:00It goes up until a very certain point. If it says something like end turn, then that's essentially Claude saying that I'm done. That's pretty much the entire engine over and over again.

04:10Now the exam guide points three different areas where people make mistakes in understanding the agentic loop, and these are basically the anti patterns. So first, reading Claude's text looking for phrases like I'm done or task complete.

04:24That's unreliable and it breaks all the time. Second, you don't wanna set limits like stopped after 10 loops. You don't know the level of depth that Claude code needs to do yet to accomplish a specific task.

04:36So you might be cutting off work that genuinely needs 11 steps. And third, you don't wanna look at what Claude said to figure out if it's finished. There's a very specific field, like I said, stop underscore reason that exists for exactly this purpose.

04:51It's the only thing that you should be checking. Now if you're using Cloud Code in a terminal, then sometimes you won't see this stop underscore reason.

04:58Every time Cloud Code reads a file, executes a tool, runs a command, or spins a sub agent, this is exactly the process and the patterns that drive that forward. Understanding this will really help conceptualize everything else we're about to cover. So when Claude needs to do complex, like research a topic from multiple angles or process a really large project, You don't need to send one agent to do everything.

05:21You basically have one agent that sits in the center, and this is the main agent. It breaks the task down, hands off pieces to specialized agents, and then combines the result at the very end. So these would be examples

05:33of the other sub agents. So one synthesis agent that uses tools to verify and write, another search agent that uses tools like search and fetch URL, and then you have the analysis agent that uses tools like read doc and extraction. Now the exam guide mentions this specific concept.

05:49So each one of these agents has its own separate window, its own separate set of tasks, and technically its own world. So there's no communication in between different sub agents.

06:00That's actually what the newer agent teams feature was designed to do, which is enable that communication through providing the equivalent of an email inbox to each agent so they can email each other, see who's blocking who, and execute the task in unison. So it's important to understand that sub agents don't maintain track of memories of other sub agents at the same time.

06:20So sub agent a will have no idea what sub agent b did. It will all kind of come together at the very end once the main agent takes the TLDR of each's outputs and or findings and then disseminates that back to you. Now there's one major mistake that many people make when it comes to understanding sub agents, and it's the following.

06:40So let's zoom in here. Even though you have a main agent, it's completely possible that the coordinator could break down tasks too narrowly, and this is something you have to look out for.

06:50So in practice, it could look like this. You say research AI in creative industries, and the coordinator only creates subtasks about visual art. So digital art, graphic design, or photography,

07:03but it completely misses music, writing, film, and game design. So the sub agents did their job perfectly,

07:10but the coordinator just scoped it wrong. It's basically like having a bad manager for a great team. So the fix is to give broad goals,

07:19not narrow checklists to this main coordinator agent. You wanna let the sub agents figure out how to break down all the sub tasks based on their narrow goals that are defined by this broader goal. So instead of me just breaking this down conceptually, let's go down into the terminal and see an example of this.

07:36So if we go into warp here, I'm going to copy paste this prompt and we'll send it over. And while we send it over, I'll read it through. So it says, I want you to research the impact of AI on content creation by spawning three sub agents in parallel.

07:50The first sub agent is to research how AI is changing video content creation. Creation. The second one is to research how AI is changing a written content creation.

07:59And the last one is how it's changing audio. And then we say each sub agent should search the web and return a three bullet summary. So it's very broad in terms of what they're looking for.

08:08We're not micromanaging exactly how they're going to do said thing, but we're giving them the overall assignment. So you've taken the equivalent of an employee,

08:17you've onboarded them, and then after onboarding them, you trust them enough to give them a very well situated task and allow them to execute independently. And after completes, you could see right here all three agents are finished. Each one use its own set of tools, its own set of tokens,

08:32and then we have the results from each one, and then we have the overall synthesis. So this is the main agent compiling all the results of the sub agents, and this is how this coordinator sub agent pattern works.

08:43HOOKSo this might be the most important part of the entire exam guide where they differentiate between prompts and hooks and when and where to use each. So prompts are what I call best effort. You can tell Claude something like always verify customer before processing a refund, and most of the time it works, but sometimes it doesn't.

09:02HOOKIf we even hop into their exact scenario, they have this question where they say production data shows that in 12% of use cases, your agent skips this invocation of a function get customer entirely and just go straight to look up an order based off of the stated name occasionally leading to misidentified accounts. So from a business perspective,

09:22HOOKthis is not okay. 12% of accidentally

09:26HOOKprovided refunds to the wrong person or to people trying to take advantage of this becomes a big issue. Now hooks are completely different. A hook is basically a small script that runs automatically before or after Claude tries to do something, and it can literally block Claude from taking an action unless a specific condition has been met.

09:44HOOKSo it's not 99%. It's not 99.9%. It has to be a 100%.

09:50HOOKAnd the action physically can't happen if the hook says no. So you can think of prompts as suggestions and hooks as laws.

09:59So the exam guide draws a very clear line on when and where to use prompts versus hooks. And when it comes to what it's good for, it's primarily around style, tone, and formatting. These are things that you can execute well 90% of the time, and it won't land you necessarily in an area of harm or a land of hurt.

10:17Hooks are optimized for compliance, financial stuff, and security. So anywhere where one single point of failure can cause some real issues.

10:24And this overall concept is where a lot of people go wrong because they just think that if something failed 90% of the time, they can just tweak the prompt to perfection. And as running a company called prompt advisers where initially, all I would do for companies is help them prompt engineer every system prompt in a system, in a production use case for content creation,

10:45there is a level where a prompt is just not good enough over a thousand iterations or 5,000 iterations. If you're not as familiar with hooks and you want a little bit of a debrief, then you have two options. If you pop into a terminal, you could always do slash

10:59hooks, and then this will show you each and every way that you can invoke a different tool. And this list goes on and on and on. And if you click on one and you click enter, it'll show you exactly what it would do.

11:11And option two is, I showed this in a prior video, you can use my favorite function in Claude code, one of the most underrated, which is the Claude code guide agent, and then you can ask it what are the best hooks for x use case.

11:26And then it will go through with full knowledge of what hooks it has at its capacity and which one is optimized for your use case and whether or not you should be using a hook or a prompt to begin with. So whereas the last concept, the enforcement piece is probably the most important, this is the highest leverage, which is getting tool descriptions correctly, which is basically giving Claude code, which you provide it with whatever tools you want at its arsenal as well as its native tools, the right tool at the right time with the right description for the right use case.

11:55So tools are basically how Claude decides which tool to use when it has multiple options, And that's typically not a small feat because you can have two tools that have vague overlapping descriptions, like one that retrieves customer information and another that retrieves exactly what the order entails.

12:13So that could lead to some form of communication issues. So Claude has to essentially guess, and the exam guide covers this very clearly.

12:20Ambiguous descriptions cause frequent misrouting. So Claude ends up calling the wrong tool way more often than you'd expect. And one really important thing to note is that when you invoke these tools, sometimes

12:33you see the final result being executed properly. But back when everyone was using no code tools like n eight n, you would have the agent in that platform work and execute the workflow, and you would see it fire the right result.

12:46So you could get the exact result you're looking for in Cloud Code, but you have no idea that it actually did the wrong thing three or four times to eventually do the right thing. So it's not just about the outcome, but also the efficiency in getting to that outcome because as it tries through all the ways it doesn't work, it spends your tokens and you wanna be as token efficient as possible.

13:06So to give you something more tangible, let's say this is one of your functions, get customer, you would basically say that you wanna use this tool whenever you need customer ID and profile data, and you want to use the lookup order instead when you have an order number and need a shipping status.

13:21So you essentially want to be intentional in saying do not use this tool when this happens versus just saying when it should use that tool. And this is pretty much the highest leverage tip from the entire guide, which is the description of the tool is really the interface of tooling and fixing the descriptions to make sure it knows the optimal path, the critical path the most critical thing that you can do in your workflow.

13:43Now while MCP servers are increasingly falling out of favor for different use cases, there are times where they make sense. So the exam guide does cover where to use different levels of scope for your MSP servers.

13:55You have project level and you have user level, and I'll walk you through the difference of when and where to use both. And this essentially allows Claude to connect to external tools like GitHub, Slack, Outlook, whatever it is.

14:08It's one of the vectors that you can use. And if you've ever used Claude Chat or Claude Cowork and use their connectors feature, it's essentially using an MCP under the hood. So project level MCP

14:18lives in a file called the dot m c p dot JSON at the root of a project. So any passwords or API keys go in what are called environment variables,

14:29where they're denoted as dot n for dot environment, and they never directly end up in the core file. So for example, if you had an MCP server for GitHub, which is essentially code version history, you would have an environment variable. It would be set to the token of your GitHub, and this would be written to an environment file.

14:46So every single time that an agent would try to use an MCP server, it would then be auto authenticated through this file, then go and invoke this specific service. User level MCP lives in a file in your home directory. So this is basically your personal sandbox.

15:00You have experimental tools, personal API keys, things you're testing before rolling them out to the rest of the team. Now the practical takeaway from the exam guide is that essentially you can use as many community based MCPs.

15:12These are not necessarily open source MCP servers, sometimes aren't too safe, but more so the native MCP servers from the platforms themselves. So if you look at the major providers like Salesforce, GitHub, etcetera, everyone has some form of instruction for using MCP servers,

15:28and only build custom servers when you absolutely need to. And it's important to remember that an MCP server is purely a function. So if you just need your functions executed in a slightly different way or different order, you might not need a custom MCP.

15:42Now real quick on the terminal side of things, all you'd have to do is go into your terminal, and you could do one of these two things. You could say, Claude MCP list.

15:51This would go and invoke if you have any MCP servers whatsoever. Now personally, I've migrated my entire ecosystem to skills, CLIs, etcetera. So you won't find any that are already authenticated.

16:02You'll find just the shell of the ones that I used to use. So the Gmail, Google Calendar, Canva, and Zapier, all of them I used to use, but now I've migrated all of them to use the skills primarily just for token efficiency, security, etcetera. But if you wanted to see which ones you had out of the box, that's the way you do it.

16:19If you're using MCP servers at the the project level, then you could just paste the command just like this, where you could say, show me the dot MCP dot JSON file in this project and explain the MCP server configuration. And then you get this response where in this case, I don't have in this particular project an MCP dot JSON file, and it walks through what needs authentication like we saw before,

16:40how to configure it, and there's that command that I showed you before, Claude m c p list. It basically invoked that. So whether you're asking for it through natural language or going straight to the source with this command, then you can have full visibility on what's happening with your MSP servers.

16:54The next principle in the exam guide is the tool overload problem, and this is essentially making better decisions by having less options. So you can think of it like this. Giving an agent 18 tools is like hiring a brand new employee and giving them access to every single system from day one.

17:11They're gonna use things that they shouldn't call tools outside their lane. You wanna keep each agent down to a maximum of four to five tools that are directly relevant to what they're doing. That constraint is really what makes them precise.

17:23And if you need a reminder, earlier, I showed you an example of spinning up three sub agents, and you'll notice that all of them used four or five tools at max. So this is really a paradigm that's built into Cloud Code, and that allows it to have a process, create SOPs. So this one would be search, fetch, extract, and save.

17:41So the goal is being precise, reliable, and always on task. Now there's also a setting called tool choice that controls how Claude picks his tools.

17:51There are three main modes. One of them is auto where basically Claude decides on its own whether to use a tool or not. And then you have another one called any, and this is essentially forcing Claude to use a tool, but it has to pick which one.

18:05And finally, have forced, meaning we are making it. Use this tool and there are no options.

18:11It's not just independence, it is forced dependence on a particular outcome. So the guide alludes to the fact that you can force a tool call to make sure that step one is always consistent and predictable, and then you can loosen that proverbial leash of Cloud Code to run freely and make more autonomous decisions as long as you know you've steered it in the right direction.

18:31So you're essentially putting guardrails on its first move or couple moves and then allowing it to run freely and really tap into that power of the agentic harness. Now next up is one of the most contentious topics in Claude code, which are Claude MDs, which are the heart and soul of your operating system, your air traffic control of your repo or project if you will.

18:51And pretty much it covers all the different layers, three different layers, the user level, the project level, and the path specific rules. Now most people dump everything they know into Cloud MD, they think that it's a proxy for a knowledge base or rag, but essentially it's not. People dump their preferences, their rules, their style, their tone all in one place, and then complain

19:11why there's so many tokens being wasted all the time. The big issue is that every single time you open a brand new session, Claude auto injects that straight into memory. So you're wasting time and you're wasting tokens.

19:23So the guide splits it into three different layers. One is the user level, the next is the project level, and the last one are path specific rules. So you can treat your top layer as your personal preferences file.

19:34This lives in your core home directory. So you have your editor settings, how you like your explanations formatted. So this one's just for you and not meant to be shared with anyone or through something like GitHub.

19:45So the middle layer is a project level CloudMD, and this is where you have things like team rules, coding conventions, architecture decisions, and this essentially allows you to share it with your team assuming you have one so that everyone's on the exact same page.

19:59So this is where having some version control makes a lot of sense. And finally, we have the bottom layer here, and this is really the golden nugget of the three levels. These are path specific rules.

20:09So you create a small rule file that lives in the dot claud rules folder, and at the top of each file, you put a pattern that says when to load it. So when something like only load this when I'm editing files is a very good example. So your testing rules only show up when you're writing tests, and your API rules only show up when you're in the API folder.

20:30And lastly, if you have something like React components, if you're a developer, then you know what that is. If not, then don't worry about it. The TLDR

20:37is this is huge because Cloud Code can get focused. So you can have a lean and mean Claude MD and rely on rules to cure the path forward for any nuances that need to be taken account for a specific use case. So I know I'm throwing a lot at you right now, but the next section tries to bring everything together into cohesion.

20:55So it's really about when to use what because we haven't even started speaking about things like skills, like commands, plan mode versus direct execution,

21:04when to use each. So commands are basically reusable prompts. You save them once and you can trigger them with a slash command.

21:10So you can have slash review PR slash generate tests slash morning if you wanna execute a walk through of what your day looks like based on your calendar, your Gmail, anything you've hooked up maybe using the Google CLI. But one thing to note is that team wide commands go in a commands folder in your project so everyone can use them via something like git,

21:31whereas personal ones will end in your root folder, and these are your personal flash commands. So these are specific to you and tailored to exactly what you wanna do day to day. Now we've gone through skills at length in this channel, but just in case you and I are meeting each other for the first time, we'll go through that as well.

21:45So skill is a step above a command. A skill has its own file that defines what it can do, what tools it's allowed to use, and it runs in its own separate context. So you can think of it like this.

21:56A skill can do messy exploratory work like research files, do pretty much anything you want, and none of that clutter ends up in your main conversation. It's like sending someone to go do research in another room, and you're just bringing the summary back to main conversation. Now moving on to another existential question that many people ask and the guide goes through, to use or not to use plan mode.

22:19So if the task touches multiple files, it's ambiguous, or it could go in a few different directions, then using plan mode is the way to go. Claude explores, reads, and proposes changes without actually modifying anything. Just review it, approve it, or tell it to go in a different direction.

22:35But if it's a very obvious and straightforward single file fix, then you can just let Claude execute it directly. So you don't have to over plan in this case the same way many people will over engineer things.

22:46Now this next part is fairly advanced. So if you're nontechnical, this part might leave you squinting a little bit, but I'll try to explain it as best I can.

22:53So this is about using Claude code in what's called a CICD pipeline. What this stands for is continuous integration

23:01and continuous development. So if were to break down this concept into one sentence to make it as accessible as possible to everybody, it would be the CICD pipeline is an automated conveyor belt where a developer will push code, that code will be reviewed, and then it will be shipped and pushed to the end user, all without any form of buttons being pressed along the way.

23:21So the guide really focuses on this step three right here, but we'll get to that in a second. Step one is, like I said, you have a developer that pushes some code. Then this triggers the CI, the continuous integration

23:33pipeline to go and check it. Step three is where the magic happens, and this is what's called Claude dash p.

23:40Claude dash p is not a very straightforward concept. So, again, I'll try to break it down. The dash p essentially allows Claude code to run without asking you for anything.

23:50So no prompts, no confirmation. It's essentially bypassed permissions mode in a way, and it just runs the task you give and it gives you the result back.

23:59And then you have a flag that gives you a clean structured output that other tools can read. It's actually called the dash dash output format JSON flag. When you put these together, these two flags turn Claude code from something you can chat with solely

24:13into something that you can use to automate different parts your process. So the main learning here is that you can trigger this from any CICD pipeline, any system that essentially tests and deploys your code.

24:24Now it's hard to make that part less gibberishy, but this part will be the main takeaway from that section. And this is their important note on using separate clawed code sessions for reviewing code versus writing code because there is some level of pollution. When you write code, you essentially are biasing the language model to say, yeah.

24:43Yeah. I wrote amazing code. Because why would the language model write poor code on purpose?

24:48So you need a stateless session to go and review any form of code, anything that was produced in session one assuming you're doing something more on the technical end of things. So if you need a little anchor to remember, then you can remember that fresh eyes, even AI eyes can catch more. Two heads are better than one.

25:03In Claude code's case, five, ten, 15 heads sometimes are better than one at reviewing code as long as it's in a separate session. So for example, if I said claw dash p list all Python files in this project and summarize what each one does, output this format, then you will see it goes through every single Python function in my folder, which I won't get into in-depth, and then it comes back with the full key patterns here.

25:28All scripts use Gemini three pro preview for images. This is my thumbnails generation folder and dated output folders report lab for PDFs and one script per video topic design. So when it comes to making outputs reliable, this is a whole portion of the guide that's dedicated to dealing with inconsistency

25:46in Claude's responses. So your instinct when Claude gives you inconsistent outputs is to write more instructions. So for whatever reason, your instructions involve number crunching,

25:56something like handling different currencies, different decimal places, you try to shove all of that in there.

26:02But Claude interprets it differently each time. One response can give you one number versus another depending on the day, the model you choose, etcetera. So you can have the same set of instructions,

26:14but three different results because people keep forgetting that this isn't magic, these are language models. Now to fix this, Anthropic recommends going to few shot examples.

26:24If you're not familiar with what few shot are, these are from the beginning of prompt engineering time, one of the best ways to get consistent outputs. So you give an example.

26:33In this case, the input could be Acme Corp reported 4,200,000 in revenue for 2024, and this is the output you want exactly.

26:41So you give it exactly the parameters. In this case, we're putting it in JSON. This could be in whatever format you want, and same thing for example two.

26:48So multi shot gives it enough of a hint to generalize and better understand which direction you're going for. And the interesting thing here is that Cloth doesn't just copy paste your examples.

26:58It learns the underlying patterns behind them. That's why two to three examples will beat a full page of instructions each and every single time. Now in the same vein of consistent and reliable outputs, this also generalizes to JSON, which stands for JavaScript Object Notation.

27:14Very common when you're dealing with agents, with toolings, and tool calls. So this is also more of an intermediate to advanced use case, but it's important to know because it's covered in the guide. So I'll move from left to right and, again, try to make this as accessible as possible.

27:27So step one is you define a tool, which basically acts as your template, providing the exact structure you need. So every field, every data type,

27:36and whether or not it's required. Leaving something as optional is beneficial for Claude because otherwise if you don't tell that it's optional, then Claude will make it up. So making it optional allows Claude to say,

27:49I don't know in a very legal way. So you're a legalized way of allowing it to say, I don't have this. I don't know what to do with this.

27:56In step two, building on what we referred to before, you can force Claude to use a specific tool. So there's no option to respond with plain text, no option to use a different tool. It has to fill your template as is.

28:07So just as a takeaway, this eliminates syntax errors. So anything like malformed JSON, this is just broken JSON or markdown wrapping, but it does not eliminate semantic errors. So anything with a wrong value in set field.

28:20So step one is you extract the data, and assuming it's correct then obviously this is all done. But if it's not correct, this is the part where you really need to dial in. You're not meant to just say try again.

28:31You're meant to actually send very specific feedback. So instead of saying retry, you would say the original document, the field extraction, and the specific error.

28:40And this is how you would frame the specific error. Revenue field says $0, but document clearly says 4,200,000.

28:48So now you're giving it multiple areas to zero in on and see what might be happening. But just like everything, there's nuance. You don't wanna just keep going in this endless loop.

28:57If the answer isn't there, if the information isn't in the source document, then retrying even with the best of instructions won't help. So it's not just knowing how to validate and test, but also knowing when to stop.

29:09Now as a segue to the next section, back when I used to drive a Honda, once in a while you'd get this notification which politely asks you to take a break and typically it's because they want you to have attention to the wheel especially for longer drives. Drives.

29:22You can take this exact same paradigm in mind for the next principle. So instead of worrying about driving sharply, this focuses on keeping Claude sharp throughout the lifetime of a context window.

29:34So when you give something to Claude code to read, it pays really close attention at the beginning. The first 40% of context window is really well primed. You have the system prompt.

29:44You have the first messages. You have your Claude MD injected at the beginning as well, and it really pays attention. And this is also true near the very end where you have recency bias towards the latest messages.

29:55But the context in the middle or the monkey in the middle starts to get a little bit fuzzy. So information buried in the middle starts to get compartmentalized in a way where it can't maintain that full fluency

30:06or flow of thought. Now the problem can get worse over time because every time Claude uses a tool, the result is added to this middle section.

30:14So a customer comes back with 40 fields when you only need five. Each one pushes the important stuff further into the fuzzy zone. So naturally, how do we fix this?

30:24Well, Anthropic comes up with three different ways to accomplish this. First, you can pull out the key facts and put them at the very top of a conversation, essentially pinning them in a place where Claude will always see them.

30:35So you can think of it as a key fact summary block. Another method you can employ is trimming verbose tool outputs. And what the word verbose means here is you get a series of data from a tool.

30:47A lot of this data could just be pure metadata that doesn't actually move anything forward, and you can get rid of it. And by trimming it progressively, you just keep the tool outputs that matter which will flood the context window less. And the third way is to delegate tasks to sub so they can maintain all of their messy output in their own individual context and box.

31:06It's all isolated and boxed off, and you just get a clean summary back. The guide actually mentions explicitly that it's infinitely better to start a brand new session with a summarized version of outputs from before versus pushing through a conversation

31:22even if you're at that million context window because you have all of this different set of information, tool calls, different trials, pivots in the conversation that pollute your context window. If you're ever curious at what is in your memory at a single point in time, you can always go into Cloud Code, do slash

31:40memory, then in here it will tell you that auto memory is on, what your project memory looks like, if you wanna check-in at your Cloud MD, the fact that you have some certain rules here, some level of user memory, and then you can also open your auto memory folder to see exactly what's in there. So you can click on enter right here.

31:57This will open it up in another window, then you'll be able to see a series of markdown files that denote exactly what it's remembering about your current session. And to close the loop on reliable outputs, there's a section related to human in the loop, which is basically when do you escalate a particular scenario to an agent.

32:14So if you have some form of chat agent in the wild and someone asks to speak to a human, then the goal should not be to try to fix the issue first using a language model and not to try one more thing. It should be to respect this request and execute it right away.

32:29And it's important to zero in on this because it explicitly says that this will probably trip up people because you'll try to get creative with how AI can answer something, but if someone asks for handoff, you give them handoff. Then the second scenario, the rules could be unclear and the agent could be unsure about what policy applies.

32:46The prescribed action here is to escalate, but escalate using what's called a full package. This full package includes what the customer information is, the ID, the root cause, what was attempted and tried, and what is the recommended action. So very similar to managing an actual customer system like Zendesk or similar, you would execute this in a very similar way.

33:06So the agent would technically come to the conclusion that it can't make any meaningful progress, and this is what it could look like in terms of a final package to hand off. And for the third scenario, if it's a straightforward issue, the policy is clear, which is to allow the agent to resolve it.

33:21But it comes with a big caveat. Even if it resolves it perfectly, it should still ask, would you prefer I transfer you to human agent?

33:30So you wouldn't want to give the agent itself a confidence score and escalate it when it's low. And one of the many reasons why it thinks that sentiment analysis can miss the mark is it can misread sarcasm,

33:41cultural differences, and tone. So I actually had to double check whether or this was legit. So I noticed that this is the question

33:48in the example guide here which says your agent achieves fifty five percent first contact resolution well below the 80% target. And you can see here it says that sentiment doesn't correlate with case complexity which is the actual issue.

34:00Alright. And we're getting near the end here, and this portion of the guide just covers error propagation. Basically, what to do when things go wrong.

34:07Now compare that to a detailed error that includes what went wrong, what was attempted, any partial results that came back, and what else could be tried. So now the main agent can actually make smart decisions,

34:20meaning trying a different search using data from a previous run, switching to a completely different source, or just basically noting that gap and moving on. So the TLDR of the TLDR is this just breaks down how to allow your agents to fail gracefully.

34:34Meaning failing in a way where you get meaningful errors, you can get meaningful outputs and meaningful retries. And just to bring everything together, because we've had all kinds of thoughts and examples and paradigms, what are the five rules that you can take away that will set you on the right path?

34:48HOOKWhether you're just learning Claude code or you're preparing to dive into the Claude code architecture exam. So rule numero uno is if it has to work a 100 of the time, whether it's money related, security, legal, don't rely on telling Claude in a prompt.

35:04HOOKUse a hook that physically blocks the action. So prompts are suggestions and hooks are the laws.

35:10HOOKRule number two is when something breaks never return a generic error. Always include what broke, what tried, what partially worked, and what else could be done. Rule three is keep your agents focused.

35:22HOOKFour to five tools max per agent and an agent with 18 tools makes infinitely worse decisions than one with five that are directly So less choice and better decisions. Number four is review your code in a separate Claude session.

35:37HOOKThe one that wrote the code is naturally biased toward thinking it's correct. A fresh session with no history catches what the first one will never. So two or three real examples of what you want produce more consistent results than a full page of written instructions.

35:55HOOKClaude learns the pattern, just not the format. And the real trick here is understanding that although these are five separate rules, they're all kind of the same concept, which is if you need to rely on building proper agentic systems in the wild, then you want to focus on the right tool that has the right level of determinism,

36:13HOOKwhich is its ability to execute something predictably every single time. And the main thing to take away from this is that although these are five different rules, they're essentially the exact same concept just showing up in different patterns.

36:26HOOKAnd the TLDR of it is to be structured, to be explicit, and to not have a what if or a probably will work with something like a prompt when you need the firepower of something like a hook. So if you nail understanding these core five principles,

36:40HOOKit'll give you the 80 of the eighty twenty. And more importantly, it'll give you the foundation to keep adding on additive knowledge.

36:46HOOKNow you might think that I'm gonna end off there. You might be even hoping for it, but I'm gonna leave you with one more thing before we depart for this video. So I found this really good guide by this user on x.

36:58HOOKCTAI can't pronounce the username, but he came up with this article here that says, I want to become a Claude code architect. And in it, he came up with a series of prompts that break down each and every section of the official architecture guide. He's created these very bespoke prompts, I would imagine, using AI,

37:16CTAand you can just pull this up and go into Cloud Code. You can paste it, and then it will ask you and interview you on your competence on a particular domain. So if we take this behemoth prompt for section one,

37:30CTAthis just says you are an expert instructor teaching domain one, architecture and orchestration of the Claude certified architects certification exam. And then at the bottom, it says welcome.

37:40CTAIt tells you the weighting of this particular section, and it asks you how familiar you are with AgenTic systems. If you say something like none, then it will create a custom learning path for you to start going back and forth through these concepts. So you see here, it breaks down what an AgenTic loop is, and at the very bottom,

37:57CTAthere's a concrete example. The critical field, we already alluded to this, the stop reason, the anti pattern, the correct pattern.

38:05CTAIt's gonna keep going telling you which part of the guide to reference, and this is awesome. So kudos to this individual. I'll leave the link for you with some other goodies that I'm about to tell you right now.

38:14CTANow as I recover from filming this video, I'm gonna leave you with a mega guide going through everything I walked through today with the actual visuals themselves, a breakdown of the concept, hopefully, in a better way than I was even able to articulate. And I'll make that available to you in the second link in the description below.

38:30CTAAnd for those that wanna go infinitely deeper on Claude code and be in a whole ecosystem where you have coaches, myself, a brand new upcoming course, which is bound to blow your mind in terms of what you can do with Cloud Code, then you'll wanna check out the first link in the description below and maybe join me in my early AI adopters community.

38:47CTAAnd for the rest of you, if you found this to be a helpful labor of love, one thing that you could do as a thank you is just leave a like and a comment on the video. If you like my stuff and you want me to go deeper on these kind of concepts, then subscribe to the channel and let me know. I'll see you in the next

— full transcript

§ 05 · For Joe

Five rules that decide how reliable your Claude agents will be.

WHAT TO LEARN

Reliable Claude systems come from picking the right level of determinism for each problem - hooks where it must work, prompts where it just needs to read well, and structured handoffs everywhere in between.

Use a hook, not a prompt, anywhere money, compliance, or security is on the line - prompts run at roughly 88% reliability, hooks at 100% because they physically block the action.
Cap each agent at four to five tools max; more options actively make routing worse, not better.
Fix tool descriptions first when an agent calls the wrong thing - bad descriptions misroute 40% of the time, good ones with explicit do-not-use clauses drop that under 2%.
Split CLAUDE.md across three layers - user, project, and path-specific - so each session only loads the rules that match the file you are actually editing.
Always review code in a separate, stateless Claude session; the session that wrote it is biased toward thinking it is correct.
Show two or three concrete input/output examples instead of writing a paragraph of instructions - the model learns the underlying pattern and stays consistent.
When you escalate to a human, send a structured handoff - customer ID, root cause, what was tried, recommended action - not just a 'sorry, transferring' note.
Never return a generic 'failed' error from a tool; include what failed, what was tried, what partially worked, and what alternatives exist so the next agent can act.
Pin a key-facts block at the top of long context and trim verbose tool outputs - Claude pays close attention to the start and end of context and goes fuzzy in the middle.
Force the first tool call with tool_choice when you need a predictable first move, then loosen the leash so the agent can run freely once it is pointed in the right direction.

§ 06 · Frame Gallery

Visual moments.

06:24

11:18

20:14

25:51

29:53

32:17