WEBVTT

00:00:00.000 --> 00:01:11.845
I've been an engineer for nearly a decade. And in all of that time, right now, process has never been more important. At your fingertips now, you have access to a fleet of middling to good engineers that you can deploy at any time. But the weird thing about these engineers is they have no memory. They do not remember things they've done before, and so you need extremely strict and well defined processes to get those agents to actually do things that are useful. So this means that you as a developer are looking constantly for ways to steer your agents, to keep them on the right track. And for me, that has resulted in a lot of skill building. Here's the repo of all the skills that I'm using right now, each of which I have gone through and designed. Some of these I use relatively rarely, but some of them I use every single day. And these skills help me encode my process so that AI has a really strict path it can walk down every single time. And as a result of using all of these skills, the code quality that the AI is producing has shot up. Now if you think that process is important and that real engineering skills are important, then boy, do I have a course for you. This course is called Claude Code for Real Engineers. It's a two week cohort that starts on March 30. And for seven more days, it is 40%

00:01:11.845 --> 00:01:23.680
off. If you feel like you're behind the curve on Claude Code and you want to get way ahead of the curve in just two weeks, then blimey, this is the place for you. But let's start talking about our skills with number one, which is maybe my favorite.

00:01:23.840 --> 00:01:48.365
This is the grill me skill. This skill, yes, it is just three sentences long, and let's just read it out in full to describe what it does. Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree resolving dependencies between decisions one by one. And finally, if a question can be answered by exploring the code base, explore the code base instead. The concept of a design tree comes from this book by Frederick P. Brooks,

00:01:48.525 --> 00:02:24.170
which is The Design of Design. Actually, I don't know if it comes from this book, but this book is where I saw it first. The design tree is this idea that as you're coming towards a design, you need to walk down all of the branches of a design tree. For instance, you might be designing a search page and you need to decide whether you want an advanced search or a text box. If you choose advanced search, then you need to figure out all of the filters and all of the sorting methods that you need on advanced search. And you keep on walking down the tree until you figure out your design kind of in full or as full as you can before committing to code. This grill me skill, when I invoke it, I invoke it when I want to reach a shared understanding

00:02:24.490 --> 00:03:17.330
with the LLM. I found that relatively recently, Claude Code will tend to just spit out a plan really early when I go in plan mode, and it tends to just create a document before I feel I've reached a shared understanding with the LLM. But the grill me skill forces that conversation. It forces the LLM to interview me about every single part. Here's a conversation I had with Claude recently about adding a feature to my course video editor code base. I gave it some research that I'd done in a markdown file and I said, grill me. I'd like to think about adding this to the write page. It loaded up the skill and the thing I want to show you is just how many questions it asked me. So the first thing it did is it just explored the relevant stuff in the code base, which is good. Then we zoom down, we can see it ask question one, where does the document live? Question two, what's the UI layout? Question three, which modes get the document panel? Question four, the document life cycle? Question five, what does the write document tool look like? Question six, the edit tool shape. Question seven.

00:03:17.490 --> 00:03:21.410
Question all the way down to question nine, question 10, question 11,

00:03:21.570 --> 00:03:36.775
question 12, all the way down to question freaking 16 here. And this is a relatively short grilling session in my book. I've had sessions where I've sat there for nearly half an hour, forty-five minutes with the AI answering questions on really complex features. You know, that could be thirty,

00:03:36.775 --> 00:03:37.415
forty,

00:03:37.575 --> 00:03:38.695
fifty questions

00:03:38.695 --> 00:03:40.695
all from this absolutely

00:03:40.695 --> 00:03:48.310
tiny skill. That's one thing I want you to take from this. Skills don't have to be long to be impactful. You've just gotta choose the right words for the LLM at the right time,

00:03:48.630 --> 00:04:03.075
and this design tree resolving dependencies has just been absolutely great for me. By the way, if you want these skills, then they will be at a link below. Once I have reached a shared understanding with the LLM, once I have grilled my idea and sort of understood all of its ramifications,

00:04:03.235 --> 00:04:35.515
if I then decide I want to implement it, then I invoke my next skill, which is a write a PRD skill. I actually did this in the conversation we were just looking at. So it said anything I've missed or got wrong, and I said write a PRD. I was suffixing it with user because I have some that sort of live in the project. So that's the reason why I did that. Here's what the skill looks like. This will be invoked when the user wants to create a PRD. You may skip steps if you don't consider them necessary. So for instance, in the previous conversation, it said, we've already done a deep interview. Let's move to step four. So step one is to ask the user for a long detailed description. Then number two is to explore the repo to verify their assertions.

00:04:35.595 --> 00:05:05.315
Number three is basically to interview the user relentlessly. So just a copy of the grill me skill again. Next, we sketch out the major modules you will need to build or modify to complete the implementation. We're gonna look at this later because it links to skills I'm gonna show you in a bit in this video. And finally, once you have a complete understanding of the problem and the solution, use the template below to write the PRD, and the PRD should be submitted as a GitHub issue. The way that my dev flow works is I take these PRDs in GitHub, I turn them into more GitHub issues that reference the parent PRD,

00:05:05.555 --> 00:05:40.655
and then I have a Ralph loop that just loops over each issue until it's done. If we go back to the conversation where we were before, we can see that it created this PRD here. This was four days ago as you can see. We've got a problem statement. The article writing page currently regenerates the entire document on every AI interaction. And the solution was to add a split pane document editing experience to the article writer. Chat stays on the left, a new document panel blah blah blah. So this is a big feature. We're adding document editing to a kind of AI chat feature. The important thing here is the user stories. There are many, many user stories as part of this. And this comes from agile methodology, and we're basically trying to describe the

00:05:41.000 --> 00:06:14.020
kind of desired behavior of our system in language, which is not an easy thing to do. I still haven't properly landed on the right format for these. This is just something I sort of like. But you could easily use Cucumber language for these or whatever you're used to working with. We then zoom down to the bottom, and we just sort of pass in some implementation decisions. The implementation decisions here, we don't want to be over prescriptive because we want these to be durable. Because if the code ends up getting out of date with the PRD, then we're gonna have issues when we actually go to implement it. But you can see the theory here. This is the kind of

00:06:14.820 --> 00:06:48.650
it's a really good description of the destination that we're going to. But what we don't have from the PRD is the actual journey, the way we're gonna get to this destination. And if we head back to that conversation, this is where I use my next one, which is PRD to issues. What this does is it takes a PRD, takes the destination, and it turns it into a Kanban board of different issues that can be independently grabbed. So the first step in here is it locates the PRD. If the PRD is not already in your context window, fetch it with this instruction. Explore the code base if you need to. And then draft

00:06:48.650 --> 00:06:49.770
vertical slices.

00:06:50.185 --> 00:07:09.640
It's not always clear how you should break a PRD down into individual tasks. This is something that developers have been doing for yonks. Right? And we've developed a kind of intuition for how to do it. In my opinion, the best way to do it is to break it into tasks that flush out the unknown unknowns really quickly. For instance, if you're integrating with a new kind of service or integrating

00:07:09.640 --> 00:07:11.880
two things which you haven't integrated before,

00:07:12.120 --> 00:07:25.535
then you should do that work first because it's gonna give you feedback on whether your approach is even valid. The right analogy here is the tracer bullet analogy. I won't go into what that means, but basically each issue is a thin vertical slice that cuts through all integration layers,

00:07:25.775 --> 00:08:38.200
not a horizontal slice of one layer. In the conversation, it broke down that really complicated PRD into just four slices. It first created a kind of engine with some tests applied to it. This is actually quite a good vertical slice because this was the engine that was going to then power the rest of the kind of setup. If this engine wasn't working for whatever reason or it wasn't feasible, then we would need to flush that out quickly. And this is what this breakdown does. The PRD to issues skill also establishes blocking relationships between the tasks. For instance, number two here is not actually blocked by anything, so it can be picked up independently of one. This is really useful if you have a parallel agent setup where you can actually fire two agents at it at once, for instance, in background tasks. And it also means that in the future, you can add other issues to this, like QA issues that you find or things that need to be improved, and you can then establish blocking relationships between those and the other things. We can see that number three here is blocked by one, the editing engine, and number four, the Monaco editor toggle, is blocked by number two. So I said yes to all of these, and it then created all of these GitHub issues. These issues reference the parent PRD so that the local agent can fetch it and view it. And it sort of just breaks down what to build really. And crucially,

00:08:38.280 --> 00:09:04.170
it references the previous user stories in the PRD. We can then see a comment actually from Claude Code that ended up implementing this. It said a pure function document editing engine with 28 tests covering all acceptance criteria. And we can then take a look at the commit that references this issue. So this was basically my Ralph loop came and just implemented this based on the issue, commented on it, closed it, and then the next issue was unblocked. So far, the grill me skill can help you flesh out an idea.
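
NOTE
A minimal sketch of the blocking relationships described above, not taken from the video. The issue numbers mirror the four slices in the conversation. The name ready_issues and the dict layout are hypothetical, and issue 1 is assumed to have no blockers.
```python
# Hypothetical model: each issue number maps to the set of issues that block it.
BLOCKED_BY = {1: set(), 2: set(), 3: {1}, 4: {2}}
def ready_issues(blocked_by, closed):
    """Return the open issues whose blockers are all closed, in ascending order."""
    return sorted(
        issue
        for issue, blockers in blocked_by.items()
        if issue not in closed and blockers <= closed
    )
# With nothing closed, issues 1 and 2 can be picked up in parallel.
print(ready_issues(BLOCKED_BY, set()))
# Closing issue 1 unblocks issue 3, while issue 4 still waits on issue 2.
print(ready_issues(BLOCKED_BY, {1}))
```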

00:09:04.410 --> 00:09:27.435
The write a PRD skill can help you take that idea and turn it into a document. And then the PRD to issues skill helps you then turn that destination document into an actual journey. But then how do you actually execute on that skill? How do you make the implementation really rock solid and increase the code quality of what gets produced? We have got a TDD

00:09:27.435 --> 00:10:20.175
skill. TDD means test driven development. And when you invoke this skill, it basically forces the agent or encourages the agent rather to follow a red green refactor loop. Unusually for my skills, there is actually a lot in here. So it's not just the skill itself. It's also ideas on refactoring, on mocking, on what deep modules are. Doing really really good TDD has been the most consistent way that I've improved agents' outputs. So let's have a look at what's actually in here. What we can see is I'll just skip over the philosophy stuff. I'll let you guys read that. We are basically looking at this workflow. Now the first one here is really important. Confirm with the user what interface changes are needed. Now I made a video on interfaces and implementation recently, but let me just give you the précis. When an AI looks at a bad code base, it will see something like this where it has a ton of tiny modules here that are kind of undifferentiated.

00:10:20.175 --> 00:11:00.800
They're not really grouped together. It doesn't really understand how these things relate. And so it has to do a lot of work kinda working out, okay, what's responsible for what? What are the dependencies? How does the code base even function? Whereas if you restructure this into several larger modules with just kind of thin interfaces on top, the interface being the functions that are actually exported from this, the things that the callers actually call, then it's a lot easier for AI to navigate this code base, and it's a lot easier to work out how to test these modules because you just test them at their interfaces. You test them at their boundaries. You can check out the whole video on that below. So what this TDD skill is encouraging here is basically trying to make these interface changes

00:11:01.120 --> 00:11:30.570
really top of mind for the AI to get it to understand that when it changes an interface, that's an important decision it needs to take time over. You confirm with the user which behaviors to test. You design the interfaces for testability linking to a doc. And then we have some more stuff around planning here. It then goes into a lovely loop where it writes one test at a time and it writes the test first. Now I've talked about red green refactor before, so I'll link the video below if you're interested. But I found that red green refactor with agents is incredible.

00:11:30.570 --> 00:11:41.805
And it basically does this loop until it's complete. It just writes a failing test, then writes the code to make that test pass. And finally, it goes through and looks for refactor candidates. I haven't found this last step to be amazing.
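
NOTE
A small sketch of one turn of the red green refactor loop, assuming Python and a hypothetical slugify function that does not appear in the video. The test is written first against the public interface, and the implementation is only what is needed to make it pass.
```python
import re
# Red: this test is written first and fails while slugify does not exist yet.
def test_slugify_joins_lowercased_words_with_hyphens():
    assert slugify("Hello, World!") == "hello-world"
# Green: the smallest implementation that makes the test pass.
def slugify(title):
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)
# Refactor: with the test green, the internals can change while the test stays as is.
test_slugify_joins_lowercased_words_with_hyphens()
```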

00:11:41.805 --> 00:12:44.715
It hasn't been brilliant because often LLMs are quite reluctant to refactor their own code. If you were to clear the context of the LLM, then it would just sort of wipe its own memory, and it would be a lot less precious about the code that it's just written. But while its own code is sitting in its own context window, it's quite reluctant to change it. So this TDD skill is what I prompt my Ralph loops with in order to get them to do red green refactor. Now TDD demands a lot of you or rather it demands a lot of your code base. TDD is really hard to do in a badly structured code base because the test boundaries of this are really unclear. Should it just sort of test these modules on their own? Should it test these modules on their own? What are the boundaries here? Whereas when your code base looks more like this, then it's a lot easier to test because the module boundaries are really clear. So wouldn't it be great if there was a skill that made your code base look more like this? Well, isn't it nice? We've got an improve code base architecture skill. The process for this one is that we explore the code base and explore it kind of naturally as an agent would. We're trying to find confusions.

00:12:44.715 --> 00:13:15.385
We're trying to sort of surface naturally what the AI finds confusing so that it can then help it out later. Where does understanding one concept require bouncing around between many small files? Where have pure functions been extracted just for testability but the real bugs hide in how they're called? Where do tightly coupled modules create integration risk in the seams between them? All of these are questions that a senior engineer would be asking about your code base. Number two is you present candidates. So you present a numbered list of deepening opportunities.

00:13:15.385 --> 00:13:22.500
In other words, opportunities to deepen shallow modules in your code base into deeper ones. The user then picks a candidate,

00:13:22.660 --> 00:13:25.300
and then you design multiple interfaces.

00:13:25.300 --> 00:13:28.340
So it says to spawn three sub agents in parallel,

00:13:28.755 --> 00:13:51.530
each of which must produce a radically different interface for the deepened module. In other words, we're extracting that code and designing possible ways that it could look in the future. Designing it in multiple different ways is a really great way that you can then decide on the right idea. I've seen this agent spawn like five different sub agents for a really big refactor. The coolest thing about this is you don't need to know a lot about interface design in order to get this working.

00:13:51.850 --> 00:14:02.685
After comparing, give them your recommendation which design you think is strongest and why. And if elements from different designs would combine well, then propose a hybrid. Notice that I've made this really language agnostic,

00:14:02.765 --> 00:14:13.710
really kind of sort of everything agnostic, really. You can just run this in any code base and just get a decent answer for how it could be improved. There might be four or five candidates that really could use some work.

00:14:13.950 --> 00:14:49.680
But really, I think you should only be sort of doing one of these at a time because they really are quite hard to get your head around. And they require a human in the loop to sit with them and improve the code base because these decisions do require taste. Finally, it creates a GitHub issue. So it creates a refactor RFC as a GitHub issue using gh issue create. Usually, once this is done, I will then go with my PRD to issues skill and reference that GitHub issue that's just been created, because this describes the destination and we then need a journey to get there. So just doing this every so often in a code base, you know, once a week just to identify opportunities.

00:14:49.760 --> 00:15:23.180
Or if you have a sudden surge of development and you kind of create a whole sort of extra wing of features, then this skill will be really really useful in just making sure it conforms to the rest of the code base, making sure that it's not too sloppy. And as you keep running this, as you keep refining your code base, you're gonna notice the quality of the agent's output goes up. Because the old adage really does apply. If you have a garbage code base, then the AI is gonna produce garbage within that code base. Because to be honest, if you took all of these skills and just said, okay, this is like a little mini markdown book of processes for humans,

00:15:23.420 --> 00:16:03.505
then it wouldn't look out of place. I found that the most successful way to get code quality up from agents is just to treat them like humans. Humans with weird constraints. Sure. Humans that have no memory and are just sort of cloned, come out of the birthing pod, and go right to work. But if you, like me, think these real engineering skills are super important, then this course is absolutely for you. What I noticed while I was creating the course is that I'm really not teaching Claude Code that much. I'm teaching kind of what are sub agents. I'm talking about the constraints of LLMs, the sort of weird smart zone dumb zone stuff with a context window. We're talking about steering, which is essentially just a way of documenting stuff inside your code base. How to tackle massive tasks,

00:16:03.665 --> 00:16:41.327
understanding tracer bullets, and building those into our skills. Understanding how to build really great feedback loops and doing exercises with them and crucially how to hook these up to an autonomous agent. Every part of this course just sort of leads onto the other and I'm super happy with how it turned out. So over the course of two weeks, you'll be working through that self paced material with me as your guide in Discord and on live office hours. And if that sounds fun to you, then the link is below. Thanks for watching folks. I'll be coming back with a lot more stuff this week. What would you like me to cover next? I find the intersection between this real engineering and AI is such an awesome place to make content about. But anyway, thanks for watching, and I'll see you in the next one.
