WEBVTT

00:00:00.000 --> 00:00:03.840
A few weeks ago, I noticed myself doing something with agents

00:00:04.000 --> 00:00:16.285
that I thought was very clever, but I thought it was just too simple to require a skill. For those who don't know, I'm constantly thinking about skills. I'm constantly thinking about how to package my instincts

00:00:16.285 --> 00:00:41.030
and coding practices into reusable skills. And this has meant my skills repo has almost 100,000 stars at the time of recording. The skill that I started to think about was a handoff skill. And the theory was that this skill would take the context window of the current session and compress it down into a markdown file that could be handed off to another session. And so a couple of weeks ago, I shipped this. It's inside skills, inside productivity,

00:00:41.485 --> 00:01:11.775
and it's inside handoff here. And it's a very, very simple skill. It says to write a handoff document summarizing the current conversation so a fresh agent can continue the work. Save it to the temporary directory of the user's operating system, not the current workspace. I put this into my skills folder as an experiment to see how much I would use it, and it turns out I used it a lot. In this video, I'm gonna show you a deep dive of the skill, kind of why I designed it, what is the point of it, how it compares to built-in tools in some of these harnesses like compact,

00:01:12.175 --> 00:01:30.630
and also how you can get the most out of it to make the most of your grilling sessions. And if you dig the kind of stuff I've been showing you, then you will love the course that I've put together, which is AI Coding for Real Engineers, a two-week cohort for folks who want to use AI coding tools for shipping quality code, not slop.

00:01:30.870 --> 00:02:06.625
It starts on June 1. We're doing a discount right now. Get into the link below so you can check out. Let's start first of all by explaining why I made this skill and how it differs from compaction, which you may have heard of before. When we're inside a coding session like this, as we converse with the agent, as it does tool calls, as it makes file edits, this context window is gonna be filled up with more and more stuff. More and more tokens will fill up the context window. Now in the harness I use, Claude Code, the context window is huge. Right? You get 1,000,000 tokens' worth of context window.

00:02:06.865 --> 00:02:18.910
But there is actually a smart zone and a dumb zone in these context windows. Early on in the context window, you are gonna get much better performance from the agent because the attention relationships

00:02:18.910 --> 00:02:22.750
are not so strained there. Because there are far fewer

00:02:22.750 --> 00:02:24.190
tokens to calculate,

00:02:24.270 --> 00:02:43.885
fewer attention relationships between those tokens, then the agent's attention isn't so diffuse. In other words, it's better able to focus when there's less content in there. This means that as your conversation develops, you're going to get dumber and dumber and dumber responses from the agent all the way up to, you know, 800,000

00:02:43.885 --> 00:02:49.320
tokens, which personally I've never reached because by around the 120k

00:02:49.320 --> 00:02:57.640
token mark, I start to feel like I'm in the dumb zone. So this means, yes, that even though Anthropic advertises a ton of context window on these models,

00:02:58.120 --> 00:03:02.695
really for, you know, proper smart tasks, you've only got about 120k

00:03:02.695 --> 00:03:22.510
to work with, which means you need to budget really efficiently and you need to be aware of your context window at all times. So the question then becomes, what do you do when you're starting to hit up against this dumb zone? How do you recover your conversation? How do you continue the conversation beyond the dumb zone while staying smart? And the answer to that is compact.
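
NOTE
Editor's sketch of the budgeting idea above, in Python. The 120k and 1,000,000 figures are the speaker's rule of thumb and the advertised window from the talk, not settings exposed by any harness. The names context_status and smart_budget_remaining are hypothetical.
```python
# Hypothetical helper: thresholds come from the talk, not from any harness API.
SMART_ZONE_TOKENS = 120_000         # where the speaker starts to feel the "dumb zone"
CONTEXT_WINDOW_TOKENS = 1_000_000   # advertised context window mentioned in the talk
def context_status(tokens_used: int) -> str:
    """Classify a session by how much of the context window is already used."""
    if tokens_used < 0:
        raise ValueError("tokens_used must be non-negative")
    if tokens_used >= CONTEXT_WINDOW_TOKENS:
        return "full"
    if tokens_used >= SMART_ZONE_TOKENS:
        return "dumb zone"
    return "smart zone"
def smart_budget_remaining(tokens_used: int) -> int:
    """Tokens left before the session crosses the smart-zone threshold."""
    return max(SMART_ZONE_TOKENS - tokens_used, 0)
print(context_status(40_000), smart_budget_remaining(40_000))
print(context_status(169_000), smart_budget_remaining(169_000))
```
A session reported as "dumb zone" is the point at which compacting or handing off is worth considering.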

00:03:22.750 --> 00:03:32.045
What compact does is it will take a large conversation like this and summarize it. So you go essentially from near to the dumb zone to all

00:03:32.045 --> 00:03:34.205
the way into the smart zone here.

00:03:34.525 --> 00:03:47.570
And there's even sometimes an auto compact buffer depending on what harness you're using and whether you've got it turned on, which means that when you're near to the end of the context window, let's say deep in the dumb zone, the auto compact buffer will kick in and automatically

00:03:47.570 --> 00:03:49.250
summarize your conversation

00:03:49.490 --> 00:03:50.370
inside

00:03:50.370 --> 00:04:00.315
a new session. This summary usually looks like the files referenced, so just a list of files that have been referenced. The things that you said in the conversation are usually included,

00:04:00.475 --> 00:04:19.590
and the general tone of the conversation as well. This is then included as a little nugget at the start of the new session. And as you build up context in the new session, then you're continually referencing the old session. This means as you continue to compact and compact, you're gonna end up with this kind of sediment of different layers here from previous conversations.

00:04:20.085 --> 00:04:40.840
And this can be a little bit inefficient, but it's also a decent way if you want to do certain types of sessions where you just need to barrel on on the same problem again and again and again. It can be really useful for debugging actually because you can compact all of the other options that you've tried and then continue to try different things, hit the barrier,

00:04:40.920 --> 00:04:47.315
and then compact again to just save your state essentially. So it's a way of doing a long running session,

00:04:47.555 --> 00:04:55.235
but it's only really one session. So I continue to find compact a really really useful tool for creating these long single sessions.

00:04:55.475 --> 00:04:59.235
But what I started to notice was I wanted to do other things with compact.

00:04:59.680 --> 00:05:09.760
I wanted to compact into another session. For instance, let's say I was in one session here and while I was in this session, I noticed a little refactoring opportunity.

00:05:10.000 --> 00:05:22.805
Something that was totally out of bounds, out of scope from my current session, but I knew I would need to get there eventually. So what were my choices? I could extend my current session, but then I would end up with this sort of like diluted context

00:05:22.805 --> 00:05:31.220
where I was half working on one thing, half working on the other, and I would definitely hit the dumb zone. Right? So I probably wouldn't be able to finish my initial goal.

00:05:31.620 --> 00:05:32.900
I could compact,

00:05:32.900 --> 00:06:22.905
but then I would clobber all of the progress that I'd made in my current session. Right? What I really wanted to do was just say, okay, I wanna complete this other thing in a separate session and keep my current session pure. In other words, this was what I wanted. I wanted to essentially take the context or take just the slice that pertains to this extra bug fix, hand it off to another session, and then these two could just run independently. And so for a while, what I was doing was saying, okay, take the stuff in my current session. I want to fix this particular bug. Write me a handoff.md document so that I can then just pass that into another agent. And it turned out I was doing this so freaking often that I just decided, okay, I need a skill for this. I most often use handoff while I'm grilling. Here, I'm inside a grilling session that I did for planning some future features for Sandcastle,

00:06:22.905 --> 00:06:24.345
which is my sort of software factory.

00:06:24.840 --> 00:06:31.400
And what you can see here is that I'm kind of answering some questions. I'm only in Q2 of this grilling session, so not a long one.

00:06:31.640 --> 00:06:37.080
And I say here, I think in future, we may want to move the iterations and the completion signal onto a separate API.

00:06:37.495 --> 00:06:41.175
In fact, let's hand off that task to a separate agent.

00:06:41.335 --> 00:06:46.535
You can see here that when I'm defining the handoff, I'm saying the reason

00:06:46.615 --> 00:06:58.080
why I'm handing off and exactly what should be in that document. This does two things. First of all, it actually sharpens the current grilling session I'm on. So it says that, given that constraint, Q2 collapses.

00:06:58.080 --> 00:07:27.490
So it actually helps my current grilling session because I'm saying that's out of scope. We'll pick that up somewhere else. It then goes and creates a markdown file just here with the focus for the next session: file a GitHub issue, and eventually design for splitting iterations and the completion signal into a separate API. And then later, I just pass this into another agent in order to create the issue. Simple. Another pattern that I really strongly recommend is handing off during a grilling session to prototype.

00:07:27.810 --> 00:07:39.735
When you're grilling, when the agent is asking you questions from a grill me or grill with docs, which are more of my skills, you will often find there are two categories of questions you need to answer. There are the kind of known unknowns,

00:07:39.735 --> 00:07:46.295
the ones that the agent can ask you about. And then there's stuff that you really need to see in code or need to see prototyped.

00:07:46.670 --> 00:07:49.950
This can be really true with like UI prototypes

00:07:49.950 --> 00:07:59.790
or complicated bits of logic that you're not quite sure how to deal with yet. So in this grilling session, we're down to question 13 actually, and we've got a sort of final resolution

00:07:59.790 --> 00:08:00.430
from the agent.

00:08:00.945 --> 00:08:18.620
And then we can see I say hand off to prototype the difficult bits here. The window communication, the tldraw SDK integration, which was something I was building at the time. It creates the handoff and then I go and implement the prototype on that branch. So in the prototype session, this ended up being a huge session. So 169

00:08:18.620 --> 00:08:19.660
k tokens.

00:08:19.740 --> 00:09:07.685
So way bigger than would have fit inside the grilling. And what I did was I created this prototype of the UI and the kind of interaction that I wanted to see. And then I said, okay, let's hand this off back to the grilling session that spawned this. Take all of the learnings from the prototype, anything that's not directly captured in the prototype itself or that's non-obvious, give me a handoff document that I can pass back to the planner. This is actually a really common pattern that I'm using here where you have the initial session where you do some work, you hand off to another session. That session then creates another handoff document and then passes it back to the original session. It's almost like you've done a kind of DIY subagent where you're able to use a context window for one specific task, compress your learnings from that task, and pass it back to the parent.

00:09:07.925 --> 00:09:16.330
Then I was able to finish the grilling session and create some proper PRDs and issues with the prototype in there. So it's an incredibly

00:09:16.330 --> 00:09:17.770
rich pattern for

00:09:18.090 --> 00:09:46.290
actually getting what you need out of AFK agents and using prototypes. It's very very cool. It's worth saying too that the thing that's cool about just using like a markdown document here and not relying on kind of native agent stuff is that you can have this first session be Claude Code, but you can just pass this to another agent. Right? You can pass it to Codex or pass it to, you know, Copilot CLI, whatever you're using. So if you want to do any kind of adversarial review or any kind of,

00:09:46.930 --> 00:09:56.265
you know, interaction between different coding agents, this is a very simple way to do it. We should also just read through the final bits of the skill here just so you understand the reasoning behind everything.

00:09:56.665 --> 00:10:02.745
The theory here is: include a suggested skills section in the document, which suggests skills that the agent should invoke.

00:10:02.985 --> 00:10:04.745
I added this because

00:10:04.745 --> 00:10:06.185
sometimes it would

00:10:07.020 --> 00:10:32.035
I use skills to kinda define the flavor of that session. And so having a suggested skill section means that you can kind of just paste the hand off document into the new session. It will invoke the skills needed like grill with docs or diagnose or prototype or something. And then you're kind of good to go. So you don't need to think about the skills that you need to use in the next session. It's pretty handy. Another one is do not duplicate content already captured in other artifacts.

00:10:32.115 --> 00:10:40.080
I would often find these hand off documents just got really big, and they were just duplicating stuff that was already present either in other markdown files

00:10:40.160 --> 00:10:57.355
or in resources like GitHub issues or things like that. So it's basically saying just use pointers instead of, you know, repeating everything that's in the documents. I also really strongly believe that you should save these handoff files to the temporary directory of the user's OS. In other words, these handoff files are disposable.

00:10:57.595 --> 00:11:07.720
They are not something to be kept around for a long time to rot in your code base as documentation. Another one is redact any sensitive information, API keys, passwords, or PII.

00:11:07.880 --> 00:11:21.535
This is, you know, pretty essential. You don't want these floating around in markdown files in just random places. And finally, if the user passed arguments, in other words, what the next session will be used for, treat those as a description as to what the next session will focus on and tailor the doc accordingly.
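
NOTE
Editor's sketch in Python of what following the rules just read out might look like. The skill itself is a prompt, not code, so everything here is an assumption: write_handoff, redact, the section headings, and the single regex are hypothetical, and a real redaction pass would need far more than one pattern.
```python
import re
import tempfile
from pathlib import Path
# Hypothetical pattern: catches simple "api_key=...", "password: ..." style assignments only.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|password|token)\b(\s*[:=]\s*)\S+"),
]
def redact(text: str) -> str:
    """Replace the value part of anything that looks like a secret assignment."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(r"\1\2[REDACTED]", text)
    return text
def write_handoff(focus: str, summary: str, pointers: list[str], skills: list[str]) -> Path:
    """Write a disposable handoff document to the OS temp directory and return its path."""
    lines = ["# Handoff", "", "## Focus for the next session", focus, "", "## Summary", summary, ""]
    # Pointers reference existing artifacts instead of duplicating their content.
    lines += ["## Pointers"] + [f"- {p}" for p in pointers] + [""]
    lines += ["## Suggested skills"] + [f"- {s}" for s in skills]
    body = redact("\n".join(lines)) + "\n"
    # delete=False keeps the file after closing; it lives in the temp dir, not the workspace.
    with tempfile.NamedTemporaryFile("w", suffix=".md", prefix="handoff-", delete=False, encoding="utf-8") as f:
        f.write(body)
    return Path(f.name)
path = write_handoff(
    focus="Prototype the window communication",
    summary="Grilling reached a final resolution. Example secret: API_KEY=sk-12345",
    pointers=["GitHub issue for the separate API (link kept in the issue tracker)"],
    skills=["prototype"],
)
print(path)
```
The focus argument plays the role of the arguments the user passes to the skill, so the document is tailored to what the next session is for.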

00:11:21.615 --> 00:11:26.815
I think of this as essential for handoff because in order to write a decent document,

00:11:27.055 --> 00:11:34.860
the agent needs to know what the next agent session is going to focus on. Every time I use handoff, I always describe the purpose,

00:11:35.100 --> 00:11:58.440
the reason that we're handing off, because I just can't see how you would write a good handoff document otherwise. And of course, dictation makes this really easy because I just blast it out and then we're good to go. So there we go. That's handoff. This is an essential skill in my toolkit that, you know, just like a lot of my other skills, didn't exist but a few weeks ago. If you've been enjoying my skills, then you should check out the cohort course. It is an absolute banger. We had about 2,500

00:11:58.440 --> 00:12:15.355
people take it last time and I'm expecting, you know, a decent whack this time too. Other than that, thank you so much for watching. My bookshelf behind me is filling up with new coding books that I'm gonna be reading over the next couple of I'm thinking about maybe making a sort of what's on my bookshelf video of recommended books.

00:12:15.595 --> 00:12:23.275
And I don't know, if you like that, then maybe give us a like and a comment or let me know what you want to see next. Either way, thanks for watching and I'll see you very soon.