WEBVTT

00:00:00.000 --> 00:00:29.650
Right now, ClawCode's memory system is still way behind a lot of what the open source community has already figured out. So in a recent video, I broke down these seven levels of clawed memory systems. And whilst researching that, I ended up digging through some really advanced setups that people are building right now. Setups like the Hermes agent, MEMSearch, and a bunch of others. And to my own surprise, a lot of these systems looked incredibly advanced, but the core ideas underneath them are actually very simple to replicate. So underneath all the complexity,

00:00:29.890 --> 00:00:35.090
it always comes down to just two questions. When and how does information get written to memory,

00:00:35.490 --> 00:00:42.095
and when and how does it get retrieved again? So in this video, I'm gonna show you what ClawCode's memory looks like today,

00:00:42.415 --> 00:00:44.815
what the newest systems are actually doing differently,

00:00:45.055 --> 00:00:52.815
and then the setup I'd actually recommend if you want ClawCode to stop forgetting things. And one thing upfront, this isn't about loading more context

00:00:53.000 --> 00:00:54.200
into ClawCode.

00:00:54.280 --> 00:00:59.000
It's about keeping context lean, only retrieving the right information when it's actually needed.

00:00:59.240 --> 00:01:22.060
So let's get into it, and we can start off by talking about the three questions that every memory system has got to answer. So firstly, it's all about storage. How does information actually get saved and at what point? So what happens when somebody says something to Claude that's worth remembering? How does that actually get stored in the system? So you might say our landing page is school.com/scrapes,

00:01:22.220 --> 00:01:48.425
and you want Claude to always remember that information. So in some way, we want the agent to actually go away and save that, and we want that to be consistent and reliable. Or a decision like we're using Stripe, not PayPal, same thing. You want that to be saved into the memory and then retrieved at a later date. So we wanna understand how this information gets saved with all these different memory systems. Then we wanna understand how information gets injected. So you're probably familiar that the claude.md

00:01:48.425 --> 00:01:57.460
file gets injected into the system prompt whenever we prompt Claude, so it's injected every single time. So how do we actually take important context of recent memory

00:01:57.700 --> 00:02:08.925
and push it to the agent during our conversation so that next time you do start a session, you can open Claude code and the memory of the most recent or most important information is loaded in automatically.

00:02:09.085 --> 00:02:14.685
But it's only a snippet of that information. It's not tens of thousands of tokens. We have a small curated

00:02:14.890 --> 00:02:26.170
always there set of memory that's pushed in. So that, for example, Claude already knows your landing page URL or already knows your Stripe decision because we made that, and that's an important decision. So we've got storage and injection.

00:02:26.490 --> 00:02:31.575
But then more importantly for long term memories, how do we actually go and find and recover

00:02:31.655 --> 00:02:57.470
past information that we've told it? Information that we told it about client x six months ago. That's the information that we need to be able to recall. And this could be as recent as last week or it could be, you know, several years ago or months ago. So we might ask, what did we decide about pricing last Tuesday? And it might have a step by step process of let's check what's been loaded in the injection phase. If not, let's go deeper. And if not, let's go even deeper. And we need a framework to actually store

00:02:57.835 --> 00:03:04.475
and retrieve that information from the long term memory. So how does it store? How does it inject? And how does it recall?

00:03:04.475 --> 00:03:12.550
So these are the three themes that we're gonna follow through this video and talk about the different systems like ClawCode out the box, Hermes, and MemSearch,

00:03:12.550 --> 00:03:17.590
which are two of the best systems that I've found on the market. They often take completely different approaches.

00:03:17.750 --> 00:03:24.615
So let's get into the first section, which is all about storage. So when you have a conversation with Claude, it's actually auto detecting certain

00:03:24.615 --> 00:03:28.775
things you say in the background and writing them silently to dot m d files.

00:03:29.015 --> 00:03:42.550
These are stored at a per project level in the global space. So we've got the dot Claude project slash projects, and then we're storing memory folders back there. We then have a memory dot m d index, which is updated with all the files

00:03:42.710 --> 00:04:23.765
for which it can point to. So when you have a conversation in the future with Claude, it can always reference those files. Now this is on a per project basis, but if you repeat things multiple times and you have certain things, certain preferences that are done three or more times, then it gets promoted to a global dot Claude slash memory folder. And you can actually see this if you go directly into your Claude code terminal and do slash memory. It will say, do you want to look at your user memory, which is saved in the claw dot m d? Do wanna look at the project memory, which is also m d? Or do you want to open the auto memory folder? So if you open that auto memory folder, then you can actually go and see all of the files and the index of files that that's created, and you can see that those actually point

00:04:23.925 --> 00:04:29.445
to each other. So these are happening automatically in the background, and I wouldn't say they're very comprehensive.

00:04:29.765 --> 00:04:48.470
It's kind of mostly if you're telling it this is a really important thing, but, otherwise, it's not really gonna store a huge amount of information. Now let's look at what the open source community has figured out around this. So how do they store and capture information as you go through? So MemSearch uses a Claude code stop hook. So it's gonna fire after every turn,

00:04:48.870 --> 00:04:52.445
not just the memory worthy turns. So it's gonna call Haiku,

00:04:52.685 --> 00:05:07.790
which is gonna summarize each turn into bullets. And it uses Haiku because it's a cheap, fast model, and it's doing it all the time. It's gonna append that data to a memory slash date file with session anchors. So, you know, when you close a session and you have a specific

00:05:07.950 --> 00:05:09.150
session ID,

00:05:09.230 --> 00:05:33.045
it's gonna append that or the notes from that session to a specific memory file. So it's storing literally everything. It then periodically runs MEMS search index, or you can run this manually. Each bit of information gets chunked into a hash. Now the reason it's converting that information into a hash is because it can then embed those chunks and turn them into vectors. Those vectors are then stored in a MILVUS

00:05:33.220 --> 00:05:34.580
vector database,

00:05:34.900 --> 00:05:52.775
and it's all done locally on your CPU. So there's zero API cost. And what this actually means for you, it's not very relevant in terms of what it's being stored as. It's being stored as vectors, so literally a sequence of numbers. But what it does is store really effectively a meaning and a bunch of metadata associated

00:05:52.775 --> 00:05:59.015
with that specific memory. This is great for the retrieval stage later because it means we can actually retrieve information

00:05:59.255 --> 00:06:04.200
by meaning instead of just by keyword search. So not only do we have the markdown files,

00:06:04.440 --> 00:06:25.525
everything is also indexed and vectorized and put into a database in the back end automatically for us. That is absolutely critical for the retrieval stage later. And what's great about this is it basically treats markdown as the source of truth. So everything is appended as markdown, and then everything else is rebuildable later from those markdown files. So if you lost this database,

00:06:25.685 --> 00:06:39.470
you could actually rebuild it from all the memories that have been appended to that date. And the other good thing about it or good and bad, you could say, is it captures everything. So it's not just what auto memory from ClawCode thinks is the most relevant thing. It's actually gonna capture

00:06:39.630 --> 00:06:55.835
absolutely everything. Now you might wonder, is that overkill? Well, we can come to what Hermes does in a minute and decide for yourself whether that is overkill because Hermes actually takes a completely different approach. And it's closer to what ClawCode is doing out the box because actually the agent is deciding

00:06:56.155 --> 00:07:21.015
what to save. The agent has access to a couple of tools inside Hermes, so add, replace, or remove. And what it's doing is adding those to a memory dot m d file and a user dot m d file. So similar to what you've seen probably in OpenCLORE or if you've set up your own Agencik OS, you might have a memory dot m d and a user dot m d, But this isn't the same as Claude's memory dot m d. This is a memory dot m d with a cap on the number of characters

00:07:21.175 --> 00:07:28.220
that retains the most important information, and we'll talk about how it does that. So memory dot m d stores environment information,

00:07:28.300 --> 00:07:39.100
things you've done, and then user dot m d is all about user profile. So anything you say about the way you work or the way that you want to operate, user dot m d stores. It also has mechanisms in there for deduplicating.

00:07:39.100 --> 00:07:44.035
So whenever the agent thinks it's gonna add, replace, or remove something important,

00:07:44.275 --> 00:07:48.115
it will also check for duplicates and make sure that it's not writing

00:07:48.195 --> 00:07:59.690
duplicate information to our valuable memory space. Now all of these are kind of useless unless the information gets injected at some point, which we'll talk about next. But the important thing to know is these caps on characters

00:07:59.930 --> 00:08:00.810
enforce

00:08:00.810 --> 00:08:01.850
consolidation.

00:08:01.850 --> 00:08:04.410
So where MemSearch captures absolutely everything,

00:08:04.650 --> 00:08:08.410
the point in the Hermes memory logic is that it enforces that consolidation

00:08:08.865 --> 00:08:21.105
for when it injects that context later on. But in some ways, it is very similar to MemSearch because every turn, it also auto saves the complete raw transcript to a database in the background. And it uses a curator.

00:08:21.700 --> 00:08:25.700
So every seven days, it goes through and prunes and consolidates

00:08:25.700 --> 00:08:35.940
all of the information that we've just talked about. So the curator's job is to keep everything clean. What it does is remove the raw transcripts from that information. So whilst MemSearch stores exact

00:08:35.940 --> 00:08:36.820
raw transcripts,

00:08:37.175 --> 00:08:42.055
Hermes actually consolidates and prunes that information. So they're actually both excellent,

00:08:42.135 --> 00:08:58.880
especially when you compare it to claw code. And if you look in your own memory dot m d with the auto memory, it barely saves a thing. So MemSearch and Hermes go 10 x further than the basic claw code out the box. So which one would I actually recommend that you use in this approach? Well, MemSearch captures

00:08:58.960 --> 00:09:02.960
everything automatically with that stop hook, but it's raw and uncurated.

00:09:03.625 --> 00:09:06.745
Hermes is gonna capture our curated facts,

00:09:06.985 --> 00:09:18.825
especially those that are gonna be put into memory dot m d and user dot m d, which is lean and intentionally lean. But if the agent doesn't think to save something, it's kinda like with our Claude auto memory, it's still actually grabbing

00:09:19.310 --> 00:09:47.905
the full transcript and saving it into something that we can retrieve from a database at a later point. So my answer to which one should you actually use, I actually think we should combine the logic of both here. We should use automatic capture for completeness and then curated facts for what matters most because this is really important for the injection of the context phase. So take the best of both and combine it so we've got a long term search from this embedded vector database that we can search by meaning, but also the power of

00:09:48.280 --> 00:09:50.280
choosing specific information

00:09:50.520 --> 00:09:57.400
to store in the memory dot m d and user dot m d. So now that we come to the injection phase, we can actually push that information

00:09:57.560 --> 00:10:02.200
into our context without having to search through a load of raw uncurated

00:10:02.280 --> 00:10:08.975
transcripts in the background. So memory injection into the context window is quite misunderstood.

00:10:08.975 --> 00:10:15.055
It's not about loading more context in. Like we always talk about, it's loading the right context

00:10:15.215 --> 00:10:26.260
at the right time only. So the default behavior of Claude code is when you start a session, you inject the full Claude dot m d, and that's why we wanna keep the Claude dot m d ideally

00:10:26.340 --> 00:10:38.195
under 200 lines. That goes in with the system prompt. And then before you use a tool or before Claude uses a tool, there is actually a pre tool use hook which grabs the memory dot m d index,

00:10:38.675 --> 00:11:08.845
looks through those list of memory files that were stored earlier, and decides does it need based on your your query to actually go and research one of those memory files and inject that into the context too. If it does, it will inject that in as additional context inside the conversation. So this is a pretty decent starting point, but actually we can learn a lot from the way Hermes does this. We already saw that it captured a user dot m d and memory dot m d file with more information that's periodically updated

00:11:08.925 --> 00:11:09.885
and consolidated.

00:11:10.230 --> 00:11:22.150
We can actually inject those into the context window. But first let's quickly cover memsearch because it might surprise you here but memsearch actually has no injection layer at all. It just relies on the default behavior

00:11:22.565 --> 00:11:35.925
of Clog code injection the Clog. Md and the memory. Md. MemSearch is really built for the recall which we'll come to. So think of MemSearch as storage and search basically, a storage and search library that massively improves long term recall.

00:11:36.380 --> 00:11:56.455
Whereas Hermes I think nails this. So at the session start it basically loads a frozen snapshot similar to the way that Claude uses Claude. Md but it will not only use the Claude. Md, it will additionally add in the memory. Md, the user dot m d, and soul dot md every single time. And that comes to around 1,300

00:11:56.455 --> 00:12:06.455
tokens that are put into every single conversation window. Now this is per session because it's a frozen snapshot, so it gets cached in the memory. So you don't spend 1,300

00:12:06.455 --> 00:12:44.610
tokens every time you send a message. It's just at the start of a session conversation. The session ID will have that context save. So anything that's saved to memory dot m d, user dot m d, sold dot m d during the session will be written to the disc in the background and will not be loaded into that conversation, but will be loaded into the next conversation. So it's a really obvious choice for what logic we'd like to use for the actual injection layer and that's let's use ClawCode's behavior plus Hermes actual frozen snapshot to load in the memory dot m d, user dot m d, and sol dot m d, which as we saw in the storage stage consolidates recently

00:12:44.610 --> 00:12:46.930
biased and most important information

00:12:47.375 --> 00:12:53.215
inside these three folders or these three markdown files. Now, yes, you are loading in 1,300

00:12:53.215 --> 00:12:55.535
tokens every single session,

00:12:55.775 --> 00:12:58.335
but compared to the huge context windows,

00:12:58.575 --> 00:13:02.975
the increased performance you're gonna get from recent consolidated memories,

00:13:03.400 --> 00:13:08.440
in my opinion, is worth it. Now this is where stuff gets really interesting in recall

00:13:08.520 --> 00:13:16.920
because this is probably the biggest gap that ClaudeCode has out the box. Most of the time, we're not working just on a task by task basis with ClaudeCode.

00:13:17.000 --> 00:13:22.455
We have a bunch of clients. We have a bunch of projects on the go. And actually storing that information

00:13:22.695 --> 00:13:35.570
is critical. But recall is the most important thing. If you can store as much information as you want, but if you can't get it out at the right time, then it's not worth having a good storage mechanism in the first place. And ClawCode out the box has a really poor,

00:13:35.730 --> 00:13:40.610
dare I say it, recall system. So basically it's user asked about the past,

00:13:40.770 --> 00:13:53.325
some question about the past, It's gonna check the auto memory files which we've already seen. And if it's not been saved in there, it's completely lost. You might have opened the memory files that you had from earlier inside your project repository.

00:13:53.405 --> 00:14:08.480
It really is quite selective about what it saves. You probably don't have a huge amount of information stored there. So actually recalling past conversations and information is gonna have to just go and trawl through previous conversations you've had and actually burn through a load of tokens trying to find relevant information,

00:14:08.720 --> 00:14:23.015
and it has no methodology for doing so right now. Now you can, of course, use the resume flag to actually resume a previous conversation, but you have to know which session you actually wanna resume to get that context back. So for ClawCode, the storage of information is okay.

00:14:23.415 --> 00:14:36.000
The injection is basic with just the ClawDot MD, but the recall is actually really weak and where we can benefit most from external systems. So how does that compare to MemSearch if a user were to ask about something from the past week,

00:14:36.240 --> 00:15:01.230
the past month, the past six months? Well, MemSearch has a really powerful three tier retrieval system that basically only goes deeper if it needs to. It works on the same principles of progressive disclosure. So user asks a question about the past and we're gonna use the MEMS search search query. It's basically going to convert your query into vectors so that you can go and find in the vector database where we stored the information earlier semantic matches

00:15:01.230 --> 00:15:07.470
for your queries. Then because it's stored as vectors, we'll also be able to find matches for monetization,

00:15:07.950 --> 00:15:19.675
revenue, price. So it doesn't have to be exact keyword matches like we're actually searching in the vector database by meaning here. And it even has a method to do that by keywords. So the dense vectors allow it to search by meaning.

00:15:20.075 --> 00:15:22.395
The BM 25 keywords

00:15:22.010 --> 00:15:35.735
allow it to actually keyword match and then it's basically summarized in one list of these are the closest matches to your relevant query that you asked about the past. Now it will pass that back to the agent first and if there's nothing that's totally relevant,

00:15:35.975 --> 00:16:11.605
then it's able to actually go one level deeper. So at that point, it could stop and actually find really relevant queries, find exactly what we're looking for from information in the past. If that answers the question, great. However, if that does not answer the question, then it jumps to tier two which is search expand. And MEMS search expand gives it more context, more metadata, a summary of information around the match that we potentially found. And, again, if that is not good enough and we need the raw dialogue, then it's gonna go to the next tier level three, which actually has all of the session dialogue that we had. Because if you remember, every single message we send,

00:16:11.925 --> 00:16:48.610
it's summarized into bullets and then appended to the memory and then that is indexed. So all of the raw dialogue is actually saved and we can retrieve that with level three if we need to as a last resort. Now all of these take more tokens as we go down, but if you need a reliable system for retrieving information about your client's project six months ago, then MemSearch is gonna be the one. Now you might have identified the limitation in this approach which is if we're asking about the past it immediately thinks okay instead of searching the local context let's go and do a database query. So that's gonna be slower than just checking our local

00:16:48.930 --> 00:17:05.225
in context memory. So Hermes uses a really clever approach for this. First instead of going deeper into the database it's actually just gonna check our memory. Md. That has the question that the user has asked been actually accessible via just the memory. Md,

00:17:05.305 --> 00:17:06.665
which means it can actually

00:17:06.905 --> 00:17:10.905
get it from the context that it's already received. So the power in

00:17:11.385 --> 00:17:13.145
injecting this frozen snapshot

00:17:13.280 --> 00:17:20.960
means that actually for some queries, it's gonna be able to be answered just from the context that's already in the memory.

00:17:21.120 --> 00:17:24.560
And that will basically be zero cost and instantaneously

00:17:24.560 --> 00:17:25.360
accessible.

00:17:25.360 --> 00:17:28.160
So it should, in theory, always search the context

00:17:28.405 --> 00:17:53.820
of that existing conversation before it goes down to the levels and searches the database. So if it is not found in there, then it goes deeper and searches the sessions. And we already mentioned those were stored in a database the same as we did for mem search. But instead of being a vector database, it's just searching by keywords effectively. So then what it's gonna do is basically return the top three matching set sessions by relevance

00:17:53.900 --> 00:18:19.810
and summarize it using Gemini Flash and pass that back into the agent. So Hermes is really good at exact keyword matching. So if we were to ask it about pricing, it could find things about pricing, but it might not necessarily find things about revenue because that's by meaning and not keywords. However, they do do one really smart thing which we're gonna adapt and use, which is inject this memory dot m d into the conversation history. And then also by default, as a level zero,

00:18:20.050 --> 00:18:34.855
check that memory dot m d. So check what's already in context before jumping down into the MEM search hybrid search, the MEMS search expand, and the level three down here. So what we'd actually ideally do is grab this memory dot m d, check,

00:18:35.255 --> 00:19:10.775
and put that into the MEMS search flow so that we have a hybrid of both of those. So we can treat this step as almost like a level zero between MemSearch and Hermes so that we actually check what's already in context before we go deeper and check the vector database. So the user asks about the past, it's gonna check the memory dot m d and the context that's in that existing window. And if not found, then it's gonna go on to the MEM search to start searching the vector database by keyword and meaning and then continue to level two and level three if it needs to do so. So that's a lot of information. Now how do you actually set this up for yourself and take the best elements

00:19:11.390 --> 00:19:20.350
of each system that can be worked together? So here's what I'd actually recommend when taking the best from each system. So let's run through store,

00:19:20.430 --> 00:20:05.635
inject, and recall and the life cycle of a conversation as it happens. So but we will, of course, leverage everything that's already built into code that works well as best practice. So as a conversation happens, we're gonna leverage the auto memory, which is built in and saves those memory dot m d files to the Claude global folder for us. But after every term completes, we're gonna add in the memsearch stop hook that's basically gonna capture word for word all of our transcripts of our conversations so that those can be put into a daily memory. But what we want to do is maintain a memory dot m d and a user dot m d file so that actually if the agent decides that something is important, it's not just relying on Claude code to add, replace or remove into memory. Md or user. Md files. Now that covers actually storing

00:20:05.875 --> 00:20:11.060
more context so that we can actually retrieve it later. We, of course, also leverage the vector database

00:20:11.060 --> 00:20:19.860
of MemSearch which is actually consolidating this information into long term semantically searchable memory. So basically we're gonna run a nightly job to consolidate

00:20:20.315 --> 00:20:45.000
all the information that were put into that database. All the transcripts, all the raw transcripts are gonna be consolidated using this memsearch index every single night. And if all of this is sounding a little bit too complex for you to actually go and set up, then I'm gonna show you later where we've got an exact guide for free on how to give this plan to Claude code and it will go through all your file systems and work out how to actually implement this and do all the installations for you. Now injection,

00:20:45.000 --> 00:20:57.465
we actually leveraged Hermes logic. So when the session starts, we wanna inject a little bit more context than just Claude dot m d. We wanna inject the sole dot m d, the user dot m d, memory dot m d, and then possibly today's log if

00:20:57.705 --> 00:21:15.850
you could also inject yesterday's log if you think that would be relevant too. So that would be 3,000 tokens that are cached at the start of every session, which will really be important when we come to actually recalling it. So then we jump onto the recall segment of the flow. And what we've done here is combine the tier zero of Hermes

00:21:16.105 --> 00:21:59.415
where we check the memory dot m d and daily log first. So those are injected inside the system prompt every time we send a message, but they're cached. So what we're doing is basically before digging deeper into the vector database to search past history, we check the local recent data that's been loaded into the conversation already. So memory dot m d and daily log, that has zero cost, and it's also pretty much immediate because it already has it in context. If that is not found, then we jump on to the MEMS search traditional level one, level two, level three, where we search the queries using the hybrid keyword and semantic or vector search. We then expand those with the chunks. And then if we do not find the information still, then we can actually pass the raw transcripts

00:21:59.180 --> 00:22:00.540
and passed that information

00:22:00.700 --> 00:22:09.100
back to the agents. So this setup gives us the ability to actually search information really quickly from local

00:22:09.180 --> 00:22:24.215
recent files and prioritize those, but also gives us the ability to actually search further back in less recent history to recall all our old knowledge to the point where we can literally pull out the raw dialogue at the end. The one thing I want you to take away is none of this is complicated individually

00:22:24.215 --> 00:22:26.375
but it's all about preserving best practice

00:22:26.810 --> 00:22:27.850
for storage,

00:22:27.850 --> 00:22:29.450
injection, and recall

00:22:29.530 --> 00:22:35.530
so we can massively improve the memory usage inside your crawl code sessions. If you're working on projects

00:22:35.530 --> 00:22:51.465
and multiple clients, then this is an absolute must have. And I know Anthropic are working on their own memory systems, but right now it's far, far behind what you can get from systems that are currently open source and free to access. Now I'll link below a completely free plan.md

00:22:51.465 --> 00:23:10.465
document for you to pass this into Claude and set it up for yourself. Now if you do want this straight out the box, done for you, you know it's gonna work well, then we'll be implementing this inside our own Agentic operating system next week. That's also linked down inside the academy in the description below. If you want to see what other options I considered for memory,

00:23:10.625 --> 00:23:12.065
out the next video. Thanks for watch
