Simon Scrapes · YouTube · 23:12

3 Claude Memory Systems to Get You Ahead of 99% of People

A 23-minute systems teardown comparing Claude Code default memory, MemSearch, and Hermes -- then synthesising the hybrid setup that beats all three.

Posted · May 16th 2026
Duration · 23:12
Format · Tutorial, educational
Channel · Simon Scrapes
§ 01 · The Hook

The bait, then the rug-pull.

Claude Code default memory is losing information you cannot afford to lose. Simon Scrapes spent time digging through the most advanced open-source setups -- Hermes and MemSearch -- and found that the core ideas underneath all the complexity are actually simple. This is the teardown that shows you exactly what is missing and how to fix it.

§ · Stated Promise

What the video promised.

stated at 00:35 "I am gonna show you what Claude Code memory looks like today, what the newest systems are actually doing differently, and then the setup I would actually recommend if you want Claude Code to stop forgetting things." delivered at 19:04
§ · Chapters

Where the time goes.

00:00 – 03:20

01 · Cold Open -- 3 Questions Every Memory System Must Answer

Frames the whole video around Store, Inject, and Recall. Introduces MemSearch and Hermes as the two strongest open-source challengers. Sets expectation: this is not about more context -- it is about the right context.

03:20 – 10:05

02 · STORE -- How the Three Systems Capture Information

Side-by-side comparison. Claude Code: auto-memory, sparse, promotes to global after 3+ repeats. MemSearch: stop hook after every turn, Haiku bullets, Milvus vector DB (local CPU, zero API cost). Hermes: agent-driven add/replace/remove, MEMORY.md (2200 char) + USER.md (1375 char) + SQLite raw transcript + 7-day curator.
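
The agent-driven store described for Hermes can be sketched as a character-capped list of facts with add, replace, and remove operations plus a duplicate check. This is a minimal illustration only: the class name, method names, and the choice to reject an over-budget write are assumptions, not the Hermes API. Only the 2200-character cap comes from the summary above.

```python
from dataclasses import dataclass, field


@dataclass
class CappedMemory:
    """Hypothetical character-capped memory store (not the Hermes API)."""
    cap: int = 2200                      # MEMORY.md cap quoted in the summary
    entries: list[str] = field(default_factory=list)

    def render(self) -> str:
        # One bullet per entry, as it would be written to the markdown file.
        return "\n".join(f"- {e}" for e in self.entries)

    def _fits(self, entries: list[str]) -> bool:
        return len("\n".join(f"- {e}" for e in entries)) <= self.cap

    def add(self, fact: str) -> bool:
        fact = fact.strip()
        # Deduplicate case-insensitively before spending any of the budget.
        if any(fact.lower() == e.lower() for e in self.entries):
            return False
        if not self._fits(self.entries + [fact]):
            return False                 # over budget: caller must consolidate first
        self.entries.append(fact)
        return True

    def replace(self, old: str, new: str) -> bool:
        if old not in self.entries:
            return False
        candidate = [new.strip() if e == old else e for e in self.entries]
        if not self._fits(candidate):
            return False
        self.entries = candidate
        return True

    def remove(self, fact: str) -> bool:
        if fact not in self.entries:
            return False
        self.entries.remove(fact)
        return True


memory = CappedMemory(cap=60)            # tiny cap so the budget is easy to hit
added = memory.add("We use Stripe, not PayPal")
duplicate = memory.add("we use stripe, not paypal")
replaced = memory.replace("We use Stripe, not PayPal", "Payments: Stripe")
too_big = memory.add("x" * 100)
```

Rejecting a write that would exceed the cap is what forces consolidation: the agent has to replace or remove something before it can add more.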

10:05 – 13:08

03 · INJECT -- How Memory Reaches the Agent at Session Start

Claude Code loads CLAUDE.md + conditional memory file injection via pre-tool-use hook. MemSearch has NO injection layer. Hermes loads a frozen snapshot of SOUL.md + USER.md + MEMORY.md (~1300 tokens, prefix-cached) once per session.

13:08 – 19:04

04 · RECALL -- How the Agent Retrieves Past Information

Claude Code: checks auto-memory files; if it was not saved, it is lost -- no search, no grep, no vectors. MemSearch: 3-tier retrieval (L1 hybrid vector+keyword search, L2 expand chunk context, L3 raw session transcript). Hermes: Tier-0 in-context MEMORY.md check, FTS5 keyword query, Gemini Flash summarisation of top 3 sessions.

19:04 – 23:12

05 · Recommended Hybrid Setup -- Taking the Best of All Three

Store: auto-memory + MemSearch stop hook + agent writes MEMORY.md/USER.md + nightly memsearch index cron. Inject: Hermes frozen snapshot (~3000 tokens cached). Recall: Tier-0 in-context, L1 MemSearch hybrid, L2 expand, L3 raw transcript. Free plan.md available.

§ · Storyboard

Visual structure at a glance.

cold open
3-question diagram
STORE: claude code
STORE: MemSearch
STORE: Hermes
INJECT: MemSearch
INJECT: Hermes
RECALL comparison
RECALL: MemSearch
RECALL: Hermes
hybrid setup
inject + recall flow
CTA + close
§ · Frameworks

Named ideas worth stealing.

01:00 model

Store / Inject / Recall

  1. Store -- how does info get written to memory?
  2. Inject -- how does it reach the agent during a session?
  3. Recall -- how does the agent find old info when asked?

Three-question framework for evaluating any agentic memory system. Every decision about memory architecture maps to one of these three verbs.

Steal for Any MCN+ lesson on AI agent setup, or a CLAUDE.md audit checklist
14:07 model

MemSearch 3-Tier Retrieval

  1. L1: memsearch search -- hybrid dense vectors + BM25 keywords + RRF fusion
  2. L2: memsearch expand chunk_hash -- full markdown section around match
  3. L3: parse-transcript session.json -- raw dialogue as last resort

Progressive disclosure retrieval: only go deeper when needed. Because L1 searches by meaning as well as by keyword, a query for "pricing" can surface notes about "monetization" without an exact keyword match.

Steal for Any long-term memory layer for JoeFlow or MCN agents
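
The RRF step in L1 can be illustrated with a short reciprocal rank fusion sketch: each result list contributes 1 / (k + rank) for every chunk it contains, and the sums decide the final order. The chunk IDs are made up, and k = 60 is a commonly used default rather than a value confirmed for MemSearch.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of chunk IDs with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical results for the same query from the two retrievers.
dense = ["monetization-notes", "pricing-call", "stripe-decision"]   # by meaning
bm25 = ["pricing-call", "invoice-template", "monetization-notes"]   # by keyword
fused = rrf_fuse([dense, bm25])
```

A chunk that appears near the top of both lists outranks one that tops only a single list, which is why the keyword match and the semantic match end up first and second here.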
11:30 model

Hermes Frozen Snapshot Injection

  1. SOUL.md (~1.8 kB) -- agent identity / operating principles
  2. USER.md (1.4 kB cap) -- user profile, preferences, working style
  3. MEMORY.md (2.5 kB cap) -- curated project facts, decisions, context
  4. Daily log -- optional, today session context

~3000 tokens loaded once at session start in the recommended setup (Hermes itself loads ~1300), prefix-cached. Mid-session writes persist to disk but take effect NEXT session (frozen snapshot principle).

Steal for CLAUDE.md architecture upgrade -- split monolithic CLAUDE.md into SOUL / USER / MEMORY files
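
The frozen snapshot principle can be sketched as follows: read the files once when the session starts, keep that text fixed, and let later writes go to disk only. The class and function names are hypothetical, and the caps are the approximate figures from the card above, treated here as character counts.

```python
import tempfile
from pathlib import Path


class FrozenSnapshot:
    """Reads the memory files once; later disk writes do not change it."""

    def __init__(self, root: Path, caps: dict[str, int]):
        parts = []
        for name, cap in caps.items():
            path = root / name
            text = path.read_text(encoding="utf-8") if path.exists() else ""
            # Truncate to the cap so the injected prefix has a bounded size.
            parts.append(f"## {name}\n{text[:cap]}")
        self.prompt_prefix = "\n\n".join(parts)   # fixed for the whole session


def remember(root: Path, name: str, fact: str) -> None:
    """Mid-session write: persists to disk, visible to the next snapshot only."""
    with (root / name).open("a", encoding="utf-8") as f:
        f.write(f"- {fact}\n")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "MEMORY.md").write_text("- Payments: Stripe\n", encoding="utf-8")
    caps = {"SOUL.md": 1800, "USER.md": 1400, "MEMORY.md": 2500}

    session_1 = FrozenSnapshot(root, caps)
    remember(root, "MEMORY.md", "Landing page copy approved")
    session_2 = FrozenSnapshot(root, caps)

    seen_in_session_1 = "Landing page" in session_1.prompt_prefix
    seen_in_session_2 = "Landing page" in session_2.prompt_prefix
```

Keeping the prefix byte-for-byte identical during a session is what makes it cacheable; the trade-off is that a fact saved mid-session is only injected from the next session onward.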
19:04 model

Hybrid Memory Architecture

  1. STORE: auto-memory + MemSearch stop hook + agent MEMORY.md/USER.md writes + nightly cron index
  2. INJECT: Hermes frozen snapshot (~3000 tokens cached per session)
  3. RECALL: Tier-0 in-context, L1 MemSearch hybrid, L2 expand, L3 raw

Combines completeness (MemSearch captures everything) with quality (Hermes curates what matters most) and speed (in-context check before any DB query).

Steal for Blueprint for the MCN+ agentic OS memory layer, or a paid workshop module
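
The recall order in this architecture is a fall-through chain: answer from what is already in context if possible, and only then pay for a search. A minimal sketch, assuming each tier is a function that returns an answer or None; the tier functions and the sample data below are stand-ins, not MemSearch or Hermes calls.

```python
from typing import Callable, Optional

Tier = tuple[str, Callable[[str], Optional[str]]]


def recall(topic: str, tiers: list[Tier]) -> tuple[str, Optional[str]]:
    """Try each tier in order and stop at the first one that returns an answer."""
    for name, lookup in tiers:
        answer = lookup(topic)
        if answer is not None:
            return name, answer
    return "miss", None


# Stand-in data: the injected snapshot and a fake search index.
SNAPSHOT = "- Payments: Stripe, not PayPal\n- Landing page: live"
FAKE_INDEX = {"pricing": "made-up note: tiered pricing agreed"}


def tier0_in_context(topic: str) -> Optional[str]:
    # Free and immediate: the snapshot is already in the context window.
    for line in SNAPSHOT.splitlines():
        if topic.lower() in line.lower():
            return line
    return None


def l1_hybrid_search(topic: str) -> Optional[str]:
    return FAKE_INDEX.get(topic.lower())     # placeholder for a vector + keyword query


def l2_expand(topic: str) -> Optional[str]:
    return None                              # placeholder: surrounding markdown section


def l3_raw_transcript(topic: str) -> Optional[str]:
    return None                              # placeholder: raw dialogue, last resort


TIERS: list[Tier] = [
    ("tier0", tier0_in_context),
    ("L1", l1_hybrid_search),
    ("L2", l2_expand),
    ("L3", l3_raw_transcript),
]

from_context = recall("stripe", TIERS)
from_search = recall("pricing", TIERS)
not_found = recall("refunds", TIERS)
```

Ordering the tiers from cheapest to most expensive means the token cost of a lookup grows only when the cheaper tiers have nothing to offer.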
§ · Quotables

Lines you could clip.

10:08
"It is not about loading more context in. It is about loading the right context at the right time only."
Clean standalone principle, no setup needed, directly challenges the default instinct to stuff context windows → TikTok hook
13:22
"If you can store as much information as you want, but if you cannot get it out at the right time, then it is not worth having a good storage mechanism in the first place."
Punchy inversion -- storage without recall is worthless. → IG reel cold open
08:58
"MemSearch and Hermes go 10x further than the basic Claude Code out the box."
Tight quantified claim, validates the upgrade journey → newsletter pull-quote
22:12
"Right now it is far, far behind what you can get from systems that are currently open source and free to access."
Closing contrast -- Anthropic vs open source. Strong CTA setup. → IG reel cold open
§ · Pacing

How they spent the runtime.

Hook length · 35s
Info density · high
Filler · 5%
§ · Resources Mentioned

Things they pointed at.

01:28 · tool · MemSearch
01:36 · tool · Hermes
§ · CTA Breakdown

How they asked for the click.

22:27 link
"I will link below a completely free plan.md document for you to pass into Claude and set it up for yourself."

Soft lead-in referencing the paid agentic OS, then free plan.md as the accessible on-ramp. Clean two-tier CTA: free DIY vs done-for-you community.

§ 04 · The Script

Word for word.

HOOK = opening / re-engagement · CTA = the pitch
00:00 HOOK Right now, Claude Code's memory system is still way behind a lot of what the open source community has already figured out. So in a recent video, I broke down these seven levels of Claude memory systems. And whilst researching that, I ended up digging through some really advanced setups that people are building right now.
00:16 HOOK Setups like the Hermes agent, MemSearch, and a bunch of others. And to my own surprise, a lot of these systems looked incredibly advanced, but the core ideas underneath them are actually very simple to replicate. So underneath all the complexity,
00:29 HOOK it always comes down to just two questions. When and how does information get written to memory, and when and how does it get retrieved again?
00:37 So in this video, I'm gonna show you what Claude Code's memory looks like today, what the newest systems are actually doing differently, and then the setup I'd actually recommend if you want Claude Code to stop forgetting things.
00:49 And one thing upfront, this isn't about loading more context into Claude Code. It's about keeping context lean, only retrieving the right information when it's actually needed.
00:59 So let's get into it, and we can start off by talking about the three questions that every memory system has got to answer. So firstly, it's all about storage. How does information actually get saved and at what point?
01:11 So what happens when somebody says something to Claude that's worth remembering? How does that actually get stored in the system? So you might say our landing page is school.com/scrapes,
01:22 and you want Claude to always remember that information. So in some way, we want the agent to actually go away and save that, and we want that to be consistent and reliable. Or a decision like we're using Stripe, not PayPal, same thing.
01:35 You want that to be saved into the memory and then retrieved at a later date. So we wanna understand how this information gets saved with all these different memory systems. Then we wanna understand how information gets injected.
01:45 So you're probably familiar that the CLAUDE.md file gets injected into the system prompt whenever we prompt Claude, so it's injected every single time. So how do we actually take important context of recent memory
01:57 and push it to the agent during our conversation so that next time you do start a session, you can open Claude Code and the memory of the most recent or most important information is loaded in automatically. But it's only a snippet of that information. It's not tens of thousands of tokens.
02:13 We have a small, curated, always-there set of memory that's pushed in. So that, for example, Claude already knows your landing page URL or already knows your Stripe decision because we made that, and that's an important decision.
02:24 So we've got storage and injection. But then more importantly for long term memories, how do we actually go and find and recover past information that we've told it?
02:33 Information that we told it about client X six months ago. That's the information that we need to be able to recall. And this could be as recent as last week or it could be, you know, several years ago or months ago.
02:43 So we might ask, what did we decide about pricing last Tuesday? And it might have a step by step process of let's check what's been loaded in the injection phase. If not, let's go deeper.
02:53 And if not, let's go even deeper. And we need a framework to actually store and retrieve that information from the long term memory.
03:00 So how does it store? How does it inject? And how does it recall?
03:04 So these are the three themes that we're gonna follow through this video and talk about the different systems like Claude Code out the box, Hermes, and MemSearch, which are two of the best systems that I've found on the market. They often take completely different approaches.
03:17 So let's get into the first section, which is all about storage. So when you have a conversation with Claude, it's actually auto detecting certain things you say in the background and writing them silently to .md files.
03:29 These are stored at a per project level in the global space. So we've got .claude/projects, and then we're storing memory folders back there. We then have a MEMORY.md index, which is updated with all the files
03:42 that it can point to. So when you have a conversation in the future with Claude, it can always reference those files. Now this is on a per project basis, but if you repeat things multiple times and you have certain things, certain preferences that are done three or more times, then it gets promoted to a global .claude/memory folder.
04:00 And you can actually see this if you go directly into your Claude Code terminal and do /memory. It will say, do you want to look at your user memory, which is saved in the CLAUDE.md? Do you wanna look at the project memory, which is also CLAUDE.md?
04:12 Or do you want to open the auto memory folder? So if you open that auto memory folder, then you can actually go and see all of the files and the index of files that that's created, and you can see that those actually point to each other.
04:24 So these are happening automatically in the background, and I wouldn't say they're very comprehensive. It's kind of mostly if you're telling it this is a really important thing, but, otherwise, it's not really gonna store a huge amount of information. Now let's look at what the open source community has figured out around this.
04:40 So how do they store and capture information as you go through? So MemSearch uses a Claude Code stop hook. So it's gonna fire after every turn,
04:48 not just the memory worthy turns. So it's gonna call Haiku, which is gonna summarize each turn into bullets.
04:55 And it uses Haiku because it's a cheap, fast model, and it's doing it all the time. It's gonna append that data to a memory/date file with session anchors. So, you know, when you close a session and you have a specific
05:07 session ID, it's gonna append that or the notes from that session to a specific memory file. So it's storing literally everything.
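
The capture step described here can be sketched as a handler that appends each turn's bullets to a dated markdown file under a per-session anchor. Everything in the sketch is an assumption for illustration: the payload keys, the `summarize` stand-in (the real flow calls Haiku), and the file layout are not taken from MemSearch or from the Claude Code hook payload.

```python
import tempfile
from datetime import date
from pathlib import Path


def summarize(turn_text: str) -> list[str]:
    """Stand-in for the cheap-model summary: keeps each non-empty line as a bullet."""
    return [line.strip() for line in turn_text.splitlines() if line.strip()]


def handle_stop_hook(payload: dict, memory_dir: Path, day: date) -> Path:
    """Append one turn's bullets to memory/<YYYY-MM-DD>.md under a session anchor."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    target = memory_dir / f"{day.isoformat()}.md"
    anchor = f"## session {payload['session_id']}"
    existing = target.read_text(encoding="utf-8") if target.exists() else ""
    with target.open("a", encoding="utf-8") as f:
        if anchor not in existing.splitlines():
            f.write(anchor + "\n")       # write the anchor once per session per day
        for bullet in summarize(payload["turn_text"]):
            f.write(f"- {bullet}\n")
    return target


with tempfile.TemporaryDirectory() as tmp:
    memory_dir = Path(tmp) / "memory"
    day = date(2026, 5, 16)
    handle_stop_hook(
        {"session_id": "abc123", "turn_text": "Decided on Stripe\nLanding page live"},
        memory_dir,
        day,
    )
    path = handle_stop_hook(
        {"session_id": "abc123", "turn_text": "Follow-up booked"}, memory_dir, day
    )
    captured = path.read_text(encoding="utf-8")
```

Because the file is append-only markdown, every turn lands on disk even when nothing in it looked memory-worthy at the time.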
05:15 It then periodically runs memsearch index, or you can run this manually. Each bit of information gets chunked into a hash. Now the reason it's converting that information into a hash is because it can then embed those chunks and turn them into vectors.
05:30 Those vectors are then stored in a Milvus vector database, and it's all done locally on your CPU.
05:37 So there's zero API cost. And what this actually means for you, it's not very relevant in terms of what it's being stored as. It's being stored as vectors, so literally a sequence of numbers.
05:47 But what it does is store really effectively a meaning and a bunch of metadata associated with that specific memory. This is great for the retrieval stage later because it means we can actually retrieve information
05:59 by meaning instead of just by keyword search. So not only do we have the markdown files, everything is also indexed and vectorized and put into a database in the back end automatically for us.
06:10 That is absolutely critical for the retrieval stage later. And what's great about this is it basically treats markdown as the source of truth. So everything is appended as markdown, and then everything else is rebuildable later from those markdown files.
06:23 So if you lost this database, you could actually rebuild it from all the memories that have been appended to that date. And the other good thing about it or good and bad, you could say, is it captures everything.
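
The "markdown as the source of truth" idea can be sketched by deriving chunk IDs from the content itself, so the index can be regenerated from the files at any time. Splitting on headings and using a truncated SHA-256 digest are assumptions for illustration; how MemSearch actually chunks and hashes is not stated here.

```python
import hashlib
import re


def chunk_markdown(text: str) -> dict[str, str]:
    """Split markdown on headings and key each chunk by a hash of its content."""
    chunks = {}
    # Zero-width split: cut just before every line that starts a heading.
    for section in re.split(r"(?m)^(?=#{1,6} )", text):
        section = section.strip()
        if section:
            digest = hashlib.sha256(section.encode("utf-8")).hexdigest()[:16]
            chunks[digest] = section
    return chunks


doc = "## session a\n- one\n\n## session b\n- two\n"
index = chunk_markdown(doc)
rebuilt = chunk_markdown(doc)                                  # same files, same IDs
edited = chunk_markdown(doc.replace("- two", "- two (edited)"))
unchanged_ids = set(index) & set(edited)
```

Identical text always yields identical IDs, so a lost index can be rebuilt from the markdown, and an edit only invalidates the chunk that changed.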
06:33 So it's not just what auto memory from Claude Code thinks is the most relevant thing. It's actually gonna capture absolutely everything.
06:40 Now you might wonder, is that overkill? Well, we can come to what Hermes does in a minute and decide for yourself whether that is overkill because Hermes actually takes a completely different approach. And it's closer to what Claude Code is doing out the box because actually the agent is deciding
06:56 what to save. The agent has access to a couple of tools inside Hermes, so add, replace, or remove. And what it's doing is adding those to a MEMORY.md file and a USER.md file.
07:07 So similar to what you've seen probably in OpenClaw or if you've set up your own Agentic OS, you might have a MEMORY.md and a USER.md, but this isn't the same as Claude's MEMORY.md. This is a MEMORY.md with a cap on the number of characters that retains the most important information, and we'll talk about how it does that.
07:24 So MEMORY.md stores environment information, things you've done, and then USER.md is all about user profile. So anything you say about the way you work or the way that you want to operate, USER.md stores.
07:36 It also has mechanisms in there for deduplicating. So whenever the agent thinks it's gonna add, replace, or remove something important, it will also check for duplicates and make sure that it's not writing
07:48 duplicate information to our valuable memory space. Now all of these are kind of useless unless the information gets injected at some point, which we'll talk about next. But the important thing to know is these caps on characters
07:59 enforce consolidation. So where MemSearch captures absolutely everything,
08:04 the point in the Hermes memory logic is that it enforces that consolidation for when it injects that context later on. But in some ways, it is very similar to MemSearch because every turn, it also auto saves the complete raw transcript to a database in the background.
08:19 And it uses a curator. So every seven days, it goes through and prunes and consolidates all of the information that we've just talked about.
08:27 So the curator's job is to keep everything clean. What it does is remove the raw transcripts from that information. So whilst MemSearch stores exact
08:35 raw transcripts, Hermes actually consolidates and prunes that information. So they're actually both excellent,
08:42 especially when you compare it to Claude Code. And if you look in your own MEMORY.md with the auto memory, it barely saves a thing. So MemSearch and Hermes go 10x further than the basic Claude Code out the box.
08:53 So which one would I actually recommend that you use in this approach? Well, MemSearch captures everything automatically with that stop hook, but it's raw and uncurated.
09:03 Hermes is gonna capture our curated facts, especially those that are gonna be put into MEMORY.md and USER.md, which is lean and intentionally lean. But if the agent doesn't think to save something, it's kinda like with our Claude auto memory, it's still actually grabbing
09:19 the full transcript and saving it into something that we can retrieve from a database at a later point. So my answer to which one should you actually use, I actually think we should combine the logic of both here. We should use automatic capture for completeness and then curated facts for what matters most because this is really important for the injection of the context phase.
09:37 So take the best of both and combine it so we've got a long term search from this embedded vector database that we can search by meaning, but also the power of choosing specific information to store in the MEMORY.md and USER.md.
09:53 So now that we come to the injection phase, we can actually push that information into our context without having to search through a load of raw uncurated transcripts in the background.
10:03 So memory injection into the context window is quite misunderstood. It's not about loading more context in. Like we always talk about, it's loading the right context
10:15 at the right time only. So the default behavior of Claude Code is when you start a session, you inject the full CLAUDE.md, and that's why we wanna keep the CLAUDE.md ideally under 200 lines.
10:27 That goes in with the system prompt. And then before you use a tool or before Claude uses a tool, there is actually a pre-tool-use hook which grabs the MEMORY.md index, looks through that list of memory files that were stored earlier, and decides does it need based on your query to actually go and research one of those memory files and inject that into the context too.
10:50 If it does, it will inject that in as additional context inside the conversation. So this is a pretty decent starting point, but actually we can learn a lot from the way Hermes does this. We already saw that it captured a USER.md and MEMORY.md file with more information that's periodically updated
11:08 and consolidated. We can actually inject those into the context window. But first let's quickly cover MemSearch because it might surprise you here but MemSearch actually has no injection layer at all.
11:20 It just relies on the default behavior of Claude Code injecting the CLAUDE.md and the MEMORY.md.
11:25 MemSearch is really built for the recall, which we'll come to. So think of MemSearch as storage and search basically, a storage and search library that massively improves long term recall.
11:36 Whereas Hermes, I think, nails this. So at the session start it basically loads a frozen snapshot similar to the way that Claude uses CLAUDE.md, but it will not only use the CLAUDE.md,
11:47 it will additionally add in the MEMORY.md, the USER.md, and SOUL.md every single time. And that comes to around 1,300
11:56 tokens that are put into every single conversation window. Now this is per session because it's a frozen snapshot, so it gets cached in the memory. So you don't spend 1,300
12:06 tokens every time you send a message. It's just at the start of a session conversation. The session ID will have that context saved.
12:14 So anything that's saved to MEMORY.md, USER.md, SOUL.md during the session will be written to the disk in the background and will not be loaded into that conversation, but will be loaded into the next conversation. So it's a really obvious choice for what logic we'd like to use for the actual injection layer and that's let's use Claude Code's behavior plus Hermes' actual frozen snapshot to load in the MEMORY.md, USER.md, and SOUL.md, which as we saw in the storage stage consolidates recently
12:44 biased and most important information inside these three folders or these three markdown files. Now, yes, you are loading in 1,300
12:53 tokens every single session, but compared to the huge context windows, the increased performance you're gonna get from recent consolidated memories,
13:03 in my opinion, is worth it. Now this is where stuff gets really interesting in recall because this is probably the biggest gap that Claude Code has out the box.
13:03in my opinion, is worth it. Now this is where stuff gets really interesting in recall because this is probably the biggest gap that ClaudeCode has out the box.
13:13Most of the time, we're not working just on a task by task basis with ClaudeCode. We have a bunch of clients. We have a bunch of projects on the go.
13:20And actually storing that information is critical. But recall is the most important thing.
13:25If you can store as much information as you want, but if you can't get it out at the right time, then it's not worth having a good storage mechanism in the first place. And ClawCode out the box has a really poor, dare I say it, recall system.
13:37So basically it's user asked about the past, some question about the past, It's gonna check the auto memory files which we've already seen. And if it's not been saved in there, it's completely lost.
13:47You might have opened the memory files that you had from earlier inside your project repository. It really is quite selective about what it saves. You probably don't have a huge amount of information stored there.
13:58So actually recalling past conversations and information is gonna have to just go and trawl through previous conversations you've had and actually burn through a load of tokens trying to find relevant information, and it has no methodology for doing so right now. Now you can, of course, use the resume flag to actually resume a previous conversation, but you have to know which session you actually wanna resume to get that context back.
14:19So for ClawCode, the storage of information is okay. The injection is basic with just the ClawDot MD, but the recall is actually really weak and where we can benefit most from external systems. So how does that compare to MemSearch if a user were to ask about something from the past week,
14:36the past month, the past six months? Well, MemSearch has a really powerful three tier retrieval system that basically only goes deeper if it needs to. It works on the same principles of progressive disclosure.
14:47 So user asks a question about the past and we're gonna use the memsearch search query. It's basically going to convert your query into vectors so that you can go and find in the vector database where we stored the information earlier semantic matches for your queries.
15:02 Then because it's stored as vectors, we'll also be able to find matches for monetization, revenue, price. So it doesn't have to be exact keyword matches like we're actually searching in the vector database by meaning here.
15:14 And it even has a method to do that by keywords. So the dense vectors allow it to search by meaning. The BM25 keywords
15:22 allow it to actually keyword match and then it's basically summarized in one list of these are the closest matches to your relevant query that you asked about the past. Now it will pass that back to the agent first and if there's nothing that's totally relevant, then it's able to actually go one level deeper.
15:38 So at that point, it could stop and actually find really relevant queries, find exactly what we're looking for from information in the past. If that answers the question, great. However, if that does not answer the question, then it jumps to tier two which is search expand.
15:51 And memsearch expand gives it more context, more metadata, a summary of information around the match that we potentially found. And, again, if that is not good enough and we need the raw dialogue, then it's gonna go to the next tier level three, which actually has all of the session dialogue that we had. Because if you remember, every single message we send,
16:11 it's summarized into bullets and then appended to the memory and then that is indexed. So all of the raw dialogue is actually saved and we can retrieve that with level three if we need to as a last resort. Now all of these take more tokens as we go down, but if you need a reliable system for retrieving information about your client's project six months ago, then MemSearch is gonna be the one.
16:33 Now you might have identified the limitation in this approach which is if we're asking about the past it immediately thinks okay instead of searching the local context let's go and do a database query. So that's gonna be slower than just checking our local in context memory.
16:50 So Hermes uses a really clever approach for this. First instead of going deeper into the database it's actually just gonna check our MEMORY.md.
16:58 Has the question that the user has asked been actually accessible via just the MEMORY.md, which means it can actually
17:06 get it from the context that it's already received. So the power in injecting this frozen snapshot
17:13 means that actually for some queries, it's gonna be able to be answered just from the context that's already in the memory. And that will basically be zero cost and instantaneously accessible.
17:25 So it should, in theory, always search the context of that existing conversation before it goes down to the levels and searches the database. So if it is not found in there, then it goes deeper and searches the sessions.
17:38 And we already mentioned those were stored in a database the same as we did for MemSearch. But instead of being a vector database, it's just searching by keywords effectively. So then what it's gonna do is basically return the top three matching sessions by relevance
17:53 and summarize it using Gemini Flash and pass that back into the agent. So Hermes is really good at exact keyword matching. So if we were to ask it about pricing, it could find things about pricing, but it might not necessarily find things about revenue because that's by meaning and not keywords.
18:08 However, they do do one really smart thing which we're gonna adapt and use, which is inject this MEMORY.md into the conversation history. And then also by default, as a level zero, check that MEMORY.md.
18:21 So check what's already in context before jumping down into the memsearch hybrid search, the memsearch expand, and the level three down here. So what we'd actually ideally do is grab this MEMORY.md check and put that into the MemSearch flow so that we have a hybrid of both of those.
18:39 So we can treat this step as almost like a level zero between MemSearch and Hermes so that we actually check what's already in context before we go deeper and check the vector database. So the user asks about the past, it's gonna check the MEMORY.md and the context that's in that existing window. And if not found, then it's gonna go on to MemSearch to start searching the vector database by keyword and meaning and then continue to level two and level three if it needs to do so.
19:04 So that's a lot of information. Now how do you actually set this up for yourself and take the best elements of each system that can be worked together?
19:14 So here's what I'd actually recommend when taking the best from each system. So let's run through store, inject, and recall and the life cycle of a conversation as it happens.
19:24 So but we will, of course, leverage everything that's already built into Claude Code that works well as best practice. So as a conversation happens, we're gonna leverage the auto memory, which is built in and saves those MEMORY.md files to the Claude global folder for us. But after every turn completes, we're gonna add in the memsearch stop hook that's basically gonna capture word for word all of our transcripts of our conversations so that those can be put into a daily memory.
19:49 But what we want to do is maintain a MEMORY.md and a USER.md file so that actually if the agent decides that something is important, it's not just relying on Claude Code to add, replace or remove into MEMORY.md or USER.md files.
20:03 Now that covers actually storing more context so that we can actually retrieve it later. We, of course, also leverage the vector database
20:11 of MemSearch which is actually consolidating this information into long term semantically searchable memory. So basically we're gonna run a nightly job to consolidate all the information that was put into that database.
20:22 All the transcripts, all the raw transcripts are gonna be consolidated using this memsearch index every single night. And if all of this is sounding a little bit too complex for you to actually go and set up, then I'm gonna show you later where we've got an exact guide for free on how to give this plan to Claude Code and it will go through all your file systems and work out how to actually implement this and do all the installations for you.
20:43 Now injection, we actually leveraged Hermes logic. So when the session starts, we wanna inject a little bit more context than just CLAUDE.md.
20:51 We wanna inject the SOUL.md, the USER.md, MEMORY.md, and then possibly today's log. You could also inject yesterday's log if you think that would be relevant too. So that would be 3,000 tokens that are cached at the start of every session, which will really be important when we come to actually recalling it.
21:08 So then we jump onto the recall segment of the flow. And what we've done here is combine the tier zero of Hermes where we check the MEMORY.md and daily log first.
21:19 So those are injected inside the system prompt every time we send a message, but they're cached. So what we're doing is basically before digging deeper into the vector database to search past history, we check the local recent data that's been loaded into the conversation already. So MEMORY.md and daily log, that has zero cost, and it's also pretty much immediate because it already has it in context.
21:41 If that is not found, then we jump on to the MemSearch traditional level one, level two, level three, where we search the queries using the hybrid keyword and semantic or vector search. We then expand those with the chunks. And then if we do not find the information still, then we can actually parse the raw transcripts
21:59 and pass that information back to the agent. So this setup gives us the ability to actually search information really quickly from local
22:09 recent files and prioritize those, but also gives us the ability to actually search further back in less recent history to recall all our old knowledge to the point where we can literally pull out the raw dialogue at the end. The one thing I want you to take away is none of this is complicated individually but it's all about preserving best practice
22:26 CTA for storage, injection, and recall so we can massively improve the memory usage inside your Claude Code sessions.
22:33 CTA If you're working on projects and multiple clients, then this is an absolute must have. And I know Anthropic are working on their own memory systems, but right now it's far, far behind what you can get from systems that are currently open source and free to access.
22:47 CTA Now I'll link below a completely free plan.md document for you to pass this into Claude and set it up for yourself. Now if you do want this straight out the box, done for you, you know it's gonna work well, then we'll be implementing this inside our own Agentic operating system next week.
23:03 CTA That's also linked down inside the academy in the description below. If you want to see what other options I considered for memory, check out the next video.
23:11 CTA Thanks for watching.
§ 05 · For Joe

Steal the three-verb framework.

Memory architecture playbook

Every memory decision in any agent system maps to just three questions: Store, Inject, Recall -- and Claude Code out of the box fails at two of them.

  • Add the MemSearch stop hook today -- it captures everything your auto-memory misses at very low cost (Haiku summaries are cheap, and indexing runs locally with no API cost).
  • Split your CLAUDE.md into SOUL.md / USER.md / MEMORY.md right now -- the frozen snapshot injection pattern gives you Hermes recall quality for free.
  • Build the Tier-0 check: before any vector DB query, check what is already in context -- instant and free.
  • The free plan.md at scrapeshq.notion.site/claude-memory-systems is a paste-in blueprint -- hand it to Claude Code and let it self-install.
  • Frame any AI memory content you create around Store / Inject / Recall -- it is the clearest mental model for this category and Joe could own it in the creator space.
§ 05 · For You

Why Claude keeps forgetting things -- and what to do about it.

For anyone using Claude Code on real projects

If Claude Code forgets what you told it last week, that is not a bug -- it is a design gap you can close yourself in an afternoon.

  • Your CLAUDE.md is only the start -- think of it as the always-there facts layer, not the full memory system.
  • The two free tools worth installing: MemSearch (captures everything) and the Hermes injection pattern (loads the right slice at session start).
  • The free plan.md at scrapeshq.notion.site/claude-memory-systems walks you through the whole setup -- paste it into Claude Code and it installs itself.
  • A vector search treats "pricing" and "monetization" as close in meaning -- semantic recall means you do not have to remember the exact words you used.
  • You do not need to understand Milvus or embedding models -- the open-source tools handle all of that; you just run the stop hook and the nightly index.
§ 06 · Frame Gallery

Visual moments.