Simon Scrapes · YouTube · 12:56

I Rebuilt Hermes in Claude Code (It's Ridiculously Good)

A 13-minute teardown of why rebuilding an agentic OS from scratch beats installing someone else's assumptions.

Posted

May 23rd 2026

today

Duration

12:56

Format

Tutorial

educational

Channel

Simon Scrapes

§ 01 · The Hook

The bait, then the rug-pull.

Forty thousand GitHub stars in forty-six days. Before Simon Scrapes installed a single line of Hermes, he did something most people skip: he read through the issues. What he found convinced him to rebuild the parts he wanted inside Claude Code instead. The result turned out ridiculously good, not because it beats Hermes, but because he understands every layer of it.

§ · Stated Promise

What the video promised.

Stated at 00:46: "I'm gonna show you the exact Hermes features I rebuilt inside Claude Code and the parts I deliberately skipped and why understanding the architecture underneath gives you way more leverage long term." Delivered at 11:00.

§ · Chapters

Where the time goes.

00:00 – 01:05

01 · Cold open + promise

Hermes velocity stat → 'I read through the issues' → thesis: rebuild don't install → what this video covers

01:05 – 03:08

02 · Cost #1 — Inherited assumptions

The self-learning loop grades its own homework. No external validation. Can silently overwrite your good work with no audit log.

03:08 – 03:47

03 · Cost #2 — Can't fix what you don't own

OpenClaw: 200+ CVEs filed since February, 386 malicious packages from one threat actor. You're debugging someone else's code.

03:47 – 05:08

04 · Cost #3 — Doesn't scale across clients

Paul Baier (nontechnical CEO) spent 100+ hours and $1,000+ testing OpenClaw. Hermes is single-tenant by design — separate install per client.

05:08 – 06:25

05 · What he rebuilt: Identity layer

Keeps user.md + memory.md from Hermes but adds per-client brand context folders — voice, ICP, positioning, visual identity — that share procedures across clients.

06:25 – 08:23

06 · Memory system

Keeps Hermes's capped injection (~1,300 char memory.md) but replaces keyword long-term search with MemSearch (semantic/meaning-based recall).

08:23 – 11:00

07 · Self-learning loop critique + skill systems

Hermes auto-generates a new skill after each task, which risks leaving you with 15 near-duplicate LinkedIn skills and no deduplication or version control. Simon's fix: modular skill components that chain together.

11:00 – 12:56

08 · Build vs. buy trade-off + CTA

Honest framing: faster to start with Hermes, faster to scale with your own. Neither is right for everyone. CTA to AgenTek Academy.
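
Chapter 07's complaint is that auto-generated skills pile up as near-duplicates with nothing to catch them. As a rough illustration of what a deduplication check could look like, here is a minimal sketch that flags skills whose descriptions are nearly identical. The function name, the skill names, and the 0.8 threshold are all assumptions made for this example. It compares surface text only, so it is a cheap first pass rather than a semantic check, and it is not how Hermes or Simon's setup works.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple


def find_near_duplicates(
    descriptions: Dict[str, str], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return pairs of skill names whose descriptions look alike.

    `descriptions` maps a skill name to its one-line description.
    The similarity score is difflib's character-level ratio (0.0 to 1.0).
    """
    flagged: List[Tuple[str, str, float]] = []
    # Sort so the output order is stable regardless of dict insertion order.
    for (name_a, desc_a), (name_b, desc_b) in combinations(sorted(descriptions.items()), 2):
        score = SequenceMatcher(None, desc_a.lower(), desc_b.lower()).ratio()
        if score >= threshold:
            flagged.append((name_a, name_b, round(score, 2)))
    return flagged


# Hypothetical skill inventory, for illustration only.
skills = {
    "linkedin_post_v1": "Write a LinkedIn post in the brand voice",
    "linkedin_post_v2": "Write a LinkedIn post in the brand voice v2",
    "meeting_summary": "Summarize meeting notes into action items",
}
duplicates = find_near_duplicates(skills)
```

Running a check like this before a new skill is saved would at least surface the overlap, leaving the merge decision to a person.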

§ · Storyboard

Visual structure at a glance.

00:00 · hook · Hermes stars
01:05 · value · cost 1: assumptions
03:08 · value · cost 2: CVEs
03:47 · value · cost 3: scaling
05:08 · value · identity layer
06:25 · value · memory system
08:23 · value · skill systems
12:09 · cta · AgenTek Academy

§ · Frameworks

Named ideas worth stealing.

01:05 list

Three Hidden Costs of Off-the-Shelf Agentic OS

Inherit assumptions you didn't know existed (self-validation problem)
Can't fix what you don't understand (debugging someone else's code)
Doesn't scale across your business (single-tenant architecture)

Structured argument for why OpenClaw/Hermes have fundamental architectural issues that only surface once you're committed.

Steal for: any build-vs-buy pitch, any 'why I left SaaS' content, any tool critique video
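
The first cost (a model grading its own homework, with no version control or audit log) hints at the shape of a guard. The sketch below is one possible reading, not anything taken from Hermes or from the video: `SkillStore`, `propose`, and `rollback` are hypothetical names, and the validator is assumed to be something independent of whoever wrote the skill (a human review, a test suite, or a different model).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical external check: (skill_name, candidate_text) -> approved?
Validator = Callable[[str, str], bool]


@dataclass
class SkillStore:
    validator: Validator
    skills: Dict[str, str] = field(default_factory=dict)
    history: Dict[str, List[str]] = field(default_factory=dict)  # older versions per skill
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)  # (skill, author, outcome)

    def propose(self, name: str, candidate: str, author: str) -> bool:
        """Apply an update only if the external validator approves it."""
        if not self.validator(name, candidate):
            self.audit_log.append((name, author, "rejected"))
            return False
        if name in self.skills:
            # Keep the previous version so a bad update can be undone.
            self.history.setdefault(name, []).append(self.skills[name])
        self.skills[name] = candidate
        self.audit_log.append((name, author, "accepted"))
        return True

    def rollback(self, name: str) -> bool:
        """Restore the most recent previous version, if one exists."""
        versions = self.history.get(name, [])
        if not versions:
            return False
        self.skills[name] = versions.pop()
        self.audit_log.append((name, "rollback", "restored"))
        return True


# Toy validator for illustration: a skill must reference the shared voice file.
store = SkillStore(validator=lambda name, text: "voice.md" in text)
store.propose("linkedin_post", "Use voice.md and icp.md", author="human")
store.propose("linkedin_post", "Write something catchy", author="agent")  # rejected
```

The point of the design is that the writer of a skill never decides alone whether it replaces the old one, and every attempt leaves a trace.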

10:00 model

Skill Systems (modular composition)

Voice lives in one file
ICP lives in one file
Formatting lives in one file
Skill system chains them together in the right order

Each skill is a modular component that feeds into a skill system. One update propagates everywhere. Contrasts with Hermes's auto-generated skills that accumulate as near-duplicates.

Steal for: Claude Code skills architecture, JoeFlow orchestration, any reusable AI workflow design
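
To make the "one file per concern, chained in order" idea concrete, here is a minimal sketch. It assumes each component (voice, ICP, formatting) is stored once and that a skill system is just an ordered list of component names plus a task. `SkillSystem` and `compose_prompt` are hypothetical names, and a plain dict stands in for the files on disk.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SkillSystem:
    name: str
    components: List[str]  # component names, in the order they should be chained
    task: str


def compose_prompt(system: SkillSystem, components: Dict[str, str]) -> str:
    """Build one prompt by pulling each component from its single source."""
    missing = [c for c in system.components if c not in components]
    if missing:
        raise KeyError(f"missing components: {missing}")
    sections = [f"## {c}\n{components[c].strip()}" for c in system.components]
    sections.append(f"## task\n{system.task}")
    return "\n\n".join(sections)


# In a real setup these would be read from voice.md, icp.md, format.md.
components = {
    "voice": "Plain and direct.",
    "icp": "Agency owners.",
    "format": "Short paragraphs.",
}
linkedin = SkillSystem("linkedin_post", ["voice", "icp", "format"], "Write a LinkedIn post.")
newsletter = SkillSystem("newsletter", ["voice", "format"], "Write a newsletter intro.")

prompt = compose_prompt(linkedin, components)
```

Because both skill systems read `voice` from the same place, changing that one entry changes the next prompt built by either system, which is the "one update propagates everywhere" property described above.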

06:25 model

Memory Hierarchy (Hermes-compatible)

Storage: auto-save + summarize every conversation
Injection: memory.md capped at ~1,300-2,500 chars per session
Short-term recall: injected context checked first
Long-term recall: MemSearch (semantic) not keyword search

Keep what Hermes gets right (capped injection) and replace what it gets wrong (keyword-only long-term recall).

Steal for: custom Claude memory architecture, any persistent context system
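
The hierarchy above can be sketched in a few lines. This is an illustration under stated assumptions, not the Hermes or MemSearch implementation: the 1,300-character cap is the figure quoted in the video, `build_injection` and `recall` are hypothetical names, and `semantic_search` is a placeholder callable standing in for whatever meaning-based search tool is plugged in. The short-term check here is a plain substring match, chosen only to keep the example small.

```python
from typing import Callable, List, Optional

MEMORY_CAP_CHARS = 1300  # figure quoted in the video; treat the exact value as an assumption


def build_injection(entries: List[str], cap: int = MEMORY_CAP_CHARS) -> str:
    """Keep the newest entries that fit under the cap (entries are oldest-first)."""
    kept: List[str] = []
    used = 0
    for entry in reversed(entries):  # walk from newest to oldest
        cost = len(entry) + (1 if kept else 0)  # +1 for the joining newline
        if used + cost > cap:
            break
        kept.append(entry)
        used += cost
    return "\n".join(reversed(kept))  # restore oldest-first order


def recall(
    query: str,
    injected: str,
    semantic_search: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Check the injected snapshot first; fall back to long-term search."""
    for line in injected.splitlines():
        if query.lower() in line.lower():
            return line
    return semantic_search(query)


entries = ["a" * 600, "b" * 600, "c" * 600]
snapshot = build_injection(entries)  # only the two newest entries fit
```

Keeping the fallback as a parameter is what makes the long-term layer swappable, which matches the plug-and-play framing in the video.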

§ · Quotables

Lines you could clip.

00:22

"You inherit somebody else's architecture, their assumptions, and therefore their problems too. You can't fix what you don't understand underneath."

Clean 2-sentence thesis, no setup needed → TikTok hook

01:34

"The same model that writes the skill is also the sole judge of its correctness."

Self-validation problem framed in one sentence — memorable, shareable → IG reel cold open

03:08

"Hermes may be faster to start, but your own setup is actually gonna be faster to scale."

The core trade-off in one line → TikTok hook

10:39

"A skill is a modular component that feeds into a skill system. Each one does one job. It lives in one place."

Clean architecture principle, developer-friendly → newsletter pull-quote

11:39

"When your brand voice does shift, you just have one file to update and then every skill system that uses that is gonna pull from that single file. So it's infinitely maintainable and scalable."

Concrete payoff of the modular approach → IG reel cold open

§ · Pacing

How they spent the runtime.

Hook length: 65s

Info density: high

Filler: 8%

§ · CTA Breakdown

How they asked for the click.

12:09 product

"if you want my exact Agentic OS, it's inside the AgenTek Academy in the description below. And it's basically installed in one line, get it up and running today."

Soft sell, earns the right with a full teardown before pitching. No hard close. Immediately pivots to 'watch the next video' as a secondary CTA.

§ 04 · The Script

Word for word.

HOOK: opening / re-engagement · CTA: the pitch · metaphor: analogy

00:00 HOOK Hermes went from zero to 40,000 GitHub stars in forty six days and to compare OpenClaw did it in sixty one. So for agentic systems this is the fastest adoption ever seen on GitHub and when you look at what they do, the memory systems, the identity layers, and the self learning loops, you can understand why. But before I installed it, I did something most people don't do.

00:20 HOOK I went and read through the issues and pretty quickly I realized something. The off the shelf systems are fast to begin with.

00:27 HOOK They're fast to start. But you inherit somebody else's architecture, their assumptions,

00:32 HOOK and therefore their problems too. You can't fix what you don't understand underneath. So instead of replacing Claude Code, I rebuilt the parts I actually wanted inside my own setup.

00:41 HOOK And honestly, it turned out ridiculously good. Not because it's better than Hermes, but because I actually understand every single layer now.

00:49 HOOK And I built it in a modular way so I can swap pieces in and out, reuse workflows across projects, and evolve the system as the space changes. So in this video, I'm gonna show you the exact Hermes features I rebuilt inside Claude Code and the parts I deliberately skipped and why understanding the architecture underneath gives you way more leverage long term than just installing something like Hermes blindly.

01:09 So let's get into it. But before I show you what I built, let me show you the three hidden costs of installing something like Hermes off the shelf to save you some time and pain later. So cost number one is that you inherit assumptions that you didn't even know existed in the first place.

01:23 So as an example, the infamous self learning loop on Hermes, the bit that everyone celebrates has no external guardrails. So effectively telling it to build its own skills automatically

01:34 then grade your own homework. So we've got the self validation problem. The same model that writes the skill is also the sole judge of its correctness.

01:41 So without that external validation step, it basically can't see its own blind spots. It thinks everything is good. And what that means in practice is it can quietly overwrite

01:50 the changes that you've made to make your skills better with worse versions and has no version control or audit log. So you can say goodbye to your good hard work. So cost number two is that you can't fix what you don't understand.

02:03 So OpenClaw is one cycle ahead of Hermes. So the first version came out in November. The first version of Hermes came out in February, but it's the same category of product.

02:11 But when you look at OpenClaw, we've got over 200 vulnerabilities identified and filed since February.

02:17 You can see that we've got a ton of critical and high vulnerabilities that exist for OpenClaw. And a security researcher even found 386

02:24 malicious packages on the skills marketplace from a single threat actor. So when something breaks at this scale, when something is critical to security, you're left debugging somebody else's code because you don't understand the assumptions underneath or their choices they made when they were building it.

02:39 So cost number three then is it doesn't scale across your business. So we've got Paul here who's a nontechnical CEO. He spent over a hundred hours and over a thousand dollars testing OpenClaw

02:50 over two months. He wanted to understand if the hype was real, if it could do things that personal AI systems promised they could do, but basically later found that the bugs and security gaps that he identified disqualified it from being any sort of usable. He's now moved on to Claude and has replicated a bunch of the functionality, 30% of OpenClaw's features

03:08 in the last couple of months. So Hermes may be faster to start, but your own setup is actually gonna be faster to scale. And the hidden costs of off the shelf software like OpenClaw or Hermes only show up once you're already committed and in the process of building with them.

03:22 So let's get into what I actually built and what parts I lifted from Hermes. So the first thing that Hermes actually nails and the first thing I therefore rebuilt is the identity layer. So that agent needs to know who you are, who your business is, and what you stand for.

03:36 Otherwise, every AI output is gonna sound like an AI output. So in Hermes, this represents itself as a memory.md file and a user.md file.

03:45 It's a super simple setup and designed for one individual client or a single business. But that's also where its limitations come in because it's assuming that you're one person working on one set of stuff, and there's no concept of switching brand context, client context, or business context inside a single setup.

04:01 So if you wanted to run Hermes for multiple clients, you'd effectively have to install for each individual client its own Hermes installation with its own memory and user.md files. So if you run an agency or multiple clients or even just two distinct brands of your own, you either bake it into one identity and one system

04:18 in one install and live with that or you spin up entirely separate Hermes installs and each one of those has its own memory its own skills and its own learning loop. So I'm sure you can see how that embeds a maintenance problem because the skills aren't shared between the clients even though some of the procedures might be repeatable.

04:35 And it's not a direct knock on Hermes, it's just what they built it for but it's not fit for purpose for a business owner running multiple clients or multiple brands. So the way that we've built this is to effectively inject context in the same way.

04:47 So we have it for our own identity inside a user.md file, have memories inside a memory.md file but we also inject shared brand context

04:56 like voice, ICP

05:06 So each individual client has its own set of shared context, their brand voice, their ICP, their positioning, and their visual identity. But they're still able to actually access and share the procedures or the skills across those client folders.

05:19 So we've effectively built the folder structure so you can handle multiple clients or multiple brands but still share the relevant shared context so you don't have to maintain it in multiple places. It's just one single install versus Hermes for multiple clients would be individual installs that each have their own memory and learnings.

05:35 Now what Hermes actually does is injects the memory.md and user.md into the start of every single conversation which drastically improves the short term recall of important information. So let's go on now to talk about memory, which is probably the most important feature after this shared brand context for getting better results.

05:52 HOOK And I've got to give it to Hermes. They've actually really thought through the way you store, inject, and recall information at various points in the life cycle.

06:00 HOOK Now before we move on to that if you're enjoying the content so far then drop down below, hit the subscribe button, hit the like on the video, it's massively helpful to me. So let's get back into the memory system that Hermes uses that's actually very very powerful. So when you consider memory, we've basically got three levels here.

06:14 HOOK We've got storage of context, then we've got how does that context actually get injected into every conversation, and then more long term, how do we recall memories that aren't recent but are still important? The ones that we have to go back and search for it.

06:27 So simply put, Hermes auto saves and summarizes conversations every single conversation turn. It then injects important memories back into every conversation through the memory.md, the user.md, and soul.md files.

06:42 And that is capped at, I think, 1,300 tokens, which means we're only loading in a limited snapshot

06:49 of recent important information for every session. But its biggest limitation is when you go back to actually recall the information that has not been injected into that recent memory and that's because it's searching by keyword and not meaning.

07:01 So we might be able to recall exact long term memories if we remember the words we used when we were talking to Claude but it's much harder if we can't exactly remember what words we used when we talked to Claude about it, which is pretty likely. Right?

07:14 And kind of rendering long term recall in this case a bit useless. Who remembers the exact words they used with a client six months ago in that conversation they were having with Claude. And this is where it gets really powerful when you're building a custom setup because we can take the stuff that we like about Hermes or the stuff in green like the fact we are capping a memory.md file at 2,500

07:35 characters or 1,300 characters and injecting that as a recent memory into the conversation as a memory.md file. Then where there were limitations like in the recall where we only had keyword search we can take other memory systems like MemSearch in this example and make recall much more powerful and that's exactly what we've done with our own agentic operating system.

07:54 So we're still using some patterns of the recall from Hermes where we effectively check that injected context first but then when the information is not found in that local memory we go deeper and actually search by meaning and not by keywords. And that's part of the MemSearch architecture, not the Hermes architecture. So you can plug and play the bits that you like when you build your own custom system and make it bespoke for your context.

08:17 Say you needed verbatim recall, you might implement MemPalace instead of MemSearch for example. Now here's the bit where Hermes gets controversial which is that self learning loop we talked about earlier.

08:27 So one of Hermes's biggest selling points is the self learning loop. So an agent finishes a task it's gonna write itself effectively a new skill every time and use it the next time, which sounds brilliant in practice. And the first time it happens, it's probably pretty special.

08:42 But what happens by the tenth skill or the twentieth skill when you've made tiny iterations on effectively the same process? So effectively what we're doing is we are starting on day one. We are telling it to do a specific task.

08:53 And then a couple of weeks later when we come back to do a similar task, it's gonna create two skills that are fairly similar, have a similar description, but are kept as separate skills, maintained as separate skills because it's not gonna capture the nuance in our process. And we also have poor visibility of all the skills that we have existing already, so it's just gonna continue to create more skills.

09:13 And each one is gonna capture that approach at the moment in time with that context for that specific situation. So over time you risk ending up with 15 skills that all do roughly the same thing like LinkedIn post v one, v two, LinkedIn post for this client, this client instead, LinkedIn post writer one and two, all with slightly different context

09:33 and slightly different bits of logic baked in. They've all got similar descriptions, it doesn't know which one to use at any which time. Then when your brand voice shifts or when a client's positioning changes,

09:43 you've got like 15 places to go and update and maintain it. So yes it's absolutely faster to build this way initially but it's a hell of a commitment to actually maintain properly and basically therefore impossible to scale across multiple clients without the whole thing turning into a bit of a mess. Now we've created personally in house in our own Agentic OS a whole logic around how to tackle this, and we call this skill systems.

10:05 So a skill shouldn't be just a one off task. A skill is a modular component that feeds into a skill system. So each one does one job.

10:12 It lives in one place. It has a consistent named format and gets updated in one place and all the updates propagate to the rest of the system. So when you want to do something complex like write a LinkedIn post in your brand voice for a specific audience in a specific format, you don't create a write a LinkedIn post skill that bakes in all of these things.

10:31 You actually have the voice, the ICP, the formatting already maintained as separate skills and then the LinkedIn post system just grabs the correct context, the up to date context from one single file for the voice, for the ICP, and the formatting. And then this skill or skill system prompt is effectively chaining those together in the right order.

10:49 So when your brand voice does shift, you just have one file to update and then every skill system that uses that is gonna pull from that single file. So it's infinitely maintainable and scalable.

11:00 So Hermes is faster to build the first skill but building your own approach is gonna be faster to build the tenth, the hundredth skill system that depends on the actual skill and infinitely easier to maintain. So it begs the question, should you build this for yourself or grab something off the shelf? Well, if you install someone else's stack, you've basically inherited their assumptions about identity,

11:20 memory, about how their learning loop should work, about whether you'll need multi client context. And some of those assumptions will work for you, and they might work for you. And Hermes is great as an off the shelf comparison to something like OpenClaw, which was a lot more buggy.

11:33 But some of those assumptions might not work for you and then you're left actually trying to maintain or fix the broken parts versus actually just building it more slowly for yourself and understanding the assumptions and making it more scalable. So if you are building it for yourself, you're making those choices on purpose.

11:48 Yes. You will move You'll get some of it wrong but every layer is something you can see, you can edit and actually reuse. You can build it in that modular way.

11:55 CTA And when something does break, you'll have better knowledge of how to actually find the part that's broken and fix that so it's maintainable in the future. So that's effectively the trade off. It's gonna be faster to start with Hermes but faster to scale with your own built setup.

12:09 CTA And neither is gonna be the right answer for everyone. Right? It's just a personal choice.

12:13 CTA Now I'm definitely not saying my version of the agentic operating system or every custom version is better than Hermes in every way. Absolutely not. But I understand exactly what assumptions have been made under the hood and I can build on it in a modular way, in a slower way that's gonna end up being completely custom to my own setup.

12:30 CTA So if you want my exact Agentic OS, it's inside the AgenTek Academy in the description below. And it's basically installed in one line, get it up and running today. And we run through exactly what's inside the OS and all the logic so you're not just left installing something again without understanding the assumptions.

12:46 CTA You can plug and play the parts you like and leave out the stuff that doesn't work for you. Now if you want to see more around what we've got inside our agentic operating system, watch the next video. Thanks for watching.

§ 05 · For Joe

The modular OS beats the installed one.

Own your stack — the AI edition

Hermes is faster to start; your own setup is faster to scale — and the hidden costs of someone else's architecture only surface once you're already committed.

Use Simon's three-hidden-costs structure verbatim for any 'why I stopped using X SaaS' video — it works for any AI tool critique.
The self-validation problem ('grading your own homework') is a clean, quotable metaphor for any content about AI blind spots.
The modular skill system idea directly maps to Joe's own setup: voice.md, ICP.md, format.md as separate source-of-truth files that compose into skill systems.
Simon's multi-client identity layer (per-client brand context folders sharing procedures) is worth shipping inside JoeFlow's Sessions panel as a named feature.
The MemSearch upgrade (semantic vs. keyword recall) is a concrete next step for any memory system — worth researching for the JoeFlow stack.
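
The multi-client identity layer in the list above comes down to a folder convention plus a loader. The sketch below assumes one possible layout. The folder and file names (`clients/<client>/voice.md` and so on) and the `load_context` function are illustrative, since the video describes the idea but does not show the exact structure.

```python
from pathlib import Path
from typing import Dict

# Hypothetical layout (names are illustrative, not taken from the video):
#   root/user.md, root/memory.md                          shared identity
#   root/skills/*.md                                      procedures shared by every client
#   root/clients/<client>/{voice,icp,positioning,visual}.md   per-client brand context

BRAND_FILES = ("voice", "icp", "positioning", "visual")


def load_context(root: Path, client: str) -> Dict[str, str]:
    """Merge the shared identity files with one client's brand context."""
    context: Dict[str, str] = {}
    for name in ("user", "memory"):
        path = root / f"{name}.md"
        if path.exists():
            context[name] = path.read_text(encoding="utf-8")
    client_dir = root / "clients" / client
    if not client_dir.is_dir():
        raise FileNotFoundError(f"no context folder for client: {client}")
    for name in BRAND_FILES:
        path = client_dir / f"{name}.md"
        if path.exists():
            context[name] = path.read_text(encoding="utf-8")
    return context
```

Switching clients then means calling `load_context` with a different folder name, while the skills directory and the shared identity files stay in a single install.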

§ 05 · For You

When to build your own vs. install someone else's.

If you're evaluating AI tools

Before installing any off-the-shelf AI system, read through the issues first — the architecture you inherit is harder to escape than the features you gain.

Any system that auto-generates its own rules without external validation will quietly degrade over time — look for that pattern before committing.
If you run multiple projects or clients, check whether the tool is single-tenant by design; the migration cost surfaces late.
Start with the simplest version you can understand end-to-end, then add complexity from systems you've reviewed — not from marketplaces you haven't.
Keyword-search long-term memory is a real limitation in most current AI memory systems; prefer tools that offer semantic recall for anything older than a few sessions.

§ 06 · Frame Gallery

Visual moments.

00:56

02:39

04:35

06:32

10:06