Nick Puru | AI Automation · YouTube · 17:04

This is the EASIEST way to setup Hermes Agent

A 17-minute walkthrough of Max Hermes: the cloud-hosted Hermes agent that costs 95% less than Opus 4.7 and writes its own skill playbooks after every task.

Posted
June 2nd 2026
today
Duration
17:04
Format
Tutorial
educational
Channel
Nick Puru | AI Automation
§ 01 · The Hook

The bait, then the rug-pull.

Ninety percent cheaper to run, ten seconds to deploy, no Docker, no model wiring. The pitch for Max Hermes lands in the first twenty seconds, and it is built on a real cost arbitrage: the same open-source Hermes agent underneath, but running on a model that costs a fraction of what most people assume agent work requires.

§ · Chapters

Where the time goes.

00:00 – 00:32

01 · Hook and AI tool-switching tax

Opens with the 90% cost claim and the no-Docker promise, then names the core problem: every time you switch AI tools you start over from zero.

00:32 – 03:04

02 · What Max Hermes is

Defines the manual vs autonomous memory gap. Hermes writes its own playbooks; other tools make you write them.

03:04 – 04:45

03 · MiniMax workspace tour

Shows the MiniMax Agent UI: skills panel, office, tools, image and video generation. MaxHermes vs MaxClaw explained.

04:45 – 07:19

04 · M2.7 vs Opus 4.7 cost breakdown

$0.30/$1.20 vs $5/$25 per million tokens. 17x input, 21x output savings. 56% vs 64% on SWE-bench Pro. Verdict: not noticeable for everyday tasks.

07:19 – 08:16

05 · 10-second deploy

Activates sandbox instance, tours the fresh chat interface and skills panel.

08:16 – 11:21

06 · Connecting Gmail via Zapier MCP

mcp.zapier.com setup: new MCP server, connect apps, copy token URL, paste into MiniMax custom MCP config.

11:21 – 12:40

07 · Live task: cold-lead re-engagement

One prompt: pull 30 days of onboarding emails, find cold prospects, draft personalized re-engagement emails, queue as Gmail drafts.

12:40 – 13:47

08 · Saving the skill

Save this as a skill called gmail-cold-lead-reengagement. Agent writes its own playbook; keeps structure, strips voice and tone.

13:47 – 15:28

09 · Three memory layers

Layer 1 chat history (user-managed). Layer 2 agent reasoning (self-recorded). Layer 3 reusable skill (automatic, generalized).

15:28 – 16:15

10 · Scheduled tasks

Natural-language scheduling: every Monday at 9AM, scan inbound leads. No cron syntax.

16:15 – 17:04

11 · Recap and CTA

Free tier 4,000 credits/day. $19/month basic plan. Affiliate link for bonus credits. Skool community CTA.
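
The Zapier step in chapter 06 comes down to handing the agent one URL that speaks MCP. A minimal sketch, assuming only the JSON-RPC 2.0 framing that MCP uses, of the two messages a client sends once its initialize handshake is done: one to list the server's tools and one to call a tool. The tool name `gmail_create_draft` and its arguments are hypothetical (Zapier's real tool names may differ), and nothing here touches the network.

```python
import json


def tools_list_request(request_id: int) -> str:
    """Build an MCP `tools/list` request as a JSON-RPC 2.0 message."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def tools_call_request(request_id: int, name: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request for one named tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Hypothetical tool name and arguments, for illustration only.
draft = tools_call_request(
    2,
    "gmail_create_draft",
    {"to": "prospect@example.com", "subject": "Following up", "body": "..."},
)
print(tools_list_request(1))
print(draft)
```

The agent never needs per-app credentials in this picture: it sends messages like these to the single server URL, and the server decides which connected application handles the call.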

§ · Storyboard

Visual structure at a glance.

Max Hermes landing page
Hermes GitHub repo
Talking head, problem setup
MiniMax workspace UI
M2.7 vs Opus 4.7 diagram
Zapier MCP credentials modal
Agent skill saved confirmation
Three memory layers diagram
Talking head, CTA
§ · Frameworks

Named ideas worth stealing.

13:47 model

Three Memory Layers (Hermes)

  1. Layer 1: Chat history (standard, user-managed)
  2. Layer 2: Agent reasoning over what worked (self-recorded)
  3. Layer 3: The skill, reusable playbook with non-generalizable parts stripped (automatic)

A three-tier memory architecture where users only manually manage the first layer. The other two accrue automatically as the agent works.

Steal for Explaining AI product differentiation in sales pages or comparison content
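
One way to picture the three layers is as a small data structure. This is an illustrative sketch only, not Hermes's actual implementation: the class names, the `generalizes` flag, and the `save_skill` method are all hypothetical, and in the real product the agent itself decides what generalizes rather than reading a flag.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    description: str
    generalizes: bool  # hypothetical flag: True for structure and logic, False for voice and tone


@dataclass
class AgentMemory:
    chat_history: list = field(default_factory=list)     # layer 1: what the user said
    reasoning_notes: list = field(default_factory=list)  # layer 2: Steps the agent recorded
    skills: dict = field(default_factory=dict)           # layer 3: skill name -> reusable playbook

    def save_skill(self, name: str) -> list:
        """Distill layer 2 into a layer 3 playbook, dropping steps that do not generalize."""
        playbook = [s.description for s in self.reasoning_notes if s.generalizes]
        self.skills[name] = playbook
        return playbook


memory = AgentMemory()
memory.chat_history.append("Draft re-engagement emails for cold prospects")
memory.reasoning_notes += [
    Step("Search sent mail from the last 30 days", True),
    Step("Queue each reply as a Gmail draft", True),
    Step("Open with a joke about last week's call", False),
]
print(memory.save_skill("gmail-cold-lead-reengagement"))
```

Only the first layer is written by the user here; the playbook is derived from what the agent recorded, which is the distinction the framework is drawing.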
02:28 concept

Manual vs Autonomous Memory

The central product thesis: every AI tool has memory, but Hermes is the only one where the agent decides what to save and how to generalize it without user instruction.

Steal for Positioning that draws a before/after line between current-gen and next-gen AI tools
04:45 model

M2.7 vs Opus 4.7 cost/capability tradeoff

  1. Input: $0.30 vs $5.00 per million tokens (17x)
  2. Output: $1.20 vs $25.00 per million tokens (21x)
  3. SWE-bench Pro: 56% vs 64% (8pt gap)

For everyday agent tasks, the benchmark gap is irrelevant and the cost gap is decisive.

Steal for Model selection justification for any agent-as-a-service pitch
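
The multipliers above can be checked with a few lines of arithmetic. A minimal sketch using the per-million-token prices quoted in the video; the monthly workload (200M input tokens, 20M output tokens) is a hypothetical figure chosen only to show how the bill scales.

```python
# Per-million-token prices quoted in the video (USD).
PRICES = {
    "M2.7": {"input": 0.30, "output": 1.20},
    "Opus 4.7": {"input": 5.00, "output": 25.00},
}


def price_ratio(kind: str) -> float:
    """How many times more Opus 4.7 costs than M2.7 for one token kind."""
    return PRICES["Opus 4.7"][kind] / PRICES["M2.7"][kind]


def monthly_cost(model: str, input_tokens: float, output_tokens: float) -> float:
    """Bill in USD for a given token volume at the quoted prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Hypothetical workload for an always-on agent: 200M input, 20M output tokens per month.
cheap = monthly_cost("M2.7", 200e6, 20e6)
pricey = monthly_cost("Opus 4.7", 200e6, 20e6)
print(f"input ratio:  {price_ratio('input'):.1f}x")
print(f"output ratio: {price_ratio('output'):.1f}x")
print(f"monthly bill: ${cheap:,.2f} vs ${pricey:,.2f} ({1 - cheap / pricey:.1%} lower)")
```

On that assumed mix the bill comes to $84 versus $1,500, about 94% lower, which is in line with the video's rounded "95% cheaper" headline.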
§ · Quotables

Lines you could clip.

02:28
"It is not memory versus no memory. It is manual memory versus autonomous memory."
Tight thesis, no setup needed, quotable as a standalone claim → TikTok hook
05:57
"It is literally the difference between an agent that you can afford to leave running 24/7 and one that you cannot."
Consequence-first framing, lands the economic argument in one sentence → IG reel cold open
13:30
"The agent is not blindly saving everything I did. It is saving the parts that scale. It is dropping the parts that should be fresh every time."
Explains the autonomous memory differentiator in plain language → newsletter pull-quote
05:47
"The only difference is your bill at the end of the month."
Short, punchy, closes the model comparison argument → TikTok hook
§ · Resources Mentioned

Things they pointed at.

§ · CTA Breakdown

How they asked for the click.

16:15 link
"Link to MiniMax agent will be in the description. It is completely free to sign up. You get 4,000 credits a day just for logging in. It is $19 a month for the basic plan. Sign up through my link and you will get bonus credits."

Soft affiliate CTA at the very end. Free tier with daily credits stated before mentioning the paid plan. Skool community CTA layered on top.

§ 04 · The Script

Word for word.

HOOK: opening / re-engagement · CTA: the pitch
00:00 HOOK Max Hermes is the best AI agent that I am using right now. I'm gonna be completely honest with you. I'm running Hermes agent for about 90% less than what most people are paying for it.
00:09 HOOK And the deployment, it literally takes ten seconds to set up. There's no server. There's no Docker.
00:14 HOOK There's no model wiring. And the reason that this is such a big deal is because Hermes agent is the fastest growing AI agent of 2026. They have over 150,000 GitHub stars in ten weeks.
00:24 It's faster than LangChain, faster than AutoGPT. So half the people that I know who try to set this up on X, they're never doing it right or they just give up. And what Max Hermes does is it just takes the exact same Hermes underneath.
00:36 It's the same learning loop. It's the same memory system inside of it. It's the same skill library, and it just wraps it into a cloud running on a model that costs 5%
00:45 of what Claude Opus costs per token. So it's the same agent. It's just way less friction, way less money.
00:50 And in this video, I'm going to show you how to deploy it in ten seconds, how to connect it to your Gmail, how to run a workflow live, and how to save what it just did as a reusable skill so the agent gets sharper every time that you use it. Now I do not care if you've never run an AI agent a day in your life. By the end of this video, you will have your own 24/7 AI employee running for about $19 a month.
01:11 Alright. Now real quick before I open up anything, let me just set the table of why this is actually such a big deal. Because if you're anything like me, you've probably been bouncing between AI agents just trying to find the right one.
01:22 So Claude Code, Codex, OpenClaw, Hermes agent, every couple weeks, there's a brand new one that's supposedly the best and people switch. Now I've used all of them, and most of you watching have probably tried at least two or three of them. But you notice the actual problem with that is that every time you switch, you're starting over from zero.
01:41 So whatever you taught Claude Code about your business, your codebase, your client process, your writing style, it's gone the second that you open Codex. And whatever Codex had learned about you, it's gone in OpenClaw. And the agent, it doesn't follow you.
01:54 Now before anyone in the comments starts yelling at me, yes, these tools have memory. I know. Claude Code has
02:01 CLAUDE.md and files and skills and hooks, and ChatGPT has its memory feature. They all remember things, and I'm not saying that they are stateless.
02:08 What I'm saying is it's memory you manage. So you write the MD, you decide what to remember, and you effectively are building the skills by hand.
02:17 You rebuild the context every time you switch tools, and that's just manual memory. In Hermes, it's different. Hermes writes its own playbook just from doing the work.
02:25 It watches the task succeed. It decides what's actually reusable, and then it saves it as a skill without you asking. And the agent, it compounds itself every time that you use it, and that's the actual gap.
02:35 It's not memory versus no memory. It's manual memory versus autonomous memory. And honestly, once you watch this happen on screen for the first time, you probably can't unsee it.
02:45 Alright. So let's open this thing up. So this is the MiniMax agent.
02:48 Their team, they frame it as a 24/7 all-in-one workspace. And honestly, the productivity layer, it is really mature from what I've experienced. So inside of this, we have the experts,
02:58 we have different skills, we have the office, and we have tools and image and video generation, even some web tools here. Now, most platforms, they usually just ship like one of these features, and MiniMax, it has all of them in one place, which is why I have been very fond of it.
03:11 Next up, you see we have a few different options here. So we have Max Hermes and we have MaxClaw as well. So this is just the other flagship inside of the MiniMax agent.
03:19 So MaxClaw is just the cloud version of OpenClaw, which is, of course, as you know, one of the most popular open source AI agents on GitHub right now. They have over 100,000 stars.
03:29 And the whole idea behind OpenClaw, if you live under a rock and don't already know what it is, is that your agent lives inside of your chat applications.
03:37 So different channels like Telegram, WhatsApp, Slack, Discord. Instead of you just having to open ChatGPT in a separate tab every single time, you can just text it like a contact, it texts back, and it has persistent memory across every conversation that you actually have with it. Now what most people do when they are first setting up OpenClaw or Hermes is they're going to be running this locally.
03:56 Now the underlying issue with running this locally is you're going to have to set up the local gateways, the channel connectors, the model wiring, and even the memory storage. So it's just a whole setup project
04:09 just to get it running. Now MaxClaw, MaxHermes,
04:12 they fix all of that. So it's quite literally a ten second setup. There's no Docker, there's no servers, and it's just one click to actually connect your messaging.
04:20 And the agent, it lives inside of your inbox. So I'm not gonna go deep on MaxClaw today because Max Hermes, this is going to be the one that we are focusing on in this video, and it genuinely is changing how I work. But if you've been wanting to run OpenClaw and bounced off of the self-hosting as well, MaxClaw is going to be your shortcut.
04:37 But anyways, back to the flagship here. So again, Hermes, this is going to be the one that learns from how you work and this is what has been running for me for about two weeks at this point. Max Hermes, it runs on a model called M2.7.
04:48 It's 30¢ per million input tokens. It's $1.20 per million output. Now if we compare that to Opus 4.7,
04:56 which is $5 per million input, $25 per million output.
05:00 So M2.7, it is roughly 17 times cheaper on input and 21 times cheaper on the output per token. Now is M2.7 the best model on the planet?
05:10 I'll be completely honest. No. It's not.
05:11 Opus 4.7, it still leads on the hardest coding benchmarks. So it's 64% on
05:17 SWE-bench Pro versus M2.7's 56. So Opus, it of course still wins the leaderboard. Opus 4.7, it's just the best model on the planet out there, but it's very expensive.
05:29 You do not need it for 90% of the tasks that you are actually doing. So for what an agent actually does day to day, like reading inboxes, drafting emails, classifying threads, maybe even building some applications, or just automating some processes, you would not notice the difference between Opus 4.7 and M2.7.
05:47 So the only difference is your bill at the end of the month. And for an agent like Hermes that's constantly running tools and rereading its own skills, swapping Opus for M2.7, it's not just any small savings.
05:59 HOOK It's literally the difference between an agent that you can afford to leave running 24/7 and one that you cannot. Let's go through the full workspace, the hero agent, and I'll get a model built for this exact use case, but let me show you how to deploy it first. By the way, for those of you again who do live under a rock and you're not familiar already with what Hermes is, it's just an open source agent built by a research lab called Nous Research.
06:21 HOOK So the tagline that they actually have, it's just an agent that grows with you. And this is the project people point to when they argue about whether AI agents can autonomously learn, not just store the notes that you wrote them. So the pitch, it's pretty simple.
06:34 HOOK You give it a task, it executes the task, and then it writes a little playbook just describing how it solved it. And the next time you give it a similar task, it pulls that playbook off the shelf and it runs even faster. So the agent, it's going to be getting sharper and sharper every time that you are using it.
06:48 HOOK Now this all sounds great, but the problem is just running it yourself. So you need Python set up, you need API keys for whichever models that you are plugging in. And if you want it running as a real service instead of just a terminal application,
07:00 HOOK well, you then have to spin up Docker or a VPS, and half the people I know who looked at running it, they just bounced before they got a single working task up and running. And they just don't have a practical and successful system and they usually just go back to whatever they were using before, which isn't as efficient.
07:16 HOOK This is the problem that Max Hermes solves for along with the pricing problem. Back inside the platform, I'll also have a link down below in the description to sign up. Also, you'll have the full guide inside of my free Skool community, so make sure to check that out if you're not already in there.
07:30 So we'll just activate the Sandbox instance. Gonna start now and we'll click on confirm and pay. We just have to activate our instance.
07:36 So we do have to have a base amount of tokens actually already in there. And just like that, in maybe two seconds, we already have Max Hermes set up. So again, with all of this, this is literally the same Hermes underneath.
07:46 This is the same skill creation loop. It's the same memory system, the same idea. Now that we have our fresh chat ready to go, we have our skills section on the left hand side.
07:54 This is just where any playbooks that the agent builds will live. So the current processes will be on the right hand side. If we just open this up, of course, we don't have anything up and running just yet because it's brand new.
08:07 But this is where you'll watch the agent work in real time once we do give it something to do. But let me just go ahead and make this actually useful. I have to give it access to my tools.
08:16 So to actually do that, we're going to be using something called MCP, Model Context Protocol. This is just the open standard for agent to tool connections.
08:24 So you can just point the agent at an MCP server, and the server handles all the actual tool calls. So for this video, to make it as simple as possible, I'm using Zapier's MCP server because it covers the broadest list of applications. So there's over 9,000 applications and different integrations.
08:38 So Gmail, Slack, Notion, Linear, HubSpot, I mean, Calendly, Stripe, I mean, you name it. If you are using it, they most likely have it.
08:46 So if you've got a Zapier login, you already have the connection layer for Max Hermes. So it's just a few different steps you have to actually go through. So to get started on that, I'm just gonna load up mcp.zapier.com
08:58 and we're just going to log in to our account. Now what we can do is we just click on new MCP server.
09:03 We could scroll down and just click on other. And there's gonna be a few tabs that we wanna focus on. So there's the applications and there is the connect tab as well.
09:11 So apps, this is of course where you just want to add all of your different applications. So you can literally search through all of the different apps they have. So I mean, you just think of a weird one, maybe it's, um,
09:21 Jira. They have Jira on there or, um, HubSpot, of course, they have all the popular ones.
09:27 And we can get a little bit more specific, so maybe GoHighLevel. I think they call it something different like LeadConnector. Yeah, just like that.
09:33 Here's GoHighLevel. And then of course, we have all the other important ones. So maybe it's QuickBooks,
09:38 check for that. We have QuickBooks and maybe another one we could throw in there.
09:43 Let's think of Fireflies. There we have Fireflies. So what I'm going to do is I'm going to keep it pretty bare bones, pretty fundamental and we're just going to connect the most important tools.
09:51 Now I wanna give it access to all these different permissions, all these different scopes. So I'm going to click select all tools and then connect and then just like that, we just have to log into our Gmail account. There's really no configuration we have to do there.
10:01 Now from here, let's say we have all of our tools connected, we can just go into the connect section and this is where we have to generate a brand new token. So we can just copy the full URL with the token embedded.
10:13 So let's grab this right now. Grab both of these. Actually, we'll go to option two and grab this full URL here.
10:21 So this URL, it's just going to be the bridge between Max Hermes and Gmail. So we're going to feed it directly to the agent in just a second. But what's actually worth understanding is that we're not just connecting Gmail.
10:31 We're connecting a server that can actually route to anything in this 9,000 application library. So if you want Slack later, you just add it on this Zapier side.
10:41 The URL, it's always going to stay the same. You just have to make the change on the actual application section.
10:48 I've saved my credentials. You would just have to make the changes here. You don't have to get new URLs or anything like that.
10:53 It's extremely simple to manage. Now back inside of MiniMax, we're just gonna go to the new task section. We first want to configure our MCP.
11:01 So how we do this is we just click on this little settings section. We could go to manage MCP, and we either just connect what applications they have here.
11:09 They only have like five. Well, a little bit more than five, but we click on custom beta. We'll just call this Zapier and we paste in our URL that we just copied from Zapier.
11:19 We'll click on confirm and then we'll have to do one more thing. Now back inside of our Hermes agent, I can now type out connect to the Zapier MCP server to access Gmail tools. Once you are connected, pull every client onboarding email that I've sent in the last thirty days, find the ones where the prospect went cold after the second touch, draft personalized re-engagement emails referencing what we last discussed, and queue them as drafts in my Gmail.
11:41 Now I have found at times that connecting to this MCP server, it does take a few iterations and back and forth with the agent itself, but, you know, after about five different chats, then you should be able to connect to it. It is just a little bit finicky. But anyways, here's what we got back.
11:55 We have a real draft. All of this, of course, it's personalized to the actual conversation that I had with that prospect. It's now sitting inside of my drafts folder inside of Gmail and it's ready for me to review and then send off.
12:08 So it's just one prompt. We have the MCP, we have the agent and the Gmail tools all working together now. Now this next part, this is what makes Max Hermes different from every other AI tool that I have used.
12:18 So the agent, it of course, as you just saw, just did this task. But by default, the next time that I ask it to do something similar, it would start over from zero, unless you have actively built memory for it by hand, like writing a CLAUDE.md or creating a custom skill. So watch what happens when I just tell it to remember this workflow.
12:36 I'm gonna type out save this as a skill called gmail-cold-lead-reengagement. Call it whatever you want. It doesn't really matter.
12:43 But you can see here's the response that we're going to get back, and that is literally it. So the agent just wrote itself a playbook. So if we just navigate over here into the skill section, this is now sitting inside its library.
12:55 So it's available the next time that I run a similar task. But the part that's actually wild is if you watch what the agent actually says that it kept and what it deliberately left out. This is the part that no other chatbot is actually doing on its own.
13:09 So the agent, if you actually go through this, it's reasoning about what is actually reusable in this workflow. So it's the structure, it's the search logic, it is the drafting steps, and the way that it queues things in Gmail as well, all of that is going into this skill. But the voice and the specific phrasing and the tone of the email itself, that is not going to go in because that would just make the next batch sound,
13:32 I would say, stale. So the agent is not blindly saving everything I did. It's saving the parts that scale.
13:38 It's dropping the parts that should be fresh every time, and that is autonomous memory. And it's the reason that a $19 per month tool starts to feel like the best money you have ever spent on AI. So here's how to actually be thinking about this high level.
13:50 So most AI tools, they have memory that you manage like CLAUDE.md, ChatGPT's memory list, Cursor's history, and you just decide what gets remembered. Right?
13:59 Max Hermes, it has three different layers, and that's just Hermes in general, and you only manage one of them. So layer one, this is what you said. This is the chat history.
14:08 So this is very standard. And layer two, it is the agent's reasoning over what actually worked. So which steps mattered, which API calls succeeded, what edge cases actually came up, and the agent, it records this on its own.
14:21 Layer three, this is the skill itself. So this is the reusable playbook that the agent decided was worth saving with the parts that do not generalize stripped out. So this is also automatic.
14:32 So after maybe a month of using this, you have gotten a library of skills written from how you actually work, and it's not just how MiniMax thinks you work. It's not how some prompt template thinks you work. It's how you actually and genuinely work, and that library, it is going to be yours.
14:46 Now a couple more things that I do want to touch on. So the skills we have briefly covered earlier in this video, but I mean, there's just hundreds and thousands of different skills that you can actually utilize. So you can go to a humanizer, writers.
14:59 I mean, these are all just different things from GitHub. We have the flow diagram expert, which is going to help you create flowcharts. We have NotebookLM.
15:06 This is going to be able to query NotebookLM notebooks directly from Claude Code. I mean, there's so many different options. We have a Claude Code harness.
15:15 We have an SEO audit where it's going to analyze your crawlability, indexation, speed, on-page optimization.
15:22 I mean, so many different options. You can be providing these all into your Hermes agent, so it can just do more and more and more for you. Now the last piece that I wanted to cover before we actually wrap up is you do not have to be at a keyboard for any of these tasks that you want to run automatically.
15:36 So what I mean by that is hypothetically, every day or every Monday at 9AM, you want to scan new inbound leads from the last seven days and then qualify each one against my ICP filter. And then from there, it can just queue the qualified ones as Gmail drafts with a personalized first line referencing their company.
15:53 So we could type out in plain English every Monday at 9AM, scan new inbound leads, and basically what I just covered earlier. So I'm not gonna run through all of that. And then we just have to ensure that it's going to start running.
16:05 CTA And then from there, it'll tell us that this is now going to run on Monday at 9AM every single week for us. But that is Max Hermes: ten second cloud deploy.
16:15 CTA It is a learning loop that compounds every workflow that you teach it. You can schedule jobs that run on their own. They have a skill library that gets sharper every week that you're using it, and you can run on a model that's going to be 95% cheaper than Opus 4.7.
16:28 CTA Anyways, that's everything that I wanted to cover. So link to MiniMax agent will be in the description. It's completely free to sign up.
16:33 CTA You get 4,000 credits a day just for logging in. You don't even have to input your credit card, and it's $19 a month for the basic plan if you want more headroom. But if you want to sign up through my link, you will get bonus credits, so use that one.
16:46 CTA And make sure to check out our free Skool community if you want other frameworks and more videos on stuff exactly like this, how to be using AI more efficiently for your specific scenario, and making sure that you can actually get AI up and running as easily as possible. So make sure to check that out. Link will be down below in the description as well.
17:01 CTA Thank you guys for watching. I'll see you in the next video.
§ 05 · For Joe

The agent that writes its own instructions

WHAT TO LEARN

Autonomous memory, where the agent saves its own playbooks rather than waiting for you to write them, changes the economics of running an AI worker, not just the convenience.

  • Every AI tool claims memory, but most require you to write the rules: the markdown file, the prompt, the saved context. Hermes inverts this: the agent writes its own playbook after each completed task.
  • The agent does not save everything indiscriminately. It reasons about what is reusable (structure, API call logic, search steps) and strips what should be generated fresh each time (voice, tone, specific phrasing).
  • Three memory layers operate at once: chat history you can read, reasoning the agent records about what succeeded, and a distilled skill it decides is worth keeping. You only manage the first layer.
  • Model cost is the hidden reason most people do not leave agents running continuously. With input tokens roughly 17x cheaper, the economics shift from "use when needed" to "leave on all the time".
  • A natural-language scheduler means recurring workflows such as weekly lead scans and daily inbox triage can be set up in a single prompt without cron syntax or a separate automation platform.
  • After a month of real use, the skill library reflects your actual workflows, not generic templates, and that specificity is what makes the compounding meaningful.
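
For readers who do know cron, the plain-English schedule from the video ("every Monday at 9AM") corresponds to the standard five-field expression `0 9 * * 1`. A minimal sketch of what a scheduler has to compute from that phrase, assuming local time and no time-zone handling; the function name is hypothetical and this is not how MiniMax implements scheduling.

```python
from datetime import datetime, timedelta

# Standard cron fields: minute hour day-of-month month day-of-week (1 = Monday).
CRON_EQUIVALENT = "0 9 * * 1"


def next_monday_9am(now: datetime) -> datetime:
    """Return the first Monday 09:00 strictly after `now` (naive local time)."""
    candidate = now.replace(hour=9, minute=0, second=0, microsecond=0)
    days_ahead = (0 - now.weekday()) % 7  # datetime.weekday(): Monday is 0
    candidate += timedelta(days=days_ahead)
    if candidate <= now:
        # It is already Monday 09:00 or later today, so roll to next week.
        candidate += timedelta(days=7)
    return candidate


# The video's posting date, a Tuesday, used here only as sample input.
print(next_monday_9am(datetime(2026, 6, 2, 10, 0)))
```

The point of the natural-language layer is that the user never sees either the cron string or this date arithmetic; the agent confirms the schedule back in plain English.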
§ 06 · Frame Gallery

Visual moments.