WEBVTT

00:00:00.080 --> 00:00:08.880
So in this video, I'm gonna walk through a deep dive on how the agent teams from Anthropic's new Opus 4.6 release work.

00:00:09.040 --> 00:00:29.290
So as you can see on my screen here, I have my agent team running. And instead of just looking at JSON files or markdown files, I actually built a system on top of it to give me full surveillance of what's happening in real time. What you can see here is that I'm spinning up a web page, and a designer is actively working on its task assignment,

00:00:29.450 --> 00:00:40.455
and all of the discourse between that agent and us, and between that agent and the team lead, is fully transparent and something you can audit along the way. And you can see that we go from designer

00:00:40.455 --> 00:00:41.495
to developer,

00:00:41.575 --> 00:01:15.165
and each one waits for its turn. Before, with sub agents, the agents worked in parallel and basically gave you the TL;DR of the result at the very end, with no interplay between the different agents. Here, we can actually click in and see the entire discourse and the description, and if there are conversations between the agents, kind of like sending each other emails or Telegram messages, we'll be able to audit those with this infrastructure as well. And what's cool is you can click on the history tab and go through any of the prior sessions along with the associated messages, so you can audit what happened and why.

00:01:15.565 --> 00:01:22.640
So with that little teaser out of the way, I'm gonna walk you through how the agent team infrastructure works so you can leave this video understanding

00:01:22.800 --> 00:01:25.680
where to use it, when to use it, and more importantly,

00:01:25.760 --> 00:01:30.560
how to actually surveil your agents. The reason surveillance matters is that this agent teams feature is amazing,

00:01:30.720 --> 00:01:38.785
but it swallows up tokens like a vacuum in a very short amount of time. And the last thing is that you wanna make sure you're not using this feature frivolously,

00:01:38.945 --> 00:01:52.620
not using it just because it looks cool, but because you actually have a complex task where you need a series of agents, ideally working in a sequence, on a recurring basis. To enable agent teams specifically,

00:01:52.620 --> 00:02:10.715
what you have to do is enable the feature flag. Now, if you are nontechnical, I have a cheat code for you. Instead of navigating to the Claude settings.json file and worrying about what the heck you're doing, you can literally go to the docs page on orchestrating these Claude Code sessions

00:02:10.795 --> 00:02:25.970
and click on copy page. Then my cheat code is to give this to Claude Code, or to something like Warp, which is a smart terminal, and have it set up everything I need, so I can just open Claude Code and it's there. As an example, I can go to Warp and say,

00:02:26.370 --> 00:02:28.530
go read and double check

00:02:28.850 --> 00:02:31.650
that we have this installed

00:02:31.650 --> 00:02:32.130
correctly.

00:02:32.995 --> 00:02:34.275
I'll literally just

00:02:34.595 --> 00:02:43.475
throw that all in there. This will go through and just double check that our settings are where they need to be so you don't have to worry about putting this feature flag,

00:02:43.715 --> 00:02:51.280
all the agent team stuff, and you can even ask something like Claude Code or Warp to help you with a proper prompt, because the invocation

00:02:51.280 --> 00:03:03.040
of these teams is typically in this form where you say, I'm designing x, and I wanna be able to create an agent team or spawn an agent team. If you say something semantically similar to that, then that should suffice.

00:03:03.475 --> 00:03:12.035
If you wanted to make it even easier, then you could just give the URL right here and just say, read this document. It will come back with an understanding of the architecture,

00:03:12.275 --> 00:03:13.555
the feature flag,

00:03:13.715 --> 00:03:17.155
everything that you'll need, and you can just say, listen. Go and implement this all in my system.
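
NOTE A minimal sketch of what "set the feature flag" boils down to. It assumes the flag is an environment variable placed under the "env" key of Claude Code's settings.json. The variable name CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS and the value "1" are my reading of the docs page mentioned above, so double check them there before relying on this. The helper function is hypothetical.
```python
import json
from pathlib import Path
# Assumed flag name; verify it against the agent teams docs before use.
FLAG = "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS"
def enable_agent_teams(settings_path: Path) -> dict:
    """Merge the agent-teams flag into the "env" block of a settings.json file."""
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text() or "{}")
    # Keep every existing setting and env var; only add (or overwrite) this one flag.
    settings.setdefault("env", {})[FLAG] = "1"
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```
You would call it with Path.home() / ".claude" / "settings.json". Then restart Claude Code so the new session picks up the flag.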

00:03:17.720 --> 00:03:29.000
Once you're good to go, you can just restart Claude Code, and it should work in a brand-new session. Then, ideally, you should test it by asking, "Do you have access to agent teams?"

00:03:29.240 --> 00:03:31.960
Once it says yes, then it'll explain

00:03:32.040 --> 00:03:57.330
what tools it has. If it doesn't, and it assumes that teams means sub agents, it's not properly installed. As a quick preview, if I go to my prior session and just ask, "Do you know how to spin up an agent team?", it replies, "Yes, I know how to create a team," and it lectures you on all the tools it has. The tools include these agent types, the task updates,

00:03:57.650 --> 00:03:58.530
the team delete.

00:03:59.105 --> 00:04:23.480
That one is for when it's ready to clean up. The way teams work is that once they're spun up, they persist until the goal is achieved or the task is done, unlike sub agents, which spin up and die, spin up and die. At the bottom here, I have a reference to the agent surveillance skill that I showed you at the beginning of the video, and I'll walk you through the TL;DR of how you can approach building such a system yourself.

00:04:23.800 --> 00:04:32.745
And for step four, the goal is to assign the best model to the right agent at the right time. In this case, the team lead usually deserves Opus 4.6

00:04:32.745 --> 00:04:44.345
because it will be managing the other teammates. The other team members report up to the team lead for approval and direction, so that's why you wanna equip it with the most capable model possible.

00:04:44.960 --> 00:04:49.600
For the additional team members, that's where it might make sense to use something like Sonnet 4.5

00:04:49.600 --> 00:05:09.185
or even Haiku, depending on the task. Then, similar to invoking a spell, you have to say the magic words like we saw before, something like "create an agent team to review PR 142." This could be you reviewing an existing code base with a team of reviewers, who can look at security,

00:05:09.425 --> 00:05:36.055
performance, and test coverage in a way that lets them discuss their findings with each other. In this case, you could spawn three different reviewers: one for security, one for code quality and performance, and one for validating test coverage. So with all this out of the way, how do the teams work in depth?
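
NOTE To make the "magic words" and the model-per-role advice concrete, here's a small sketch that composes such a spawn request as plain text. The Role class and build_team_prompt function are hypothetical helpers, not part of Claude Code. The lead's model is simply whatever model the session runs on (Opus 4.6 in the video). Asking for "sonnet" or "haiku" teammates in the prompt assumes the lead will honor that request.
```python
from dataclasses import dataclass
@dataclass
class Role:
    name: str
    focus: str
    model: str  # a model alias such as "sonnet" or "haiku" (assumed wording)
def build_team_prompt(goal: str, roles: list[Role]) -> str:
    """Compose a natural-language request asking the lead to spawn an agent team."""
    lines = [f"Create an agent team to {goal}."]
    for role in roles:
        lines.append(f"- Spawn a {role.name} teammate on {role.model} to focus on {role.focus}.")
    # Nudge the lead toward delegating instead of coding itself.
    lines.append("Have teammates message each other directly; as lead, coordinate rather than code.")
    return "\n".join(lines)
reviewers = [
    Role("security reviewer", "security issues", "sonnet"),
    Role("performance reviewer", "code quality and performance", "sonnet"),
    Role("test reviewer", "validating test coverage", "haiku"),
]
print(build_team_prompt("review PR 142", reviewers))
```
The output is just text you paste into a Claude Code session. Anything semantically similar should work, as long as it asks for an agent team and names the roles.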

00:05:36.215 --> 00:05:43.390
So step one is you have the team lead looking around and understanding when it makes sense to spawn a brand new teammate.

00:05:43.710 --> 00:05:54.735
Even if work is already in flight, meaning you've already hired three agents on top of your team lead, it could still decide that for this particular task or permutation,

00:05:54.735 --> 00:06:06.410
it needs to employ a new agent. The main thing to keep in mind is that as you spin up more and more agents, you'll naturally burn more and more tokens. So if you're using something like Opus 4.6

00:06:06.410 --> 00:06:09.450
for all of them, you will see your usage evaporate

00:06:09.450 --> 00:06:35.440
really quickly. The mental model for spinning up different agents is having them work on completely different areas of the build where it makes sense for them to communicate. With sub agents, if you created a front-end sub agent and a back-end sub agent, they would both work on their separate tasks with no communication between them. You could run the back end after the front end, but ideally,

00:06:35.600 --> 00:06:49.865
there's a contract between the front end and the back end. To create that contract so there's cohesion from day one, each agent should work in its own context window but be able to message the others and ask them for direction.

00:06:50.025 --> 00:06:58.505
Because if the front end is gonna adopt a new framework or library, you wanna make sure it's compatible with your back end. And this is where this feature becomes ingenious,

00:06:58.790 --> 00:07:02.230
where you have that cross communication and cross pollination

00:07:02.230 --> 00:07:10.150
of these agents, with full transparency into what's happening. In a way, this shared task list replaces

00:07:10.150 --> 00:07:25.505
a lot of the tools that have been trending over the past few months, like Vibe Kanban, an open-source framework where people can add different tasks and it auto-arranges them. I found it to be buggy, sometimes slow, and sometimes it doesn't activate at all, whereas

00:07:25.665 --> 00:08:34.630
this is built in. If we take a look at the agent dashboard I showed you, I didn't actually engineer anything new here. It created all these categories based on the categories in the JSON files themselves, so I'm purely visualizing what's happening and streaming the events in real time. The TL;DR of the messaging system is that you have a team lead who's always in the know. When the front-end agent (agent A) and the back-end agent (agent B) finish, they communicate with each other. Agent A will say, "Okay, I'm done with the front end." Agent B will say, "Okay, I'm updating my endpoints." And then the front-end agent, because it's closest to the fire, will tell the team lead that it's all done. So the full life cycle is this: you say, "Build this app and spawn a team to help me accomplish it." The lead then spawns teammates, typically three to five depending on the task. These teammates all have defined roles, and they coordinate to create a plan. Once the plan is approved, or you've put it into YOLO mode or bypass permissions mode, they execute it and you get to the final result. And if we take a peek at running that web page I showed you earlier,

00:08:35.685 --> 00:08:46.325
once it finishes, you can see all the tokens it's used. I think it's around 80,000 tokens for this web page right here, a personalized web page, and I didn't give it any details.

00:08:46.485 --> 00:08:54.150
It looks pretty clean in terms of layout and format, and it works from what I could see. It essentially one-shotted it, so pretty impressive.
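
NOTE The agent A and agent B handoff described above can be modeled as a toy in-memory mailbox. Each agent has its own inbox; sending appends to the recipient's inbox, and reading drains it. This only illustrates the message flow. It is not how Claude Code implements messaging, and the agent names and message texts are hypothetical.
```python
from collections import defaultdict
# One inbox (a list of messages) per agent name, created on first use.
inboxes = defaultdict(list)
def send(sender: str, recipient: str, text: str) -> None:
    """Deliver a message by appending it to the recipient's inbox."""
    inboxes[recipient].append({"from": sender, "text": text})
def read_all(agent: str) -> list[str]:
    """Drain an agent's inbox, returning messages as 'sender: text' strings."""
    messages = [f"{m['from']}: {m['text']}" for m in inboxes[agent]]
    inboxes[agent].clear()
    return messages
# Front end (agent A) and back end (agent B) talk directly, then A reports to the lead.
send("frontend", "backend", "Done with the front end; it expects a /login endpoint.")
send("backend", "frontend", "Updating my endpoints to match.")
send("frontend", "team-lead", "Front end and back end are wired up. All done.")
print(read_all("team-lead"))
```
The point of the design is that A and B agree on the contract between themselves, and the lead only hears the final status.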

00:08:54.390 --> 00:09:11.645
In terms of watching the agents do the work, you obviously have options. Some people like to use a tool called tmux. It essentially gives you a terminal with a main window, and you can watch split panes open up where you audit what's happening with each agent.

00:09:11.885 --> 00:09:31.010
The reason I made this UI is, one, to make it universal, and two, as a dev, when I was walking through it, I noticed that things move so quickly that I'd have to constantly scroll through the history to properly monitor what's happening. So it's completely up to you. If you wanna install tmux yourself, you can use something like Warp.

00:09:31.555 --> 00:09:39.155
The way I'd install it is by just asking Warp to install it and update it to the latest version. I asked it, "What is tmux?"

00:09:39.155 --> 00:09:52.790
And it walks through exactly what that is. Then at the very bottom, once you wanna install it, you can say, "Can you install it? Can I use it in something like Cursor?" It explains that Cursor is probably not the best place for it because of the pane management.

00:09:53.030 --> 00:09:58.310
So you can just spin it up in your own terminal, and you'll see whether it works if we do something like this.

00:09:58.955 --> 00:10:00.635
So if we go to

00:10:00.955 --> 00:10:01.995
terminal

00:10:02.395 --> 00:10:03.275
right here

00:10:03.595 --> 00:10:09.595
and I spin this up and let's zoom in just a tad and then we say tmux,

00:10:11.030 --> 00:10:14.870
you'll see right here at the very bottom, it has this green little

00:10:15.190 --> 00:10:15.990
footer.

00:10:16.070 --> 00:10:19.350
Then if I run this, which is my shortcut for YOLO mode,

00:10:19.670 --> 00:10:23.990
you'll be able to spin up the agent teams, and it'll open multiple panes.

00:10:24.465 --> 00:10:48.870
But for me, it was a pain to look at, a bit of an eyesore. So that's why I gave birth to this bad boy. It's structured as a skill, so it's invoked just in time. It basically memorizes the structure of the dashboard and where it has to reference all the different files. Now, what's the difference between agent teams and sub agents? Because that's a very important thing. In the world of sub agents,

00:10:49.110 --> 00:10:57.005
they report back to you, but they never talk to each other. So even though they can run in parallel, because of that lack of communication,

00:10:57.085 --> 00:10:59.005
you'll see a lot of misalignment

00:10:59.005 --> 00:10:59.645
or

00:10:59.885 --> 00:11:09.290
mismatched goals, with the sub agents not knowing what their compatriots are up to. One interesting thing is that with Opus 4.6,

00:11:09.370 --> 00:11:17.850
it's much better now at spinning up sub agents on the fly, without you even asking, to do things like explore your code base and actively,

00:11:18.170 --> 00:11:18.890
proactively

00:11:19.395 --> 00:11:30.115
preserve your context window as much as possible. So there's still a big place for sub agents in the mix here; it's just a matter of when to use them. From what I can see for now, when it comes to prepping,

00:11:30.115 --> 00:11:31.075
exploring,

00:11:31.315 --> 00:11:31.795
researching,

00:11:32.290 --> 00:11:41.730
doing tasks that are very admin in nature, it seems like sub agents are the way to go. Even though they also take a decent number of tokens, when it comes to code exploration,

00:11:41.890 --> 00:11:52.435
it will preserve your core context window, which really matters if you wanna be able to execute a team, execute sub agents, and still get everything done without having to compact your conversation.

00:11:52.755 --> 00:12:13.950
So if you needed one more diagram to drive this point home: with sub agents, you are the monkey in the middle, getting all the discourse and all the updates, and you're the one who has to manage what happens next. In the world of agent teams, the team lead is the monkey, maybe not in the middle but at the top, and it receives all the inputs, questions,

00:12:14.485 --> 00:12:21.685
and requests for guidance from all the other teammates. Now, when does it make sense to spend the tokens on agent teams?

00:12:21.925 --> 00:12:34.010
These are four use cases that I've tried personally. I'm gonna keep experimenting and sharing what I find, but for now, the first is parallel code review. If you've already vibe coded something and you're at the 85%

00:12:34.010 --> 00:12:41.050
mark and you're stuck because something isn't working. I personally have a project where I'm creating a version of OpusClip.

00:12:41.050 --> 00:12:52.895
It's a clone of it. I've been working on it for two months. I've been stuck on a core set of features that no matter how much I try or intervene using my own dev background, I can't get it over the line consistently.

00:12:52.975 --> 00:13:28.450
So I had it review the code base, and it found three or four areas with duplicate functions, or functions stepping on each other's toes, that really muddied my code base. Having not just one extra set of eyes on it, but three or four, helps you orient yourself and better understand where to go next. The next use case, like I showed you, is cross-layer features. If you wanna build a web application on something like Next.js, with a front end, a back end, a database, and a series of features, this is where it makes a lot of sense to try to one-shot the first 80%

00:13:28.450 --> 00:13:33.650
with the team and maybe drive it home yourself. And the last two are debugging with competing hypotheses and research.

00:13:33.650 --> 00:13:53.830
If you watched my prior video, made before this feature even existed, I tried to create my own agent team by having multiple sub agents share a markdown file that they used as a diary. Now that we live in this world, we don't technically need that strategy anymore, unless you wanna preserve tokens at all costs. You can also use this feature for nontechnical

00:13:53.830 --> 00:13:57.110
tasks, like creating an entire brainstorming network.

00:13:57.190 --> 00:14:10.895
You could create an RFP generation network or a proposal network. You can do all kinds of nontechnical things using this feature. And when it comes to research, a lot of people default to things like Google Deep Research or Perplexity.

00:14:11.055 --> 00:14:14.335
Instead, you could implement a research committee,

00:14:14.495 --> 00:14:17.775
where different agents research different parts

00:14:17.950 --> 00:14:35.315
but communicate their findings in real time. You can imagine things like scientific discovery becoming increasingly possible. Sub agents, on the other hand, are really useful for quick research you can just run in parallel, or quick code exploration to see where certain files are,

00:14:35.635 --> 00:15:13.105
and file operations where you want lower token costs. Maybe you spin up four or five sub agents that use Haiku to do a very basic task. But you know that at the end of the day, those tasks are just being executed and checked off a to-do list; adding more agents doesn't add any knowledge to the network. So here's a tactical, applied scenario: let's say you're debugging a code base. You could have agent one in an agent team decide that there's a memory leak of some sort. Then agent two could say, "No, it's a race condition." If you have no idea what those words mean, don't worry about it; the point is that each agent explores a different hypothesis.

00:15:13.760 --> 00:15:46.020
And agent three could act as the devil's advocate and say, "You know what? You're wrong, and you're wrong. I actually think it's something completely different." Then they work together and reach a consensus on the final result. It's literally like having a committee vote on what the problem is to get to the bottom of it. This debate could also happen with sub agents, but like we said before, they can't argue with each other directly. When it comes to spinning up an agent team to build a feature like authentication,

00:15:46.340 --> 00:15:54.420
you could end up with a UI agent, an API agent, and a database agent. And the way it would work is the first UI agent would design

00:15:54.580 --> 00:16:37.895
the scaffolding for the page, and it would realize that it needs an API to handle the login. So this goes to the API agent, which creates the API and sends some form of response to double check that it works. Then it realizes it needs a users table with an email and a password hash, so it needs some support to bring this API to life and make logging in actually do something. So then the database agent gets to work. It creates this new users table right here and says it's ready. Then it goes back to the API agent, which can say, "Listen, we're good to go. It's time to actually test this out." If you want a quick heuristic on when to use sub agents versus agent teams, if you have a brand new task,

00:16:38.135 --> 00:16:52.540
the easiest question to ask yourself is: do you need agents to speak to each other? If the answer is no, you can use sub agents. Otherwise, ask the next question: is the task complex enough to justify the overhead in tokens, and therefore in cost?

00:16:53.065 --> 00:16:55.145
If the answer is yes,

00:16:55.385 --> 00:17:14.120
then use agent teams. But there is a world here where you say no to all of them and just have a normal session with no agents at all. The mentality I want you to adopt is that of a bootstrapped startup founder with a very finite amount of money. Depending on what plan you're on, that could be a very good analogy.
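
NOTE The two-question heuristic above fits in a few lines of code. This is my reading of the flow: if agents don't need to talk, use sub agents; if they do but the task isn't worth the tokens, use a plain session. It's written as a hypothetical helper, not anything built into Claude Code.
```python
from enum import Enum
class Setup(Enum):
    SUBAGENTS = "sub agents"
    AGENT_TEAM = "agent team"
    SINGLE_SESSION = "normal session, no extra agents"
def choose_setup(agents_must_talk: bool, worth_the_tokens: bool) -> Setup:
    """Question 1: must agents talk to each other? Question 2: is it worth the token cost?"""
    if not agents_must_talk:
        return Setup.SUBAGENTS
    # Cross-talk is needed, so only a complex enough task justifies a full team.
    return Setup.AGENT_TEAM if worth_the_tokens else Setup.SINGLE_SESSION
print(choose_setup(agents_must_talk=True, worth_the_tokens=False).value)
```
This prints "normal session, no extra agents": the agents would need to talk, but the task isn't worth a team's token bill.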

00:17:14.360 --> 00:17:21.965
But if you are bootstrapped, you have to be very picky about when it makes sense to actually hire your first employee or employees

00:17:21.965 --> 00:17:31.485
unless you raise some funding. But unless you have raised funding for your Anthropic subscription, I doubt that's the case. So you just wanna be as selective and picky and responsible as possible.

00:17:32.170 --> 00:17:41.130
Starting with the pros: you can require plan approval for any risky changes, so you can still tell Claude what you want in terms of a framework.

00:17:41.370 --> 00:17:45.290
Ideally, aim for five or six agents at most.

00:17:45.795 --> 00:17:48.755
Beyond that, from what I've found, you hit diminishing returns.

00:17:48.835 --> 00:18:06.460
The same goes for sub agents. And you can start all of them on a research task before they all move on to the actual execution task. As for the gotchas, like I said, the token cost is very real, and it adds up very quickly. You could spend anywhere between 100,000 and 300,000 tokens

00:18:06.540 --> 00:18:09.100
just spinning up and executing an agent team.
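
NOTE Here's a back-of-the-envelope sketch of why costs climb. Every teammate runs in its own context window, so total usage grows roughly linearly with team size. The per-agent token figures are hypothetical placeholders, chosen only to land inside the 100,000 to 300,000 range quoted above. The cap of six reflects the diminishing-returns rule of thumb from the video, not a hard limit.
```python
MAX_USEFUL_AGENTS = 6  # rule of thumb from the video, not an enforced limit
def estimate_team_tokens(teammates: int, tokens_per_teammate: int, lead_tokens: int) -> int:
    """Rough total: the lead's own usage plus one separate context per teammate."""
    if not 1 <= teammates <= MAX_USEFUL_AGENTS:
        raise ValueError(f"expected 1 to {MAX_USEFUL_AGENTS} teammates, got {teammates}")
    return lead_tokens + teammates * tokens_per_teammate
# Hypothetical inputs, not measurements.
for n in (3, 5):
    print(n, estimate_team_tokens(n, tokens_per_teammate=40_000, lead_tokens=30_000))
```
With these made-up inputs it prints "3 150000" and "5 230000". Plug in your own numbers from the token counts your sessions report.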

00:18:09.580 --> 00:18:25.475
The next thing is that even if you have separate roles, I've found that agents can overwrite the same file. Sometimes it makes sense for them to overwrite it, but once in a while, you'll notice that one does a great job and the other comes in and changes just a few things that either make it unusable

00:18:25.475 --> 00:18:27.795
or add unnecessary bloat.

00:18:28.360 --> 00:18:33.640
Another gotcha is that unless you create a way to respawn the exact same agents with the same instructions,

00:18:33.960 --> 00:18:56.845
once the team shuts down, you won't be able to bring it back. And last but not least, the team lead could end up coding instead of delegating. Ideally, you want your team lead looking at the big picture, watching what everyone's doing, and making sure the team accomplishes the goal. But once in a while, you'll see it coding itself, stepping on the toes of its teammates.

00:18:57.230 --> 00:19:27.360
Now, hopping back into the terminal, one thing you can do to really visualize this entire system, including where the messages live and where the inboxes exist, is ask it for ASCII art showing how this feature works. When I say "this feature," I'd already referenced the agent teams feature above. It walks you through it: you are the human, and you have a team lead. These are the functions that come with a team lead: team create, task create, task tool, and send message. Then it walks you through how they share a file system.

00:19:27.920 --> 00:19:31.440
And then teammate coder will read the inbox, claim tasks.

00:19:31.600 --> 00:19:35.120
Teammate research will read the inbox as well, and so will the tester.

00:19:35.605 --> 00:19:48.885
And then, in terms of file storage, it breaks down exactly where all of this lives. That matters because it's easy to get confused when creators show you different folders and JSON files in a black terminal, which can be intimidating if you're nontechnical.

00:19:49.320 --> 00:19:57.480
You can just ask it to visualize it. And you can see right here, we have a config JSON that tells you who's on the team. Then you have all the agent mailboxes,

00:19:57.480 --> 00:20:02.040
which is one of the sources we used to create that visual to monitor everything.

00:20:02.280 --> 00:20:18.410
All I said was, "Hey, can you help me create a skill so that when I say the words 'surveil my agents,' it spins up the agent team, looks at the inboxes and the JSON files, and streams whatever's there onto a localhost UI?" And ideally,

00:20:18.490 --> 00:20:23.610
make it so that it spins up on the same localhost port every single time, so you don't have to go searching for a port.

00:20:24.090 --> 00:20:25.610
And then when you get to tasks,

00:20:25.770 --> 00:20:35.145
all of these are also structured. The shared task board, where tasks are pending or completed, basically that Kanban essence, is already native to this structure.
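
NOTE Since the tasks are plain structured files, a dashboard's Kanban board is mostly a group-by on status. This is a minimal sketch that assumes one JSON object per task file, with "id", "subject", and "status" fields. It also assumes the statuses are named pending, in_progress, and completed. The field names, status values, and task directory are all illustrative assumptions, so check them against the files on your machine.
```python
import json
from pathlib import Path
COLUMNS = ("pending", "in_progress", "completed")  # assumed status values
def load_tasks(task_dir: Path) -> list[dict]:
    """Read every *.json file in a task directory (directory layout is an assumption)."""
    return [json.loads(p.read_text()) for p in sorted(task_dir.glob("*.json"))]
def kanban(tasks: list[dict]) -> dict[str, list[str]]:
    """Group task titles into board columns; unrecognized statuses land in 'other'."""
    board: dict[str, list[str]] = {column: [] for column in COLUMNS}
    for task in tasks:
        status = task.get("status", "pending")
        column = status if status in COLUMNS else "other"
        board.setdefault(column, []).append(task.get("subject", task.get("id", "?")))
    return board
```
Keeping the board as a pure function over loaded tasks means the same code can render both the live view and any saved history.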

00:20:35.465 --> 00:20:37.945
So with all of that in mind, you can see right here,

00:20:38.265 --> 00:20:40.905
this is my skill, my dashboard database

00:20:40.985 --> 00:20:42.505
using what's called SQLite,

00:20:42.505 --> 00:21:01.610
a free database that can live on your computer, with no need for Supabase or anything else. In terms of where the information flows, this shows you exactly how the team lead sends a message directly to the researcher's inbox, at .claude/teams/<your-team-name>/inboxes/

00:21:01.610 --> 00:21:06.765
researcher.json. Then the agent waits for the next message to arrive.
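
NOTE The inbox flow just described can be sketched as a polling loop over the researcher's JSON file. This sketch assumes the inbox is a JSON array of message objects with "from", "text", and a boolean "read" field. That schema is an assumption, as is treating the path as relative to your home directory, and the helper names are hypothetical.
```python
import json
import time
from pathlib import Path
def inbox_path(home: Path, team: str, agent: str) -> Path:
    """Build <home>/.claude/teams/<team>/inboxes/<agent>.json, the layout read out in the video."""
    return home / ".claude" / "teams" / team / "inboxes" / f"{agent}.json"
def take_unread(path: Path) -> list[dict]:
    """Return unread messages and set their (assumed) 'read' flag, like a read receipt."""
    if not path.exists():
        return []
    messages = json.loads(path.read_text())
    unread = [m for m in messages if not m.get("read")]
    for m in unread:
        m["read"] = True
    if unread:
        path.write_text(json.dumps(messages, indent=2))
    return unread
def wait_for_message(path: Path, timeout: float = 30.0, interval: float = 0.5) -> list[dict]:
    """Poll the inbox until unread messages arrive or the timeout expires (then return [])."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        unread = take_unread(path)
        if unread:
            return unread
        time.sleep(interval)
    return []
```
For example, wait_for_message(inbox_path(Path.home(), "my-team", "researcher")) blocks for up to 30 seconds, where "my-team" is a hypothetical name. A real reader would also need to cope with the file being rewritten mid-read, which this sketch ignores.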

00:21:06.925 --> 00:21:34.105
Once it reads the message, it marks it as read. If you think of WhatsApp or Telegram, when you see those two check marks turn blue, the sender knows you've read it. Here, you have the exact same system. Then you have the task life cycle, where you create a task, update said task, and delete the task when it's completed. And the protocol messages look like this: "Hey, can you review my PR?", just like an actual human would write. Then you have the overall protocol and the protocol types,

00:21:34.425 --> 00:21:40.185
and this breaks down my surveillance dashboard. It always spins up on localhost:3847.
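
NOTE Here is a minimal sketch of the dashboard's ingestion side, using Python's built-in sqlite3 module. It copies every inbox message into a local SQLite table, so that a UI served on the fixed port can show live and historical threads. The table layout is hypothetical, and the inbox schema is the same assumed one, with "from" and "text" fields. Serving the UI on port 3847 is left out.
```python
import json
import sqlite3
from pathlib import Path
DASHBOARD_PORT = 3847  # the UI would always bind here, so the URL never changes
def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the dashboard database; pass a file path to persist history."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               team TEXT NOT NULL, recipient TEXT NOT NULL,
               sender TEXT NOT NULL, body TEXT NOT NULL,
               UNIQUE (team, recipient, sender, body))"""
    )
    return conn
def ingest_inboxes(conn: sqlite3.Connection, team_dir: Path) -> int:
    """Copy messages from team_dir/inboxes/*.json into SQLite; returns rows newly added."""
    added = 0
    for inbox in sorted((team_dir / "inboxes").glob("*.json")):
        for message in json.loads(inbox.read_text()):
            cursor = conn.execute(
                "INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?)",
                (team_dir.name, inbox.stem, message.get("from", ""), message.get("text", "")),
            )
            added += cursor.rowcount  # 0 when the UNIQUE constraint skips a duplicate
    conn.commit()
    return added
```
Because duplicates are ignored, you can rerun ingest_inboxes on a timer to stream in new messages. The trade-off is that two identical messages between the same pair of agents are stored only once.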

00:21:40.425 --> 00:22:09.575
You can see the live and history tabs, the Kanban board, and the inbox threads. If you watch it in action, it's literally as simple as me saying, "Spin up an agent team to build a web page and surveil them using your special skill." Again, I told it in the skill itself to surveil the team whenever I say those words. So it knows to load that skill right here. It then organizes itself, comes up with the three agents and the sub builders, and walks through how the pipeline will be designed.

00:22:09.815 --> 00:22:14.300
It executes the tasks. And while this is all happening, it's streaming

00:22:14.620 --> 00:23:16.898
on the main web page that I showed you. Once this is all done, we're good to go. And the best part is that it auto shuts down the session, and when it does, you can see everything in the history tab. You can see right here, this is the ended session of the web build, and we can always go back and look at the messages, tasks, etcetera. And that's pretty much it. Hopefully, you now have a better understanding of when to use agent teams, how they work, and where they make the most sense versus using sub agents or nothing at all. If you want access to all the diagrams I showed you, along with a prompt you can use to start creating your own skill for your own agent dashboard, I'll make them both available in the second link in the description below. But if you want access to my skill, which I spent a couple of hours putting together to make sure it works at least nine out of 10 times, you're gonna wanna check out my exclusive early AI adopters community. And last but not least, if you found this video helpful and educational, it would be super helpful if you left a comment and a like on the video. It really helps the video, really helps the channel, and I'll see you all next