WEBVTT

00:00:00.000 --> 00:00:35.290
My agentic engineering workflow has changed. It's better. The models have got better. Some of the tools have switched up. The main thing you need to understand, though, is that the experience I have from building applications is what steers me when I do agentic development, when I use agents to build my applications. Now usually in my videos I show you the tools that I use and give a high-level look at my workflow. There's an app I've been working on called Pluto, and I'm gonna build out one of the features that I had planned with you. This video might be long. This video might be short. This video might get into the nitty gritty. I don't really have a plan for this other than record

00:00:35.290 --> 00:00:48.545
me building a feature. And if that excites you, I hope you're ready. Sit back, relax. Let's get straight to it. So high level of my workflow. I'm using GPT 5.5 extra high fast, and I'm using it in Cursor.

00:00:48.625 --> 00:01:19.405
Now I know a lot of people love using the Codex app and the Codex CLI. You can do that as well. I genuinely just prefer Cursor. Now Cursor's on the more expensive side, but in my opinion, it is worth the cost, especially for the app that I'm building. Second thing that I'm introducing is I'm using Greptile for the code review. Now there are other great code review tools as well, but the reason why I stick with Greptile, and I really like Greptile, is the /greploop skill that they have. I'll explain what that is in a second. And third, I use WhisperFlow. I've noticed that, man, when you speak,

00:01:19.645 --> 00:01:30.070
when you use speech to text, you will say a lot more than you type. Right? It's gonna take me a second to type things out. But if I'm just yapping... I'm already a yapper. I'm a YouTuber. I yap for a living. WhisperFlow

00:01:30.070 --> 00:01:50.510
just makes so much sense, and I'll be honest with you. I haven't been on the paid plan. I've used it for the last couple months. I haven't paid for a single thing. I don't even know what paid users get. That's how much I've been using that, and that's how generous the free tier is. So we're gonna use Cursor, GPT 5.5 extra high fast. We're gonna use Greptile. I'm gonna talk about greploop, and we're gonna use WhisperFlow for all our prompting.

00:01:50.670 --> 00:01:58.590
Let's build out this feature. So what I wanna build for Pluto is an artifacts feature like Claude. Right? And if you're not familiar with artifacts,

00:01:58.590 --> 00:02:18.395
I have an example here. I prompted the agent here: show me a financial projection of someone who invests $500 a month from age 18 and how much money they'll have by 40. Now Claude built an inline component, which is actually pretty cool, and maybe this is another feature we look at. But right now, what I want is basically what's on the right and, like, an HTML page or React page,

00:02:18.820 --> 00:02:20.500
whatever it is being generated,

00:02:20.660 --> 00:02:46.215
and I can visually see this. This is what artifacts was. This is what I think Anthropic, like, innovated on and it was pretty cool, and I'd like that for Pluto. Now Pluto is pretty awesome. There's a lot of cool things that come with Pluto out of the box. Right? I obviously have a chat interface. I can connect to my iMessage, Telegram, Slack, but there's a couple cool form factors. Right? I have a Kanban board called tasks. I can also set up routines, which are repeated tasks. Every agent gets its own email.

00:02:46.375 --> 00:02:56.560
Right? A thousand-plus connections, right, using Composio. And then we have a dedicated files workbench, and this is basically where you can upload, like, you know, invoices,

00:02:56.560 --> 00:02:57.360
contracts,

00:02:57.360 --> 00:02:58.160
spreadsheets,

00:02:58.160 --> 00:03:03.365
whatever. And, like, there's an OCR workflow and, like, there's a very specialized

00:03:03.365 --> 00:04:06.305
files workflow where the agent has precise knowledge and data, especially for very, very large files. This is pretty awesome. And then cards: I'm actually working on being able to give the agent its own credit card, a virtual card, so it can make payments. It has its own phone line, and then finance is something cool where you can connect your business's finances and it can read all the information. Right? So this is basically an agent for businesses. And when we go to chat, every agent has its own dedicated computer. Right? Right now, we have Linux machines. Soon, we might be able to give access to Mac machines or Windows machines, but right now, we have Linux. So this is all pretty cool. This is Pluto in a nutshell. If you want a more dedicated video on Pluto and how it works, let me know in the comments down below. Let's build out this feature. Now the first thing I'm gonna do is I'm obviously gonna open up Cursor. Let me zoom in a little bit, and I'm gonna start yapping. I want to build a Claude Artifacts-like feature. If you're not familiar with Claude Artifacts, basically, I can prompt the agent to do something. And if there's a visual component, whether it's writing HTML

00:04:06.305 --> 00:04:07.905
or a markdown file

00:04:08.065 --> 00:04:39.615
or whatever the case, maybe a React file, it will preview it to the side. And because you're a smart agent, you have access to a web fetch tool, why don't you search the web and learn what the Claude Artifacts feature is and tell me about it because this is what we're going to build. And WhisperFlow processes that. We hit enter. Now the way I'm sort of working on this app or at least the workflow that I have in terms of CI/CD and all that type of stuff, I'm using GitHub, of course. But the way I'm developing everything is I have a staging branch. Everything

00:04:39.775 --> 00:05:06.565
gets, like, you know... I'm working on a feature locally. Once I like it, I move it to the staging branch. I test it out on the staging branch for some time, and if I like it, I move it over to the main branch. Now I talked about greploop for a second. I kinda wanna explain to you how that works. So one of the code reviewers I have here is Greptile. This is a pretty large PR, so I won't be able to review the entire thing, but there was a moment in time it did. Let me show you exactly where that could be.

00:05:07.200 --> 00:05:09.040
I think we just need to unload,

00:05:09.440 --> 00:05:29.515
and here you have it. Now what's cool with Greptile: you get the summary and you get this confidence score. Right? Right now, it's a four out of five. Anything four out of five and higher, ideally a five out of five, is good enough for me. But what's cool about greploop, and if you don't know how to set it up, if you haven't had it set up already, you literally just go to Greptile's

00:05:29.515 --> 00:05:31.355
repo, find their skills,

00:05:31.755 --> 00:06:02.155
and the greploop skill is what you want. And, essentially, how greploop works, I can diagram this for you. Let's say there is me right here. I actually have a great icon for this. Let's say there's me. I push a change to my app. What Greptile is going to do is review it. Right? It's gonna do a review, and then let's say I get a two out of five. Right? Let's say it works, but there's some security features that I missed. There's some edge cases that I missed. Like, I just missed a bunch of things. Now I can read the comments,

00:06:02.610 --> 00:06:03.970
give it to my agent,

00:06:04.130 --> 00:06:08.050
and get the agent to address the comments, or I can just enter

00:06:08.210 --> 00:06:11.170
/greploop, assuming you have the skill installed.

00:06:11.250 --> 00:06:17.170
Once I have greploop loaded, what's going to happen is my agent is going to read

00:06:17.415 --> 00:06:18.375
from GitHub.

00:06:18.695 --> 00:06:20.935
It's going to read the comments.

00:06:20.935 --> 00:06:56.825
It's going to take in the comments. It's going to address the comments, and it's gonna push a change. And then what's going to happen is it's going to wait for a new review to be generated. Right? Because every time you push to that same branch, Greptile files a review. Now let's say a change was made and it gives it a three out of five. Meaning, yeah, you addressed some things, but there's still some more missing. What greploop is going to do is it's going to wait till it gets the new review. When it sees the three out of five, it's gonna realize, hey, this isn't a five out of five. Let's go back. It addresses the changes. Pushes again. It will keep going. I think there's a maximum of, like, five to six turns,

00:06:56.985 --> 00:07:04.830
but, essentially, it will keep going until it gets a five out of five. Right? And the reason why I like GPT 5.5

00:07:04.910 --> 00:07:06.030
extra high

00:07:06.190 --> 00:07:08.270
fast

00:07:08.350 --> 00:07:33.470
is it's a really intelligent model, and especially when it comes to building complex features, it just writes a bunch of tests. And in this case, this is actually great because whenever I get feedback, it reviews the tests and realizes, okay, the test cases it originally wrote passed, I need to add some additional things, and this has just been a great experience thus far. So let's see what we got. It says, found the feature. I think you mean

00:07:33.630 --> 00:07:47.105
Claude Artifacts. Oh, they're right. Claude. My bad. Claude Artifacts are standalone pieces of generated content that appear in a dedicated panel beside the chat. Claude creates one when the output is substantial, self-contained, and likely to be edited, reused, previewed,

00:07:47.105 --> 00:07:55.345
or referenced later. Anthropic's examples include markdown, plain text docs, code snippets, single page sites, SVGs, diagrams, flowcharts,

00:07:55.660 --> 00:08:02.060
and interactive React components. So so far so good. Core behavior from the docs, the artifact opens in the right side preview.

00:08:02.220 --> 00:08:18.085
Users can ask Claude to modify the artifact, etcetera, etcetera. So it's got the gist. For what we're building, the important product shape is: an agent produces a normal chat most of the time. But when it creates a visual or reusable deliverable, we promote that output into a first class artifact

00:08:18.165 --> 00:08:30.840
with its own life cycle, type, title, source, preview, renderer, versions, error state, and update path. The main architecture pieces will likely need artifact detection and creation, makes sense, artifact registry,

00:08:30.920 --> 00:08:32.120
preview runtime,

00:08:32.120 --> 00:08:37.320
code preview toggle, iteration loop, sandboxing and security, error capture,

00:08:37.400 --> 00:09:09.290
and sharing/export later. Okay. So so far, we're good. Now this is what I'm gonna do. You now know exactly the type of feature that I want. I now want you to create a plan on how we're going to build this. Make sure you view the entire code base. Make sure you understand how things work. I don't want us to build this feature at the cost of breaking another one, so make sure you do a great job and yeah. Give me your plan. So it's going to generate a plan. Let me go to plan mode. It's gonna generate a plan. Now there are other skills that I have, one in particular that I really use a lot, and it's called

00:09:09.595 --> 00:09:10.795
/code-structure.

00:09:11.035 --> 00:09:18.555
And, basically, let me take you guys to the repo, and, again, I'll link this in the description down below. This is my personal skill. This basically restructures

00:09:18.555 --> 00:10:06.260
a specific feature of the code base into a service layer. Therefore, it's very clean. It's very understandable if I need to dive in and look into the code, which, I'll be honest, for the most part I haven't really had to after using this, but it also helps the agent read the code and understand what's going on. Right? So this is another skill we'll be using as well. Now let's go back to Cursor. We see that multiple sub agents using composer two five fast have been deployed, and they're going to be working on this plan. While the feature's working, I can open up Steam, and I've been obsessed with, uh, Red Dead Redemption 2 again. I played it before. I finished it before. But for some reason, I don't know why, I just have this urge to play it again. So while this feature's working, we can play. So right now, I don't know if you could see, but I'm taking Jack fishing.

00:10:06.975 --> 00:10:09.855
I think his name is Jack. He's John Marston's kid.

00:10:10.495 --> 00:10:46.800
And, yeah, we're gonna wait for Cursor to cook, and I'm gonna play in the meantime. While AI is generating code, let me show you how you can get better at agentic engineering, and that's with today's sponsor. Before I introduce today's sponsor, let's hear from everyone's favorite CEO, Dario. Let's see what he has to say. I think I don't know. We might be six to twelve months away from when the model is doing most, maybe all of what SWEs do end to end. So we're six to twelve months away from all software engineering being done by agents. Yet if I go on Anthropic's careers page and I select engineering and design for product, I see 20

00:10:47.040 --> 00:11:10.015
open roles. It's very important for us to understand that engineering is not dead. In fact, it's become more alive because generating code has become so much easier. That's why I highly recommend Scrimba, the sponsor of today's video, and their full stack developer path. If I was getting started today and I didn't wanna spend four years in college, I would take this exact path, the full stack developer path. You're gonna learn everything from HTML and CSS

00:11:10.095 --> 00:11:14.330
to responsive design to setting up back ends using Node.js, databases,

00:11:14.330 --> 00:11:24.010
Express, UI design. You're gonna learn React. You're gonna learn advanced React. You're gonna learn about Next.js, one of my favorite frameworks, although I find Svelte to be better. And here's what's interesting

00:11:24.170 --> 00:11:42.420
about Scrimba. Scrimba is the most unique platform, and I'm not saying that because they sponsored this video, but I want to show you just a random video. So I'm gonna click play. At the moment, where we're making our fetch request... So you hear the instructor talking. Right? And you see him screen sharing. But at the same time, this screen

00:11:42.660 --> 00:11:44.020
is an IDE

00:11:44.180 --> 00:11:45.300
in the browser.

00:11:45.460 --> 00:11:59.245
And then I could go back to watching at the top. This is by far the most unique learning platform I've ever used. If you wanna take advantage of the times that we're in, then there's no better way than learning how to write and architect and engineer,

00:11:59.325 --> 00:12:23.040
and the best way to get started is using Scrimba. Make sure to check them out. The link is in the description. I highly recommend the full stack developer path. Now let's get back to the video. Okay. So plan complete. It says chat artifacts preview plan. Add a right side artifacts panel to /dashboard/chat that opens when the assistant creates substantial visual or reusable content. The first milestone should support markdown,

00:12:22.725 --> 00:12:30.485
SVG, and code artifacts safely. Framework component previews should be added through a follow-up sandbox compiler path rather than by running arbitrary

00:12:30.485 --> 00:12:32.325
React/Svelte code directly

00:12:32.405 --> 00:12:35.605
in the main app. Okay. Cool. What I found, the main chat surface,

00:12:36.270 --> 00:12:57.075
it already handles a list: direct gateway streaming, fallback scheduled runs, active run preview recovery, message rendering, attachments, a desktop side sheet, and a sub agent inspector. Alright. So it's glazing my code. We love that. And this is the diagram we get: user prompt, chat run row, gateway stream, checks if it needs to preview the content, artifact block parser. Okay.
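
NOTE
Editor's sketch for the plan's point about not running generated code directly in the main app. This is a rough Python illustration, not Pluto's code: sandboxed_preview is a hypothetical helper, and it assumes the preview is an iframe that receives its document through the srcdoc attribute.
```python
import html
# Hypothetical helper: wrap untrusted artifact HTML in a sandboxed iframe.
def sandboxed_preview(artifact_html: str, title: str = "Artifact preview") -> str:
    # srcdoc carries the document inline; escaping keeps quotes in the
    # artifact from terminating the attribute early.
    srcdoc = html.escape(artifact_html, quote=True)
    # sandbox="allow-scripts" lets the artifact run its own scripts, but
    # because allow-same-origin is left out, the frame gets an opaque origin
    # and cannot read the host app's cookies, storage, or DOM.
    return (
        f'<iframe title="{html.escape(title, quote=True)}" '
        f'sandbox="allow-scripts" referrerpolicy="no-referrer" '
        f'srcdoc="{srcdoc}"></iframe>'
    )
```
The choice that matters is leaving allow-same-origin out of the sandbox list; combining it with allow-scripts would let the artifact undo the sandbox.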

00:13:03.075 --> 00:13:32.540
So one thing I noticed, there's a rollout strategy here. Um, it's planning on doing five PRs. Let me see. Are you planning on doing five separate PRs? Is it because it's going to be a large code change? Now I will say, for Greptile to work, for any code review to work properly, you're going to want to keep the PRs minimal. Right? If the PR is too large with too many files, it's going to be hard for any agent to review it. So minimal PRs that are to the point, that fix a specific thing, chef's kiss. It says yes. That's my recommendation because it's a fairly large cross-cutting feature.

00:13:32.860 --> 00:13:37.820
Okay. But it doesn't have to be five PRs. A practical split would be back end data model plus

00:13:37.900 --> 00:13:38.860
parser

00:13:38.860 --> 00:13:43.500
and tests; PR two: chat integration plus side panel. The five-PR version is the most reviewable

00:13:43.685 --> 00:13:45.045
path. Honestly,

00:13:45.605 --> 00:14:01.330
I like the original plan of five PRs. I'd rather keep it minimal and simple. Alright. It says sounds good. We'll keep the original five PR rollout. That's the safer path for this feature because it lets us verify chat streaming and persistence before layering on preview UI and executable content.

00:14:01.330 --> 00:14:44.285
That's fair. Again, I would rather my PRs be minimal. I can test. I can verify things look good, and then I can move on to the next thing versus having this giant large PR. And I know you're probably like, but you have a large one for staging. Again, staging is the place where I'm testing it. Right? I'm testing the feature. If something's not working, we're back to local, and then we'll merge back to staging. Right? But for stuff like this, I need to have multiple PRs, and that's what we're going to do, and we're gonna let this agent cook. Highly recommend this album, mixtape, whatever this is, fire. My favorite song, gen five or took a break. And, yeah, this is kind of the life of agentic engineering. It's just going and I'm just waiting, and I could maybe read a book, play a game, or work on another project.

00:14:44.525 --> 00:14:47.885
Also, side topic, I can't believe Arsenal

00:14:47.885 --> 00:14:48.525
won

00:14:48.820 --> 00:15:00.100
the Premier League. I can't believe... like, I have been a proud Arsenal hater for basically all my life. I made it, uh, like, a known thing that, like, one of my goals,

00:15:00.340 --> 00:15:17.605
um, as an avid soccer, uh, football fan is... I get great joy watching Arsenal lose. The fact that they won the Premier League honestly breaks my heart. This is how you know Jesus is returning soon. Arsenal winning the league... we really are in the end times.

00:15:18.140 --> 00:15:32.940
Alright. So Cursor is done with the task. We see here that it's implemented the chat artifacts preview plan. I'm not even gonna read all this. Let's go just test out the feature. Let's say, create an artifact that explains how World War two went,

00:15:33.475 --> 00:15:45.715
and let's just hit enter. So let's see. This is the first try. Again, probably might not work. It might work halfway. Let's see what we get from the agent. Oh, okay. So it is writing

00:15:45.510 --> 00:15:58.230
HTML. It's streaming HTML off rip. Probably not something I wanted it to do. I probably wanted it to just, like, say it's, you know, cooking. I don't wanna see the stream. But so far so good. It's working. Alright.

00:15:58.775 --> 00:16:04.135
Let's see. Oh, and by the way, the underlying model that I'm using is GLM

00:16:04.215 --> 00:16:07.655
five. Simply from a cost perspective, like, the type of knowledge

00:16:08.055 --> 00:16:34.725
you get for the cost is pretty high. Obviously, it's no Opus or, you know, GPT 5.5, but it'll do the job. And there you have it. We have our preview. Now, again, it is ugly because I'm using GLM five five. I know if I use Opus, it'll probably be very beautiful and chic, but I mean, it did it. Now there's a couple things from a product perspective. I would love to be able to slide this right here. So I'm just gonna take a screenshot real quick.

00:16:35.125 --> 00:16:37.950
Let's copy this. Let's go back to Cursor.

00:16:38.030 --> 00:17:01.495
Paste this, and I'm gonna say so you got it right. It works. But the one thing I'd love to be able to do is I'd love to be able to resize the panel, the window for the artifacts just like I can do with desktop. Literally look at the desktop resizing and just implement the same thing and hit enter. And, basically, what I mean by that is if I open the desktop oh, and that's probably something I should think about. When I open the desktop, you could see here I can

00:17:02.290 --> 00:17:10.210
resize this to my liking. But with this right here, it's just a fixed thing and I can download HTML if I want to. Now can I make changes?

00:17:10.370 --> 00:17:17.010
Let's see. Can you change the theme from the light mode that it's in to dark mode? Let's see

00:17:17.865 --> 00:17:19.465
if it can do that.

00:17:20.265 --> 00:17:21.225
Oh, okay.

00:17:21.625 --> 00:17:26.905
This is a known bug I have on the app. When I open the desktop and close it, there's a routing issue.

00:17:27.145 --> 00:17:27.625
So

00:17:27.945 --> 00:17:28.825
assume

00:17:28.825 --> 00:17:39.720
that didn't happen. Embarrassing. I know, but that's another bug for another day. I'm gonna try again. Can you change the theme from light mode to dark mode? Let's see if it actually updates the existing artifact.

00:17:39.960 --> 00:17:52.815
Would be interested to see if it actually worked out of the box. Okay. We can see the resizing has been added. Great. But I asked it to change it to dark mode, and it said it already was in dark mode. I sent it a screenshot,

00:17:52.975 --> 00:17:58.015
and it says, I see the issue. The artifact preview is still showing the light mode one. Let me emit.

00:17:58.415 --> 00:18:04.880
Uh, see, this is why the streaming is annoying. Okay. We're gonna fix that. I don't like it streaming the HTML. Let's go back here and say,

00:18:05.040 --> 00:18:11.360
when the HTML is being written, right now what we have is that, on the chat UI, it will stream.

00:18:11.680 --> 00:18:13.040
Can we just have,

00:18:13.600 --> 00:18:34.190
like, an animation that says, oh, like, you know, writing or building or actually, it should say something like writing HTML or crafting artifact. Actually, I like crafting artifact. Right? Crafting artifact, let it animate and pulsate nicely instead of the entire HTML streaming. So we'll have this queued up. Another thing that I noticed

00:18:34.750 --> 00:18:48.925
is... okay. See, there you go. It worked. It says here, I see the issue. The artifact preview is still showing the light mode. Let me emit an updated version with the same key to refresh it. Another thing I noticed: I created an artifact, and then I asked the agent to update

00:18:49.005 --> 00:18:50.285
the existing artifact.

00:18:50.605 --> 00:19:06.820
And it updated it, I believe, but it did not show it. So can you please review that process and make it so that I can see every update? I also wanna see every older version. Right? Yeah. Make that happen. And then we're gonna hit next on this one. So we have these two queued up. We have

00:19:07.060 --> 00:19:08.820
this almost done, I believe.

00:19:09.300 --> 00:19:10.260
GPT 5.5,

00:19:10.260 --> 00:19:18.595
like it always does. It's writing a test. Tests are great. It's okay. We're gonna be happy with this. Now, we're getting to a point where this

00:19:18.835 --> 00:19:21.715
this looks pretty good. I like this feature.

00:19:21.955 --> 00:19:35.910
Now, I'm going to show you how in just a bit. Once these two are done, I'm gonna show you how I'm going to merge this into staging, and this is where Greptile is going to come into play. Some interesting findings here. I can see the chat artifact instructions

00:19:35.910 --> 00:19:41.670
that it's generated. It says: when creating substantial standalone visual or reusable content, emit it in an artifact fence.

00:19:42.115 --> 00:20:04.020
Use this exact opening fence shape: open agent artifact, type HTML, title short title, key stable kebab key. Okay. Supported artifact type values are markdown, HTML, SVG, and code. For code artifacts, include language TS or another short language ID when useful. When revising an existing artifact, reuse the same key so the update becomes

00:20:04.180 --> 00:20:10.595
a new version of the artifact. Put only the artifact source inside the fence. Continue conversational explanation

00:20:10.595 --> 00:20:13.955
outside the fence. That's pretty interesting. It says your artifact updates

00:20:14.115 --> 00:20:32.760
are returned with full version history. The side panel shows version history. Selecting an older version updates both preview and source views. New updates default back to latest unless you explicitly select an older version. The agent prompt now tells the model to reuse the same artifact, and we just read that. Let's see right here if we can see... yep. We see v2, v1,

00:20:33.000 --> 00:20:41.565
and let's say add World War one, uh, history in the same artifact as well. So let's make this a history document HTML.
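
NOTE
Editor's sketch of the artifact fence and key-based versioning the agent described. The exact fence syntax is not shown on screen, so the shape below (three backticks, then agent-artifact with type, title, and key attributes) is an assumption, and parse_artifacts, record, and Artifact are hypothetical names rather than Pluto's actual code.
```python
import re
from dataclasses import dataclass, field
TICKS = "`" * 3
# Assumed opening line: <ticks>agent-artifact type="html" title="..." key="kebab-key"
FENCE_RE = re.compile(
    rf"^{TICKS}agent-artifact[ \t]+(?P<attrs>[^\n]*)\n(?P<body>.*?)\n{TICKS}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')
SUPPORTED_TYPES = {"markdown", "html", "svg", "code"}
@dataclass
class Artifact:
    key: str
    type: str
    title: str
    versions: list[str] = field(default_factory=list)
def parse_artifacts(message: str) -> list[dict[str, str]]:
    """Return one dict per well-formed artifact fence in a finished message."""
    found = []
    for match in FENCE_RE.finditer(message):
        attrs = dict(ATTR_RE.findall(match.group("attrs")))
        # Skip fences with an unsupported type or no key to version under.
        if attrs.get("type") in SUPPORTED_TYPES and attrs.get("key"):
            found.append({**attrs, "source": match.group("body")})
    return found
def record(registry: dict[str, Artifact], parsed: dict[str, str]) -> int:
    """Store the source under its key; a reused key appends a new version."""
    artifact = registry.setdefault(
        parsed["key"], Artifact(parsed["key"], parsed["type"], parsed.get("title", ""))
    )
    artifact.versions.append(parsed["source"])
    return len(artifact.versions)  # 1-based number of the version just stored
```
Keeping every version in a list per key is what would let a side panel show v1, v2, and so on, and default to the last entry unless an older one is selected.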

00:20:41.645 --> 00:20:49.005
World War two was a global conflict that pitted the Allied powers against the Axis powers that began with Germany's invasion of Poland on September 1, 1939

00:20:49.005 --> 00:20:53.700
and ended with Japan's surrender on September 2, 1945.

00:20:53.780 --> 00:21:00.020
And now we have that. Okay. So I don't need to see the streaming. I can just see that it's writing HTML.
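
NOTE
Editor's sketch of hiding the streamed artifact source behind a status label. It assumes the same hypothetical fence shape (a line starting with three backticks and agent-artifact); display_text and the placeholder wording are illustrative, not Pluto's implementation.
```python
import re
TICKS = "`" * 3
PLACEHOLDER = "[Crafting artifact...]"
# Assumed opening fence: a line starting with <ticks>agent-artifact
OPEN_RE = re.compile(rf"^{TICKS}agent-artifact[^\n]*$", re.MULTILINE)
CLOSE_RE = re.compile(rf"^{TICKS}[ \t]*$", re.MULTILINE)
def display_text(streamed: str) -> str:
    """Return what the chat bubble should show for the text streamed so far."""
    out, pos = [], 0
    while True:
        opening = OPEN_RE.search(streamed, pos)
        if opening is None:
            out.append(streamed[pos:])  # no (further) artifact: show the rest
            break
        out.append(streamed[pos:opening.start()])
        out.append(PLACEHOLDER)
        closing = CLOSE_RE.search(streamed, opening.end())
        if closing is None:
            break  # fence still open: hide everything streamed after it
        pos = closing.end()
    return "".join(out)
```
Two known gaps in this sketch: a half-streamed opening line (before the word agent-artifact is complete) would flash briefly, and an artifact body that itself contains a bare three-backtick line would be treated as closed early.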

00:21:00.260 --> 00:21:08.985
That's great. We have multiple different versions right here. I can close this. I can open this. I can resize this. I mean, I don't think there's much

00:21:09.225 --> 00:21:13.145
that we're missing. Now what's interesting is I don't think we followed

00:21:13.545 --> 00:21:15.465
this plan right here, this

00:21:16.340 --> 00:21:22.020
rollout plan. We'll see. I'm gonna ask it to push this to a branch and make a PR to staging.

00:21:22.180 --> 00:21:47.910
But here's one thing I do wanna say. I don't necessarily create the plan for the agent, although I do think it helps. There are times where I'll just build the feature going back and forth with it. The plan sometimes and actually, most of the time is really for me because I'll work on multiple features at a time, and I need to remember what it is that I was working on or what it is that me and the agent were working on. So low key, it actually helps me. I'm pro plan for myself, but I also use it with the agent,

00:21:48.230 --> 00:21:52.070
more so myself, to be honest with you. Now let's go back here.

00:21:52.390 --> 00:21:55.350
The update has been made. We can see version three active,

00:21:56.085 --> 00:22:21.750
and, yeah, we see World War one and then World War two. So this feature is pretty much done. I really like it. I thought we'd have more issues, uh, family. GPT 5.5 extra high fast is amazing. So let's clean up. I want you to push this to a new branch, and from that branch, create a PR to staging. We're not going to merge to main. We're gonna merge to staging. So push the branch, create a PR,

00:22:21.910 --> 00:22:41.270
and give me the PR link. I'm gonna hit enter. So now what's going to happen is, because I have GitHub connected, it's going to create a new branch, push that branch, create a PR. It's going to give me the PR link, and then we're gonna review the PR, and we're gonna see what score Greptile gives us. Oh, and by the way, for the tech stack, I am using SvelteKit.

00:22:41.270 --> 00:22:44.710
This is a full Svelte app. You don't believe me? Let me

00:22:45.350 --> 00:22:46.150
open

00:22:46.390 --> 00:22:49.670
let me open... there you go. You have .svelte files.

00:22:49.990 --> 00:22:50.870
You have

00:22:51.190 --> 00:23:02.705
oh, it's also not only just a web app. There's also a desktop app using Electron for that. There's a web app, and then there's an admin dashboard to manage admin stuff using Svelte to power everything.

00:23:02.945 --> 00:23:06.865
Convex, best back end in the world. Convex literally orchestrates

00:23:07.130 --> 00:23:12.170
everything. Deploying this on Daytona. Daytona is the best agent

00:23:12.410 --> 00:23:19.930
cloud provider. I used a bunch of them. Fell in love with Daytona. And there's a couple other tools like Supermemory for memory, AgentMail for mail,

00:23:20.490 --> 00:23:25.385
Plaid for the financial stuff, Twilio for the phone. So really incorporating

00:23:25.385 --> 00:23:28.425
a lot of services, creating these very composable

00:23:28.425 --> 00:23:45.610
service layer abstractions so that each service connects to a specific thing, and I can find the code easily. So this project, in my opinion, is a very well thought out, agentically engineered project. It's not perfect by any means, but it's pretty dang good. So let's open this PR.

00:23:45.770 --> 00:23:51.915
I can view it in Cursor's PR viewer, but I'm gonna be honest. I am going to go on GitHub.

00:23:51.915 --> 00:24:12.050
Let's go on GitHub. But Cursor's is pretty nice too. It's just not real time, meaning, like, when an update pushes, I have to click, like, refresh here to make sure I see it. But let's go back here. We can see Greptile has fired off. We do have a CI pipeline. I'll explain that maybe in another video if you're interested, but now we see 2,000 lines added, 13 removed.

00:24:12.370 --> 00:24:14.530
Great summary written by Cursor.

00:24:14.530 --> 00:24:23.545
We're just gonna wait on the Greptile review, and we're gonna see what we get. Alright. The review is here. And ladies and gents, we got a three out of five confidence

00:24:23.545 --> 00:24:28.345
score. Let's see why. This PR adds a full chat-scoped artifact system,

00:24:28.665 --> 00:24:35.290
fence parser, Convex-persisted versioning, and resizable side panel with safe preview rendering for markdown.

00:24:35.690 --> 00:24:51.335
Let's see. Okay. It's explaining. Let's see the issues. Okay. This is a security issue. The artifact persistence and rendering pipeline is well structured, but the message card matching logic has a defect that can surface draft content under past messages during active stream runs.
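
NOTE
Editor's sketch of the kind of matching fix this review comment points at: a past message's card should never resolve to the live draft just because the artifact key is the same. Version, Draft, and card_source are hypothetical names, and the data model (each version remembering the message that produced it) is an assumption, not Pluto's schema.
```python
from dataclasses import dataclass
from typing import Optional
@dataclass(frozen=True)
class Version:
    key: str          # stable artifact key, reused across revisions
    message_id: str   # message that produced this version
    source: str
@dataclass(frozen=True)
class Draft:
    key: str
    message_id: str   # the message currently being streamed
    source: str       # possibly incomplete
def card_source(message_id: str, key: str, persisted: list[Version], draft: Optional[Draft]) -> Optional[str]:
    """Pick the content a message's artifact card should show."""
    # Only the in-flight message may show the draft. Matching on key alone
    # would let an older message with the same key show live, partial content.
    if draft is not None and draft.key == key and draft.message_id == message_id:
        return draft.source
    for version in persisted:
        if version.key == key and version.message_id == message_id:
            return version.source
    return None
```
Matching on both the key and the originating message keeps each historical card pinned to the version that message actually produced.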

00:24:51.575 --> 00:24:52.695
The artifact

00:24:52.695 --> 00:24:54.215
cards for message

00:24:54.375 --> 00:24:56.615
function contains a matching condition

00:24:56.775 --> 00:25:07.000
that can cause past message artifact cards to resolve to the current streaming draft when the draft shares an artifact key with a message already persisted. Because chat artifacts

00:25:07.000 --> 00:25:39.470
removes the persisted copy in favor of the draft, past messages lose their correct historical reference and instead show live, incomplete content. Oh, this makes sense. Visible to any user. This makes sense. And then there's some security stuff. Now, usually, you get these comments. Right? And these comments basically tell you where the issue is, and you can copy the prompt to fix. Sometimes you'll get commit suggestions where it will commit the change for you, but usually, you can just copy the prompt to fix. Now here is where greploop comes in. I'm gonna go to Cursor. I'm gonna do slash

00:25:39.870 --> 00:25:41.470
grep loop,

00:25:42.110 --> 00:25:47.435
and we're gonna hit enter. Now what's going to happen is what I explained to you earlier.

00:25:47.915 --> 00:25:49.195
I push the change.

00:25:49.595 --> 00:25:51.115
I got a three out of five.

00:25:51.515 --> 00:25:52.955
I fired greploop.

00:25:52.955 --> 00:25:58.955
Greploop is gonna read the feedback. It's going to make changes. The cursor agent's gonna make changes. Push

00:25:59.490 --> 00:26:00.450
to GitHub.

00:26:00.610 --> 00:26:08.610
Re-review. If it's a four out of five, back to Cursor, Cursor updates. And then when it's a five out of five or there have been five turns,

00:26:08.930 --> 00:26:11.970
then it stops. So this is,

00:26:12.210 --> 00:26:14.585
my process. Build the functionality,

00:26:14.585 --> 00:26:27.145
test it, you know, actually see if it works. It worked, but there are some edge cases we can't catch from an initial use. Then we fire that off to Greptile. Greptile gives us a review. There are some security things we missed as well. Slash greploop

00:26:27.270 --> 00:26:32.390
starts cooking. So you can see here, it says, Greptile left three actionable comments,

00:26:32.550 --> 00:26:38.150
one real draft leak bug, one sandbox tightening, and one small cleanup around an identity helper.
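
NOTE
Editor's sketch of the draft-leak bug described in the review, not the actual Pluto code.
Every name here (Artifact, card_for_message_buggy, card_for_message_fixed, streaming_message_id) is hypothetical.
It assumes a card is looked up by artifact key, and that the fix is to let the draft resolve only for the message that is currently streaming.
```python
from dataclasses import dataclass
from typing import Optional
#
@dataclass(frozen=True)
class Artifact:
    artifact_key: str
    content: str
#
# Persisted artifacts are stored per (message_id, artifact_key) in this sketch.
Persisted = dict[tuple[str, str], Artifact]
#
def card_for_message_buggy(message_id: str, key: str, persisted: Persisted,
                           draft: Optional[Artifact]) -> Optional[Artifact]:
    # Bug: the key alone decides the match, so a past message that shares
    # the key resolves to the live draft instead of its persisted copy.
    if draft is not None and draft.artifact_key == key:
        return draft
    return persisted.get((message_id, key))
#
def card_for_message_fixed(message_id: str, key: str, persisted: Persisted,
                           draft: Optional[Artifact],
                           streaming_message_id: Optional[str]) -> Optional[Artifact]:
    # Assumed fix: only the message being streamed may show the draft.
    if (draft is not None and draft.artifact_key == key
            and message_id == streaming_message_id):
        return draft
    return persisted.get((message_id, key))
```
With a persisted "report" artifact on message m1 and a draft "report" streaming for m2, the buggy lookup returns the draft for m1, while the fixed lookup returns the persisted copy.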

00:26:38.150 --> 00:26:46.235
I'm gonna patch those, update the affected source test, run the focused artifact test, commit, push, and then trigger the next greptile iteration.
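
NOTE
Editor's sketch of the review loop as it is described in the video: review, fix, push, re-review, and stop at five out of five or after five turns.
This is not Greptile's implementation. review_loop, request_review and apply_fixes_and_push are hypothetical names, and counting a "turn" as one fix-and-push round is an assumption.
```python
from typing import Callable
#
MAX_TURNS = 5      # stop after five fix rounds (as stated in the video)
TARGET_SCORE = 5   # or as soon as the review reaches five out of five
#
def review_loop(request_review: Callable[[], int],
                apply_fixes_and_push: Callable[[], None],
                max_turns: int = MAX_TURNS,
                target: int = TARGET_SCORE) -> tuple[int, int]:
    """Return (final_score, fix_turns_used)."""
    score = request_review()          # initial review of the pushed PR
    turns = 0
    while score < target and turns < max_turns:
        apply_fixes_and_push()        # agent patches, tests, commits, pushes
        turns += 1
        score = request_review()      # the push triggers a fresh review
    return score, turns
#
# Example: a PR that goes three, then four, then five out of five.
scores = iter([3, 4, 5])
result = review_loop(lambda: next(scores), lambda: None)
```
The turn cap is what keeps the agent from editing in circles when the score is stuck at four out of five.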

00:26:46.475 --> 00:26:48.635
This is where slash greploop

00:26:48.635 --> 00:26:54.475
works and cooks. And now, I'm probably gonna go grab something to eat. I'll be right back. Got some, uh, pasta

00:26:54.475 --> 00:26:59.580
cream sauce. Let's see if our Greptile review has changed. Let's refresh.

00:26:59.820 --> 00:27:34.290
It actually pushed the change, and now you see, I didn't even write that. The grep loop did it. So it called, you know, GitHub's API and wrote "at greptile review". And whenever Greptile, like, drops this emoji, that means it's reviewing the code changes, and you can see a review started a minute ago. In a couple minutes, we'll see if this is a five out of five or four out of five. Sometimes, I'm not gonna lie, especially if the PR is big, it might even degrade. So let's see what we get. Alright. So we got an update and we got a four out of five. It says it's safe to merge with the iframe error detection gap addressed before shipping

00:27:34.530 --> 00:27:54.725
the repair workflow to users. It tags this specific file and says the onerror event wiring needs a different approach, for example, postMessage from inside the frame, to actually surface rendering errors for HTML and SVG artifacts. So, again, it addressed it. Like, see, this review is complete, got a thumbs up, and then now we have this one comment.

00:27:54.885 --> 00:28:02.030
And, again, I can copy this, paste it, and then tag "at greptile review" for a new review after the push has been made,

00:28:02.430 --> 00:28:09.230
or I can just wait on grep loop to continue to cook. So notice this, we're literally following the same trajectory.

00:28:09.310 --> 00:28:12.590
Went from, in this case, three out of five to

00:28:12.965 --> 00:28:14.165
four out of five.

00:28:14.725 --> 00:28:15.685
Now hopefully,

00:28:15.685 --> 00:28:27.525
next we go five out of five. Again, there are times where it will get stuck at four out of five. If I notice it going in a continuous cycle, I'll probably stop. I'll review myself and I'll just merge.

00:28:28.000 --> 00:28:36.720
Right? Because you don't want the agent to keep editing, editing, editing, and then it's gonna start hallucinating and making stuff up. You know, short, simple, concise,

00:28:37.040 --> 00:28:38.720
to the point, not too long,

00:28:38.960 --> 00:29:04.060
that's the sauce that I've seen success with. Now it's fixing up that edge case. If I click here, I can see the changes it's making, and I just gotta wait. I can work on another project or, um, I'm gonna play a little bit of Red Dead Redemption. Alright. So we got a three out of five. It says you're safe to merge for markdown and code artifacts. HTML/SVG artifact previews will silently show non-interactive content due to sandbox configuration, and the version history query

00:29:04.060 --> 00:29:06.940
could become a bandwidth concern for active chats.

00:29:07.180 --> 00:29:35.590
Right? And we get some feedback here. Now I could just fire off the grep loop here. Right? And to fire off the grep loop, all I would do is like slash grep loop. But in my humble opinion, this PR is a little too big. Right? It's over 2,000 lines. So what I'm gonna do is as follows. I'm gonna go to Cursor and say the PR has been made. We got a three out of five on Greptile, but the PR feels a little too big for the Greptile agent to be able to capture everything. Uh, what do you think about splitting the PR into smaller chunks

00:29:35.750 --> 00:29:41.885
that make sense so we can get Greptile to review the code and we can merge it safely? And we're gonna hit enter

00:29:42.125 --> 00:29:42.845
and

00:29:43.085 --> 00:29:48.605
the goal is to at least get this to a couple hundred lines each. Maybe even if it's a thousand,

00:29:48.765 --> 00:30:02.310
that's fair. But I feel like 2,000 lines is just a pretty big PR and I don't wanna get into the cycle of like, Greptile keeps catching issues because, again, code base like, the PR is just large. Right? So let's try to make it smaller.

00:30:02.390 --> 00:30:28.050
And if you're an engineer and you've worked in an engineering org, you know, the smaller the PR, the more focused the PR, the better your life is, and I think the same applies to the agent as well. Alright. So we got a response that says, yes. I think splitting up is the right move. This PR mixes a parser contract, Convex schema, secure rendering, and a large UI integration. Greptile will do better if each PR has one review surface. And the suggestion is four PRs: add chat artifact fence contract.

00:30:28.210 --> 00:30:28.770
Okay.

00:30:29.090 --> 00:30:32.530
And then the artifact persistence. Okay. Preview.

00:30:32.610 --> 00:30:40.075
Okay. This all sounds good to me. I'd like to keep this as stacked PRs rather than independent branches because later pieces genuinely depend on each other, and I'll be like,

00:30:40.635 --> 00:30:56.840
do it. Looks good. This looks like a genuinely good plan. GPT 5.5 extra high fast for the win. It sounds like a Starbucks order, and let's see what it generates. So the PRs have been split. I have four PRs here, and if I open it up in my browser, I have PR one, two, three, four,

00:30:57.080 --> 00:31:26.830
all under a thousand lines of code. It's gonna be much easier for the Greptile agent to review and for us to deploy a fix using greploop. Alright. So the reviews came in and every single one got a three out of five. Every single one. So this is great. So let's see the issues here. It says, safe for basic single-block artifacts, but markdown artifacts containing nested code fences will silently be truncated at the first inner closing fence. The closing fence regex matches any bare triple backtick line, so a markdown artifact embedding a fenced code snippet

00:31:26.830 --> 00:31:44.335
will have its content cut off at the inner fence with no error. Okay. So this is a pretty good catch. It gives us some feedback. Let's use the grep loop. Now, we're on PR 87. Remember, we have stacked PRs here, so we have a number of them. We wanna fix 87 first. So let's go back here and let's say,

00:31:44.655 --> 00:31:45.055
please

00:31:45.855 --> 00:31:46.495
review

00:31:46.735 --> 00:31:53.430
actually, no. I don't need to do that. I'm gonna do slash grep loop, and I'm just gonna say p r 87.
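
NOTE
Editor's sketch of the nested-fence truncation flagged on PR 87, not the project's parser.
parse_naive reproduces the reported behavior: the body ends at the first bare triple-backtick line.
parse_length_aware is one possible fix and an assumption on my part: it borrows the CommonMark rule that a closing fence must be at least as long as the opener. Both function names and the "artifact markdown" info string are hypothetical.
```python
import re
#
FENCE = "`" * 3  # built at runtime so this example holds no literal fence
BARE_CLOSE = re.compile(r"^`{3}\s*$")
#
def parse_naive(lines: list[str]) -> list[str]:
    """Buggy: stops at the first bare triple-backtick line, even an inner one."""
    body = []
    for line in lines[1:]:
        if BARE_CLOSE.match(line):
            break
        body.append(line)
    return body
#
def parse_length_aware(lines: list[str]) -> list[str]:
    """Close only on a bare backtick run at least as long as the opener."""
    opener = re.match(r"^(`{3,})", lines[0])
    if opener is None:
        raise ValueError("first line is not a fence opener")
    close = re.compile(r"^`{%d,}\s*$" % len(opener.group(1)))
    body = []
    for line in lines[1:]:
        if close.match(line):
            return body
        body.append(line)
    raise ValueError("unterminated artifact fence")
#
# A markdown artifact wrapped in a four-backtick fence, embedding a code snippet.
doc = ["`" * 4 + "artifact markdown", "# Title", FENCE + "python",
       "print(1)", FENCE, "after the snippet", "`" * 4]
truncated = parse_naive(doc)
complete = parse_length_aware(doc)
```
Here the naive parser drops the inner closing fence and everything after it, while the length-aware parser keeps the whole body. Raising on an unterminated fence also avoids the silent failure the review called out.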

00:31:53.590 --> 00:31:54.630
There you go. A little gold

00:31:55.270 --> 00:32:02.790
text right there, p r 87. So now what's going to happen is the cursor agent knows to run the grep loop on p r 87.
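
NOTE
Editor's sketch of working through stacked PRs from the bottom up, starting with PR 87 as in the video.
land_stack, run_loop and merge are hypothetical names. Stopping at the first PR that does not reach five out of five is an assumption, made because later PRs in a stack depend on earlier ones.
```python
from typing import Callable
#
def land_stack(pr_numbers: list[int],
               run_loop: Callable[[int], int],
               merge: Callable[[int], None],
               target: int = 5) -> list[int]:
    """Run the review loop on each PR in stack order and merge on success."""
    merged = []
    for pr in pr_numbers:
        if run_loop(pr) < target:
            break              # a PR below target blocks everything above it
        merge(pr)
        merged.append(pr)
    return merged
#
# Example with made-up scores: PR 89 stalls at four, so 90 is not merged.
example_scores = {87: 5, 88: 5, 89: 4, 90: 5}
merge_log: list[int] = []
landed = land_stack([87, 88, 89, 90], example_scores.__getitem__, merge_log.append)
```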

00:32:02.790 --> 00:32:09.495
It's going to read the contents. Again, I'll show you this diagram I drew earlier. It's going to read the review, read the feedback,

00:32:09.895 --> 00:32:55.230
fix it, push a change. That pushed change is going to call the Greptile review again, and then it won't stop until it gets a five out of five or it's taken five turns, whichever one comes first. So in this case, we see it says here, Greptile left three actionable comments on the parser contract. It points that out. It's thinking right now. It's going to push the fixes, and it's going to rereview it, and it won't stop again till that either gets a five out of five or it has five turns. So you probably noticed a shirt change. I had day job work I had to do. I had to take care of my lady. I got a little busy, but guess what? We got our grep loops done. So instead of boring you with the details, I'm just gonna show you what I did. We grep looped each PR. We did grep loop 87.

00:32:55.230 --> 00:33:24.530
Once I got a five out of five, merged it. Then we went to 88, grep loop. Once I got a five out of five, merged it. 89, grep loop, five out of five merged it, 90 merged it. We did that and you can see here five out of five here, five out of five here and all of it has been merged to staging. Now, what we have left is to actually test this thing. So let's go in open chat. Let me just refresh real quick. Open chat. Let's say, can you create

00:33:24.690 --> 00:33:28.610
an artifact sharing the best restaurants in Toronto?

00:33:29.250 --> 00:33:30.690
So hopefully,

00:33:30.770 --> 00:33:34.325
this works. And if it doesn't, we're gonna debug together. But if it does,

00:33:34.645 --> 00:33:35.445
we cooked.

00:33:35.605 --> 00:33:46.805
Alright. So what happens is it says here, I'll delegate this research task to a sub agent that can gather the current information about Toronto. So it spawned a sub agent and I can click on open inspector here

00:33:47.150 --> 00:33:58.670
and see what's going on. Basically, it saw that, okay, I'm gonna need to do some research for this and instead of blocking the main thread, I'm gonna deploy a sub agent. The reason why this is cool is I can say

00:33:58.750 --> 00:33:59.550
what's

00:33:59.630 --> 00:34:04.475
up with you today. Just being a weirdo talking to AI like a friend.
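
NOTE
Editor's sketch of the non-blocking delegation described here, using Python's asyncio. It is not how Pluto's agent is built.
research_subagent and main_agent are hypothetical names, and asyncio.sleep stands in for slow web research.
```python
import asyncio
#
async def research_subagent(topic: str) -> str:
    await asyncio.sleep(0.2)          # stands in for slow web research
    return f"artifact: {topic}"
#
async def main_agent() -> list[str]:
    events = []
    # Hand the research off as a background task instead of awaiting it now.
    task = asyncio.create_task(research_subagent("best restaurants in Toronto"))
    events.append("delegated research to sub agent")
    await asyncio.sleep(0)            # yield once so the sub agent starts work
    # The main agent is free to answer an unrelated message in the meantime.
    events.append("replied: I'm doing well")
    events.append(f"research already finished: {task.done()}")
    events.append(await task)         # collect the result once it is ready
    return events
#
events = asyncio.run(main_agent())
```
The reply is recorded while the research task is still pending, which is the behavior shown in the chat.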

00:34:04.955 --> 00:34:07.915
And the main thread is not blocked because

00:34:08.155 --> 00:34:32.085
this has been given off to a sub agent. And I can chat with the main agent and you can see here it responded to me saying, hey. I'm doing well. And it told me it has a sub agent running and I can get it to do other things which is pretty cool. I do talk about it in this video, my agent is better than Claude Cowork on how I architected the agent and how like the sub agent stuff works. So check it out. It's literally like the fourth video on my channel. And if we go back here to the sub agent, we see that I got good data from both,

00:34:32.485 --> 00:34:48.590
uh, Condé Nast Traveler and Time Out Toronto. Let me fetch a few more sources to get a comprehensive list. So it's doing its research finding the best restaurants in Toronto. And we can see the artifact here. Now mind you, not the prettiest one, pretty ugly, and that's probably because I'm using GLM five,

00:34:48.910 --> 00:34:53.230
but it got it done. It worked. We finished building the feature,

00:34:53.470 --> 00:34:58.510
and it was all because of this simple workflow where I use GPT 5.5 extra high, fast.

00:34:59.165 --> 00:35:00.525
We have Greptile's

00:35:00.525 --> 00:35:01.725
Greploop skill.

00:35:01.805 --> 00:35:02.445
Right?

00:35:02.605 --> 00:35:22.430
Minimal PRs. Right? We don't want the PR to be too big. We want them to be minimal. And just a little back and forth and a little structure gets you a long way. Now something I'm going to do, and I I talked about this earlier, is I won't show it in this video, but I'll probably run this skill right after just so it can clean up the code and we have this nicely tidied

00:35:22.430 --> 00:35:24.110
documented functions

00:35:23.775 --> 00:35:52.470
where I know exactly where artifacts are and the agents know where artifacts are. And that's pretty much it. This is how you do agentic engineering. At least this is how I do it, ladies and gents. I hope you found value in this. I know this is rather a long video. Let me know if you like stuff like this. Every time, you know, I hop on podcasts or other people's channels and I share this stuff, people seem to really like it and I've never really done it on my channel. So let me know your thoughts down below. Would really appreciate a like, a comment, and subscribe. Thank you so much for watching this video. I'll see you in the next one. Peace.