WEBVTT

00:00:00.000 --> 00:00:15.040
What if you could run Claude Code 100 times cheaper than whatever you're using right now? This is Claude Code, but the model under the hood isn't even Claude. It's DeepSeek V4, the newly released open-source model that's essentially trading blows with Claude on coding benchmarks.

00:00:15.415 --> 00:00:28.375
And this is all with just a two-line swap and a five-minute setup, and it basically cuts the bill by 100x. I just had it build me this entire dashboard from one prompt. It basically one-shotted it, and it's a polished design. We have all the live calculations,

00:00:28.930 --> 00:01:11.665
and it took about three minutes from start to finish. This is the kind of output you would expect from one of the top-tier models like Opus. Now if we were to do this on Opus, it would have cost me 10 times the amount. I ran this for about 2¢. That's the price of a stick of gum. In the next ten minutes, I'm gonna show you the one prompt that does the entire install, the workflow that cut my Claude bill in half, and most importantly, the specific kind of work where this whole setup actually does fall apart, because nobody's being honest about that. So, two questions you're probably already asking: is this even allowed, and what's the catch? To answer the first one, yes, it is allowed. DeepSeek officially documented this themselves. They published the endpoint, they published the exact environment variables, and they wrote the instructions for connecting their models into Claude Code,

00:01:11.825 --> 00:01:13.425
into OpenClaw,

00:01:13.425 --> 00:01:49.630
and really anything else. They just want people to actually start doing this. Now the catch is the whole reason this video exists. There is one specific category of work where this entire thing breaks. And the people who got burned trying it earlier this year are why most operators still pay full price, and I will get you there. Just stick with me through the entire build. So real quick on why the swap actually works right now, because just about three months ago, it didn't. The open-source coding models were very close, but you would honestly feel the gap on anything past boilerplate. That changed in April when DeepSeek V4 dropped. We have the open weights, MIT license,

00:01:49.630 --> 00:01:59.310
which just means anyone can self-host it, fine-tune it, or run it locally instead of being locked behind a single vendor's closed API.

00:01:59.550 --> 00:02:11.145
And on the benchmarks that actually matter for coding, it finally caught up. For example, on SWE-bench Verified, V4 Pro scored in the 80% range. That's the same neighborhood as Sonnet 4.6

00:02:11.145 --> 00:02:12.345
and Opus 4.7.

00:02:12.345 --> 00:02:56.885
Now on coding tests, V4 is basically Sonnet, and on hard reasoning, it's about 80% of Opus. So for most of what Claude Code actually does day to day, that gap doesn't really show up. Now when it comes to pricing, Opus 4.7 is $5 per million input tokens and $25 per million output tokens. DeepSeek's V4 Flash is 14¢ per million input and 28¢ per million output. And that gap shows up fast, pretty much the moment you start running real workloads. I run three AI companies, and I've been pointing Claude Code at this DeepSeek endpoint for the last couple of weeks across all three. We're shipping the same code that I was shipping last quarter on Anthropic's API. Comparing the two on routine work, the difference is imperceptible,

00:02:56.885 --> 00:03:27.695
but the bill, however, is not. Now just a quick note on what I'm actually using to run this entire thing. My laptop is powered by Snapdragon, which is effectively what's running this entire stack, and this video is sponsored by Snapdragon. The link will be down below in the description to check them out. I've had this for a couple of weeks now, and it doesn't throttle on long coding sessions. My battery lasts all day, and the cost I'm about to walk through plays even better when the hardware is not burning power either. Anyways, for step one, we're just going to navigate over to deepseek.com.
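
NOTE
A rough sketch of the pricing gap, using only the per-million-token prices quoted a moment ago in this video (Opus 4.7 at $5 in and $25 out, DeepSeek V4 Flash at $0.14 in and $0.28 out). The model labels and function names are made up for illustration, and a real bill also depends on caching and plan details that are not modeled here.
```python
# Prices in USD per million tokens, as quoted in the video (not verified here).
PRICES = {
    "opus-4.7": {"input": 5.00, "output": 25.00},
    "deepseek-v4-flash": {"input": 0.14, "output": 0.28},
}
def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the API cost in USD for one workload on the given model."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
def price_ratio(input_tokens: int, output_tokens: int) -> float:
    """How many times more the same workload costs on Opus than on DeepSeek."""
    opus = cost_usd("opus-4.7", input_tokens, output_tokens)
    deepseek = cost_usd("deepseek-v4-flash", input_tokens, output_tokens)
    return opus / deepseek
if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500k output tokens.
    print(f"Opus costs {price_ratio(2_000_000, 500_000):.1f}x more on this workload")
```
On these list prices the ratio works out to roughly 36x for input-only traffic and roughly 89x for output-only traffic, so "100x" is best read as a rounded upper bound.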

00:03:27.695 --> 00:05:02.370
If you don't have an account already, just go ahead and sign up. It's completely free to make, so you only pay for what you use. I dropped in about $5 a couple of weeks ago, and I've barely touched it. Once you've signed up, navigate over to your API keys and create a new key. We will need this in a second. Once we have our API key, it gets a little more interesting, because we're just going to use Claude Code to install everything else for us and make it extremely simple to start using. One quick thing I want to flag before we keep going: DeepSeek officially published their own documentation showing exactly how to connect their models to Claude Code. This isn't some sketchy workaround, and it's not a hack. They wrote the instructions themselves. This is official. You can see this is their page. The official method is to copy six lines of code and paste them into a config file on your laptop. It's a little bit technical, but it does work. The link will be down in the description if you want to do it that way, but let me show you the easiest way. Oh, and one more thing worth knowing: the same swap also works if you're using OpenCode, OpenClaw, or Hermes, and DeepSeek officially supports all three of these. We're staying on Claude Code today, but the same idea carries over if your tool choice is different. Pasting environment variables into shell config files is the kind of thing that makes operators just bounce off tutorials. I'm gonna show you a friendlier path that uses a tool. Alright. So we have Claude Code running inside of our IDE.
I'm using VS Code. You can use whatever you want for this. You can use the terminal. You could also use the regular desktop application

00:05:02.530 --> 00:05:06.850
of Claude, but we're just going to paste this in. So what I'm saying specifically

00:05:06.930 --> 00:05:50.885
is: set up DeepSeek as my Claude Code provider using the official DeepSeek documentation method. So I'm effectively just copying what is in DeepSeek's documentation. And more importantly, we're going to leverage Claude Code itself to make it as simple as possible to install this and make Claude Code 100 times cheaper moving forward. One important thing before we send this off is the key that we copied from DeepSeek earlier. This is where we need to paste it in, so right here, we replace this with our API key. Now we just press enter and run this off. Really, we're telling Claude Code to do three different things in one prompt. First, to go check my shell config file. So that's the file on my computer,

00:05:51.045 --> 00:06:01.700
which gets read every time I open a brand-new terminal. And from there, it's going to clean up any leftover DeepSeek settings from any older experiments.
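
NOTE
A minimal check to run in a brand-new terminal once the setup has finished, assuming the configuration relies on Claude Code's ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN and ANTHROPIC_MODEL environment variables. Which variables DeepSeek's guide actually sets is an assumption here, so compare the list against the official page. The script reports names only and never prints the key itself.
```python
import os
# ASSUMPTION: variable names follow Claude Code's ANTHROPIC_* convention;
# adjust this tuple to match the lines in DeepSeek's published instructions.
EXPECTED_VARS = ("ANTHROPIC_BASE_URL", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_MODEL")
def missing_vars(env: dict) -> list:
    """Return the expected variable names that are unset or empty in env."""
    return [name for name in EXPECTED_VARS if not env.get(name)]
if __name__ == "__main__":
    gaps = missing_vars(dict(os.environ))
    if gaps:
        print("Not visible in this shell: " + ", ".join(gaps))
    else:
        print("All expected variables are visible in this shell.")
```
If anything is reported as not visible, the usual cause is running the check in a terminal that was opened before the shell config was edited.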

00:06:01.700 --> 00:06:18.315
So we're starting fresh. Now it's asking me how I want to configure DeepSeek as my Claude Code provider on Windows. I'm just going to say a PowerShell profile. Now it's asking, did you mean this for a WSL or Linux machine instead of this Windows host? I'm gonna say no, configure this Windows machine,

00:06:18.475 --> 00:06:39.190
and we'll run that from there. It'll ask for a few more permissions, like whether you want to proceed with this PowerShell step. I'll just go ahead and say yes. You'll also notice in the prompt that I asked it to present me the results so that I can confirm that everything actually worked. Alright. So everything just finished installing. We got it all wired in. Now let's actually get started using this. But I do wanna say that when Claude Code edits your

00:06:39.510 --> 00:07:17.245
shell config, the terminal that you're currently running this in is not going to pick up the new settings. Only a brand-new shell will pick up whatever you just installed. So instead of trying to refresh this specific window, we just close it completely and open a fresh project folder. It's a lot cleaner that way. So I went ahead and created a brand-new folder. We've got a fresh window and a fresh shell. Now I'm just going to open up the terminal and run Claude Code, so I'm going to type out claude. From here, if we expand on this a little bit, you can see it's automatically going to start up Claude for us. So we could run this in the dark mode,

00:07:17.485 --> 00:08:02.980
and we'll just press enter. Press enter again and use the recommended settings. Go ahead and trust this folder. Now we can see Claude Code is fully loaded up, and we could expand this as much as we want. If I type out slash model, we'll be able to pull up all the different models that we have listed. You can see we have the defaults, of course, Haiku, and then down at the bottom, the most important one, DeepSeek chat. It's already selected by default because that's what we set in the config. So most of the day, I'm just gonna be on DeepSeek, and I don't really think about it, and the bill automatically stays very low for us. If I back out and ask, what model are we running on, you'll see what we get. We're running on DeepSeek chat. And if I type out slash usage, we'll be able to see all the different models that we are running specifically

00:08:03.220 --> 00:08:06.100
and what's actually getting charged. So DeepSeek,

00:08:06.515 --> 00:08:07.395
24,000

00:08:07.395 --> 00:08:08.195
inputs,

00:08:08.195 --> 00:08:23.420
about 400 outputs. And as you see, I've only spent about 13¢ on this. Anyways, the setup is completely done. Now let's get into some real stuff. So some quick context before I actually show you the build because this is the actual workflow that I have been running where most of the day, I'm going to be on DeepSeek,

00:08:23.500 --> 00:08:26.060
and Claude Code looks completely identical.

00:08:26.060 --> 00:08:35.740
The work's going to get done. Our bill is going to stay very, very low. And when I hit something heavier, where I do need more complex demands from Claude Code,

00:08:35.925 --> 00:08:41.045
that's where I might have a complex multi-file refactor.

00:08:41.205 --> 00:08:46.325
Or I might have a vision task, something that just needs deeper reasoning. I just hit slash model,

00:08:46.325 --> 00:08:49.845
and I can easily flip over to

00:08:49.845 --> 00:09:12.245
the default Opus 4.7. So that's gonna be the whole rhythm. The default is DeepSeek chat, and when we need something heavier, we use the default Opus 4.7. Now, to show you that DeepSeek is genuinely capable of real work, not just toy examples, I'm going to give it one prompt right now and have it build me a real working dashboard.

00:09:12.325 --> 00:09:41.535
I'm gonna have it build out a calculator that figures out how much you save by routing routine work to DeepSeek. This is the most on-theme demo that I could think of for this. I'm just gonna back out of this, and what we're going to paste in is this prompt right here. I'm just gonna run it off. Build me a single-page interactive dashboard called DeepSeek vs. Claude Code ROI Calculator as just one index.html file. It should take three user inputs: the monthly Claude Code spend in US dollars, the percentage of tasks that are routine slash boilerplate,

00:09:41.695 --> 00:10:26.000
and the average DeepSeek cost ratio versus Claude, defaulting to 1/100. Number two is the live calculation and display. So give me the monthly savings if the routine work is routed to DeepSeek, the annual savings, the five-year savings, and a simple bar chart comparing the current versus the new monthly cost. Number three is just to use a clean modern design, and I'm getting a little bit picky about how I want it to look. And then make sure that all values update live as inputs change. No submit button. So we scroll down just a little bit, and we can see it already finished generating this. I mean, that took how long exactly? That was twenty-eight seconds. So let's open this up. Before we do that, we can see what it does: the monthly spend, the routine task slider, and the cost ratio.

00:10:26.240 --> 00:10:33.645
So I'm gonna open this up real quick inside of our folder. Cool. So here we are. The DeepSeek vs. Claude Code ROI Calculator.

00:10:33.645 --> 00:10:38.925
For our monthly Claude Code spend, we're just on the $200 subscription. For the routine slash boilerplate

00:10:38.925 --> 00:10:49.140
tasks, we have about 60%, and we could configure this as well. Then we have the DeepSeek cost ratio versus Claude. Now the monthly savings: it's about $119 versus $81.

00:10:49.300 --> 00:10:52.020
Annual savings, it's gonna be about $1,400,

00:10:52.020 --> 00:10:54.500
and the five year savings, $7,000.
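
NOTE
The figures read out here are consistent with a simple formula, sketched below as an assumption about what the generated dashboard computes (its actual code is not shown in the video). The function name and dictionary keys are made up for illustration. The routine share of the spend is routed to the cheaper model at cost_ratio times the original price, and everything else stays on Claude.
```python
def roi(monthly_spend: float, routine_pct: float, cost_ratio: float = 0.01) -> dict:
    """Estimate savings from routing the routine share of spend to a cheaper model."""
    routine_spend = monthly_spend * routine_pct / 100
    monthly_savings = routine_spend * (1 - cost_ratio)
    return {
        "monthly_savings": monthly_savings,
        "new_monthly_cost": monthly_spend - monthly_savings,
        "annual_savings": monthly_savings * 12,
        "five_year_savings": monthly_savings * 60,
    }
if __name__ == "__main__":
    # Inputs shown on screen: $200 per month, 60% routine, ratio 1/100.
    for name, value in roi(200, 60).items():
        print(f"{name}: ${value:,.2f}")
```
With the on-screen inputs this gives 118.80 saved per month and 81.20 remaining, which round to the $119 and $81 shown, with 1,425.60 per year and 7,128.00 over five years.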

00:10:54.500 --> 00:10:59.295
Let's scroll down a little bit further and we can actually see the bar chart. So this is going to compare,

00:10:59.375 --> 00:11:00.015
um,

00:11:00.335 --> 00:11:17.070
Claude plus DeepSeek. As we scroll down, we can see the bar chart comparing current, which is all Claude, versus a hybrid of Claude plus DeepSeek. Now to generate something like this even just a couple of months ago, I mean, we simply would not have been able to create this unless I was using some sort of supercomputer,

00:11:17.310 --> 00:12:35.035
and honestly, none of the open-source models were there yet, but I genuinely think we are there right now. Earlier in this video, I mentioned I was using a Snapdragon chip, and that is quite literally what makes my workdays possible with something like this. It's what these chips do with AI workloads. Right now as I'm filming this, about seven hours into the workday, Claude Code has been running builds in the background, terminals are open all over the place, and the fans haven't even spun up once. And the part that surprised me the most is that it's not just Claude Code that's staying quiet, it's everything at once. I have 10 different terminals open. I have multiple Claude Code sessions running in parallel, a video rendering on top of that, and browser windows everywhere. And the chip absorbs all of this like a bulletproof vest. The reason is that Snapdragon has a dedicated AI engine built directly into the silicon. So heavy AI inference work doesn't fight the CPU for power, and the CPU stays free for the rest of your work. That matters for the cost story in this video specifically, because the API bill is only half of it. The other half is the hardware running underneath. If your laptop is overheating and burning battery every time you run a build like this, you stop running them. You batch your work, you wait until you're plugged in, you avoid the long sessions, and you're just not leveraging and capitalizing on,

00:12:35.195 --> 00:12:40.630
um, you know, all the tools at our disposal such as AI. So overall, that's lost productivity,

00:12:40.790 --> 00:12:49.430
lost time, and lost money out of your pocket. But anyways, going back to this specific build, I can change this monthly Claude Code spend. Maybe it's gonna be about $300.

00:12:49.510 --> 00:12:54.675
I usually don't spend more than $300 a month, and you can see the savings. About $170

00:12:54.675 --> 00:12:56.355
versus $122,

00:12:56.435 --> 00:13:37.420
the annual savings, and of course, the five-year savings as well. Now if we go back inside of DeepSeek, you can see I literally only used 1¢ to build that dashboard. This is why I default to this for most of my work, because it's not just adequate. It's actually very, very good. Now anyways, I do wanna get honest and cover some of the limitations of this approach. I have been routing my Claude Code work through DeepSeek for about three weeks now. I love it for what it does well, but I'm also gonna tell you exactly what it doesn't do, because the operators and builders who got burned trying this early in the year got burned on these specific things, and I would rather you know right now. So, four things to flag. The first one I can actually show you on the screen right now:

00:13:37.820 --> 00:13:43.660
it's the MCP servers. If you use Claude Code seriously, you've probably been wiring up MCP integrations.

00:13:43.965 --> 00:13:48.845
You have, like, file system access, Linear, Notion, whatever it may be. MCP,

00:13:48.845 --> 00:13:59.380
it's the protocol that lets Claude Code reach out and talk to all of your tools. So it's a big part of why Claude Code is powerful for real work. Watch what happens when I check MCP

00:13:59.380 --> 00:15:09.320
while routed through DeepSeek. I'm gonna type out slash MCP, and it's going to say no MCP servers configured, please run slash doctor if this is unexpected. So we're getting nothing, and the reason isn't that I forgot to configure any. It's that DeepSeek's endpoint does not support MCP whatsoever. DeepSeek officially documented this in their API compatibility table, where MCP calls are flagged as being ignored. So they're not broken, and they're not throwing errors. They just get quietly dropped on the floor. So even if you had 10 MCP servers wired up and working perfectly through Anthropic, the moment you start using DeepSeek, none of those are going to fire. That's why I switch back to Opus the second I am in MCP territory, or you can go with something else like Sonnet or Haiku. You don't need to hit every single task with a premium model like Opus. Now number two is vision. If you've ever pasted a screenshot into Claude Code expecting the model to read what's on the screen, like debugging a UI bug from a screenshot, pulling data out of a chart, or looking at a design, that capability lives on Anthropic's side, not on DeepSeek's coding endpoint.

00:15:09.745 --> 00:15:22.065
It doesn't process images whatsoever. So anytime my work involves looking at something visual, I have to flip to a model from Anthropic like Sonnet or Opus. Number three is prompt caching. This is the quietest one,

00:15:21.920 --> 00:16:33.520
I would say, of all four. Anthropic gives you a discount when you reuse long system prompts across sessions, because they cache the prefix on their end. For most of what you and I do day to day, that doesn't really matter, since each session is completely fresh. But if you run agent loops with massive system prompts that repeat all day every day, like an SDR agent or a background worker hitting the API every single minute, the Anthropic cache discount can quietly become bigger than the DeepSeek savings on those exact workloads. So it's definitely worth modeling all of this out before you commit a heavy production workload. Now number four, and this is the one that I've felt the most in my actual work, is multi-file debugging across a large codebase. For example, if you're three layers deep trying to figure out why a request is failing in a multi-service repo, DeepSeek usually needs two or three follow-up prompts where Sonnet would often just one-shot it. So the token savings do vanish on tasks like that, because you're having to re-prompt four times instead of just the one shot. So whenever I do hit deep multi-file debugging,

00:16:33.600 --> 00:16:55.150
I flip to Opus, I do that part, and I flip right back. If you take one thing away from this video: do not switch entirely. Don't go all in on DeepSeek, and I showed you the reasons why. You want to use the model picker the way that I have been using it. Your default points Claude Code at DeepSeek. That's your everyday setup, and most of your work runs right through it. Your bill is going to drop significantly,

00:16:55.150 --> 00:17:12.725
and when something hard comes up, like a multi-file bug, a vision task, or an MCP-heavy workflow, you just hit slash model, pick Opus or Sonnet, whatever it may be, flip to that, and flip back later. With that being said, this is only going to be applicable if you are using the API.
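
NOTE
The switching rule described here can be written down as a tiny decision helper. This is only an illustration of the rule of thumb, not something Claude Code does automatically: the trait names and the "opus" label are hypothetical, and in practice the switch is made by hand with slash model.
```python
# Hypothetical labels; use whatever names slash model lists in your own setup.
DEFAULT_MODEL = "deepseek-chat"
PREMIUM_MODEL = "opus"
# Task traits the video says should go back to an Anthropic model.
PREMIUM_TRAITS = {"mcp", "vision", "multi_file_debugging"}
def pick_model(traits: set) -> str:
    """Return the premium model if any trait needs it, else the cheap default."""
    return PREMIUM_MODEL if traits & PREMIUM_TRAITS else DEFAULT_MODEL
if __name__ == "__main__":
    print(pick_model({"boilerplate"}))
    print(pick_model({"vision", "boilerplate"}))
```
Keeping the premium traits in one set makes it easy to add a case, such as cache-heavy agent loops, once you have measured that it costs less on Anthropic.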

00:17:13.045 --> 00:18:13.955
Some of you use a Claude subscription. In that case, it's not going to matter as much unless you're constantly hitting your limits. With that being said, just stop paying full price for Claude when most of your work does not need it. That's the real unlock. If you're an operator running automations or any internal tooling in Claude Code, this literally turns a $200 monthly pilot into a $20 pilot. The whole thing that was too expensive to leave running around the clock just got cheap enough to actually leave running around the clock, and that's the real shift in what is worth building right now. But anyways, a quick thanks again to Snapdragon for sponsoring this video and for the laptop that's been powering the entire build that you just watched. The link will be down below in the description if you wanna check it out. Highly recommend them. In any case, thank you guys for watching. If you are interested in a more hands-on approach and learning stuff like this right when it drops, make sure to check out our free Skool community and also our weekly AI newsletter. And if you're a business owner looking to implement AI into your business in 2026, make sure to book a call with our team. The link will be down below in the description. But, again, thank you guys for watching. I'll see you in the next video.
