Ray Fernando · YouTube · 18:51

Two MCPs That Save 97% of Your Context Window

An ex-Apple engineer runs ref.tools and Exa AI in both Claude Code and Cursor on a live Tailwind v4 refactor, and Claude Code's MCP calls come in at 2,800 tokens against Cursor's 98,000.

Posted

November 23rd 2025

Duration

18:51

Format

Tutorial

educational

Channel

Ray Fernando

§ 01 · The Hook

The bait, then the rug-pull.

Documentation used to eat almost 100,000 tokens before a single line of code was written. Ray Fernando, an ex-Apple engineer with more than a decade of shipping production software, found the fix in two MCP servers that pull only the context the agent needs, when it needs it. The proof is live on screen: in a Tailwind v4 design-token audit across a real codebase, the MCP and tool calls total 2,800 tokens, which is 1.4% of the context window, against the 98,000 tokens Cursor used on the same task.

§ · Stated Promise

What the video promised.

stated at 00:02 "I'm gonna show you how to get two of my favorite MCPs set up." delivered at 16:50

§ · Chapters

Where the time goes.

00:00 – 02:05

01 · The context rot problem

Ray introduces the concept of context rot — LLMs getting dumb as the window fills with irrelevant documentation. Establishes why targeted doc-fetching MCPs beat brute-force token dumps.

02:05 – 04:37

02 · Installing ref.tools in Claude Code

One-command install via `claude mcp add`, API key walkthrough, and security warning about committing keys to public repos.

04:37 – 05:38

03 · Installing Exa AI in Claude Code

Same pattern as ref.tools. API key generation and paste into terminal. Both MCPs now available globally across any project.

05:38 – 12:43

04 · Live demo: Tailwind v4 refactor on Anime Leak

Haiku 4.5 runs a full codebase audit using both MCPs. Ray watches Claude Code research the documentation and build a phased implementation plan, then runs /context, which shows the MCP and tool calls used only 2,800 tokens.

12:43 – 15:00

05 · The token comparison: Claude Code vs Cursor

Cursor's plan mode finished the same task using 98,000 tokens. Side-by-side makes the 35x gap visceral. Ray notes Cursor asks clarifying questions; Claude Code doesn't.

15:00 – 16:50

06 · Setting up MCPs in Codex

Codex uses config.toml instead of JSON. Ray edits the file in Cursor, pastes the API key, verifies with `codex mcp list`.

16:50 – 17:50

07 · Setting up MCPs in Factory Droid

Copy the MCP JSON block from Cursor's tools panel directly into .factory/mcp.json. Same pattern, different file path.

17:50 – 18:51

08 · Pro tip + CTA

Always call MCPs explicitly in your prompt. Use plan mode (shift-tab) before writing code. Closes with a pitch for his 1337 coaching intensive and a forward-looking take on agentic AI.
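
Chapters 06 and 07 describe the same move twice: take the server entries that already work in Cursor and write them into another tool's config file. The Python sketch below shows one way to generate such a block while reading the API keys from environment variables, so they never sit in a file that might get committed. The `mcpServers` key and the per-server `url` field follow what the transcript describes on screen. The endpoint URLs, the `apiKey` query parameter name, and the environment variable names are placeholders assumed for illustration, not the real values from ref.tools or Exa.

```python
import json
import os
from urllib.parse import urlencode

# Hypothetical endpoints and env var names; substitute the values each
# provider's dashboard actually gives you.
SERVERS = {
    "ref": ("https://ref.example.invalid/mcp", "REF_API_KEY"),
    "exa": ("https://exa.example.invalid/mcp", "EXA_API_KEY"),
}


def build_mcp_config(env=None):
    """Return an mcpServers block with keys pulled from the environment."""
    env = os.environ if env is None else env
    servers = {}
    for name, (base_url, key_var) in SERVERS.items():
        key = env.get(key_var)
        if not key:
            raise KeyError(f"set {key_var} before generating the config")
        # The key travels as a query parameter (parameter name is assumed).
        query = urlencode({"apiKey": key})
        servers[name] = {"url": f"{base_url}?{query}"}
    return {"mcpServers": servers}


if __name__ == "__main__":
    demo_env = {"REF_API_KEY": "ref-demo", "EXA_API_KEY": "exa-demo"}
    print(json.dumps(build_mcp_config(demo_env), indent=2))
```

Where the generated JSON goes depends on the tool. Per the video, Factory Droid reads it from `.factory/mcp.json`, while Codex expects the equivalent entries in `config.toml`.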

§ · Storyboard

Visual structure at a glance.

00:00 · hook · open: context rot problem

00:57 · promise · ref.tools website

04:35 · value · both MCPs installed + verified

05:38 · value · Anime Leak demo launch

11:54 · value · /context: 2,800 tokens

13:04 · value · Codex CLI config

16:50 · value · Factory Droid config

17:50 · cta · plan mode pro tip

§ · Frameworks

Named ideas worth stealing.

00:15 concept

Context Rot

The degradation of LLM output quality as the context window fills with broad, untargeted documentation — the agent gets dumb before it writes a line of code.

Steal for any video explaining why vanilla web search MCPs hurt more than they help

01:31 model

Targeted Agentic Search

ref.tools — indexed docs, no context bloat
Exa AI — high-quality search built for coding tasks
Explicit MCP calls in prompt
Plan mode before code mode

Pull only the context needed for the specific sub-task, not everything that might be relevant. Combine ref.tools (documentation precision) with Exa (coding-task search quality) and always call them by name in the prompt.

Steal for CLAUDE.md rules section, any agentic coding workflow
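
The idea of pulling only what the sub-task needs can be shown with a toy selector. The sketch below is not how ref.tools or Exa work internally. It only contrasts dumping every snippet with keeping the snippets that overlap the query, under a token budget. Token counts are approximated by whitespace-separated words, and the function names and sample snippets are made up for this illustration.

```python
def estimate_tokens(text):
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def select_snippets(query, snippets, budget):
    """Keep the snippets most relevant to the query that fit the budget."""
    terms = set(query.lower().split())
    scored = []
    for snippet in snippets:
        overlap = len(terms & set(snippet.lower().split()))
        if overlap:  # drop snippets that share no words with the query
            scored.append((overlap, snippet))
    # Highest overlap first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)

    chosen, used = [], 0
    for _, snippet in scored:
        cost = estimate_tokens(snippet)
        if used + cost <= budget:
            chosen.append(snippet)
            used += cost
    return chosen


docs = [
    "tailwind v4 defines design tokens as css variables",
    "the database provider supports row level security",
    "authentication sessions expire after thirty days",
    "tailwind v4 theme customization uses the theme directive",
]
picked = select_snippets("tailwind v4 design tokens", docs, budget=20)
dumped = sum(estimate_tokens(d) for d in docs)
kept = sum(estimate_tokens(p) for p in picked)
print(f"dump everything: {dumped} tokens, targeted: {kept} tokens")
```

A real documentation MCP would rank with a proper search index rather than word overlap, but the budget loop captures the point: unrelated material never reaches the context window.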

§ · Quotables

Lines you could clip.

00:34

"I used to use up almost 100k tokens just feeding in tons of documentation."

relatable pain point, zero setup needed → TikTok hook

00:53

"LLMs actually operate best when they have just the right information for just the right specific task."

clean thesis statement, punchy standalone → IG reel cold open

12:00

"Our MCP servers and tool calls only use a total of 2,800 tokens — which is only 1.4% of the context window."

the money-shot stat, visually proven on screen → newsletter pull-quote

02:05

"It's almost like as if I hired several developers to read the documentation and implement the code for us."

strong analogy, no context needed → TikTok hook

16:50

"Make sure you use plan mode so that it gathers all that specific context before you start writing code."

actionable closer, works standalone as a tip → IG reel cold open

§ · Pacing

How they spent the runtime.

Hook length: 125s

Info density: high

Filler: 8%

§ · Resources Mentioned

Things they pointed at.

00:15 · tool · ref.tools MCP

01:31 · tool · Exa AI MCP

05:38 · product · Anime Leak app

15:00 · tool · Codex CLI

16:50 · tool · Factory Droid

17:50 · product · 1337 Intensive coaching offer

18:15 · link · Ray's LLM Rules & Prompts Repo

§ · CTA Breakdown

How they asked for the click.

17:50 product

"I do have a couple more spots that are open for my one three three seven intensive."

Soft close after delivering all value. Thirty-minute sessions over five days. Credentialed with Apple engineering background. Low pressure but well-timed after a high-value tutorial.

§ 04 · The Script

Word for word.

HOOK: opening / re-engagement · CTA: the pitch · metaphor: analogy

00:00 · HOOK · So I'm gonna show you how to get two of my favorite MCPs set up. And the two are ref.tools and the other one's called Exa AI. And the reason why I like ref.tools compared to every other MCP that you could use to search for documentation is context rot. So that's when the LLMs start getting really dumb as soon as they try to help you out with information. And if you don't know, whenever you're using a language model, there's a cutoff date for all the data that they have trained so far. And so that's why I choose these two different MCP servers because they're the most efficient. If you've ever used any other MCP server,

00:34 · HOOK · all of a sudden, you're gonna start slamming in a whole bunch of tokens from documentation. I used to do that before. I used to use up almost 100k tokens just feeding in tons of documentation for the new AI SDK, my authentication, my database provider, all these extra rules. But the LLMs actually operate the best when they have just the right information

00:55 · HOOK · for just the right specific task. And so in comes ref.tools. So this actually provides the actual context that's needed for these agents. And the way that it works is it'll grab public documentation

01:08 · HOOK · and any private documentation that you also give it as well. PDFs, GitHub repos, any other sites. It will index them and actually make them available in Claude Code, in Droid, in Codex, in Cursor, and those are the four that we're gonna go ahead and cover today as far as trying to get this installed for us to make it work. A lot of times when you're doing tool calling, what ends up happening is that

01:30 · HOOK · there's gonna be multiple requests and multiple steps that happen along the way. And ref.tools is kinda made for this agentic search so it doesn't kill your context window. So this combined with the other one called Exa AI, this actually has a really high quality fast search for specifically coding tasks. And if you combine these two together, you're gonna be

01:53 · HOOK · cooking. And I have shipped features starting in plan mode using these two MCP servers to help gather the right context. And by the time it's writing the code, it's almost like as if I hired several developers to read the documentation and implement the code for us. How do we go ahead and configure them? So in ref.tools, they actually make it really easy for you. All you do is just go ahead and once you've already created an account,

02:15 · uh, you just go to MCP and in Claude Code, all we're gonna do is install it via command line. So this API key here is what you're gonna actually put into this command. So once you hit copy, we're gonna open the terminal now. And then I'm just gonna go ahead and just paste this in here. Now it says added the MCP server with ref and has a local config headers of this. And so this means that the actual API keys here, and I think it's gonna be able to do that. So now if I run claude

02:42 · and I run /mcp, it's gonna try to connect. Now we actually have ref.tools connected for us. And we have to like quit an existing session like we did and try to log in again. And so you'll see it available once you do claude mcp list. So because it's in my root user, it's gonna be available to any new project I start. So don't be afraid if you add this in, you don't have to keep adding this into every project. You just have to add it in once and once you add it in here, you're pretty much good to go. By the way, if you notice, we actually have the API key currently in the MCP server.

03:14 · You typically don't wanna check those types of things into your project as code because that's gonna be available to everyone. So if you have the repo that's public, anyone who clones that can use your specific key and eat up all your credits as an FYI. So the next one we wanna go ahead and get configured is called Exa. The command they give us is claude mcp add exa and the same with our API key. So what we're gonna go ahead and do is hit copy here. Then I'm gonna go back to Claude Code in the terminal here. And what we wanna do is go ahead and place this thing called here, this is your API key.

03:45 · And so in this section here, what we do is just gonna go ahead and delete. And then I'm gonna grab the API key from the following API dashboard. If you've not generated a key, you can go into here and create a new key. So we can create one called Claude Code

04:01 · YOLOBRO. And let's go ahead and create the key. And then I'm gonna go ahead and copy this one. And then I'm gonna go into my terminal now. And then I'm just gonna go ahead and paste that key there. I'm just gonna go ahead and hit enter. It's gonna go ahead. It's now added the MCP server with the following command to the local config. And so now it's actually the file that's modified is in my local folder here. So you may notice that it could take a while for it to connect, and you can always just do claude

04:28 · mcp list. And that'll basically show you what's actually

04:35 · currently active and available as far as MCP servers. So Haiku has just come out. It's the 4.5 model supposed to act really fast. And why don't we just try using the latest model for a task that I'm trying to do? And in this case, we're basically just gonna ask the model to see if you can use the latest Tailwind v4 and search through our code base. My code base isn't

04:54 · really fully utilizing the power of Tailwind v4. And I want you to take a look at my code at all the different places because I think I have a lot of stuff that I've hard coded in for Tailwind v3 and I need it to be all unified. So we're taking advantage of the new Tailwind v4 system.

05:12 · I want you to use ref and use Exa MCPs first to read through the documentation to understand how this type of system works, as well as reading through all the different parts of my code to understand the current design system. And we should be using Tailwind v4 along with shadcn. So right now we're basically having Claude Code look through my entire app. And if you don't know what my app currently does, it's called Anime Leak. It allows you to upload an image, any real image of the real world. And you'll actually start to see like basically

05:45 · anime start leaking into the photo, which is kind of fun. So you can see me right here with my sandwich got replaced with this nice hand drawn kind of really whimsical type of thing. So in here you can kind of see all the different generations I've had, and then it lets you like share them with friends and so forth. In my original actual design, everything was in light mode. And since I forked this project from my friend Mickey,

06:08 · who currently works over at Convex, it was in this specific color theme. So the landing page as you can see is kinda a little bit more dark theme and it has these different types of things that are going on. And we wanna have more of like a consistency for design hierarchy and everything here. So right now, Claude Code is actually searching through and it's gonna try to understand the code base fully. Now it's researching Tailwind design systems.

06:30 · And I don't know if it's actually fully utilizing the MCPs here. So now it's actually saying, yes, I want to use these different tools now to use the ref documentation. So the query it's gonna do is Tailwind v4 CSS variables design tokens customization, and yes, and don't ask again for this MCP. And the reason why I'm thinking Haiku is the right thing for this type of task is one, it's a lot cheaper. And for this type of task, I don't need a lot of intelligence. I just need a lot of work to be done. And so by handing this off to an agentic model like this, it's gonna be able to gather a lot of research

07:02 · in one side of it, but also look through the code and look for specific patterns of things which is what I want to do. And then the utilizing of these MCPs will keep the documentation very light, but grab enough information that it needs to know to do this type of pattern matching which is gonna be really awesome here. So while that's going, what I'm gonna go ahead and do is pull this up in Cursor and kinda show you how I configured those there. We're gonna go to this little gear here and they have actually dedicated tools and MCPs.

07:26 · If you scroll down, these are all the MCP servers I have connected. You can just hit add custom MCP server and this is a JSON file that actually has everything configured here. And what you wanna go ahead and do is you'll go to the website where it says use ref, they're gonna give you this exact thing to paste into Cursor. Add to Cursor in one click. And if we go into here, it's actually gonna say, do you want to install the MCP servers?

07:49 · And you see ref mentioned here with the following URL and the API key. So this already has the API baked in and you hit install. Once you hit install, it's gonna actually show up just like it does here. Pretty simple. It's really nice. So we're gonna basically do the same thing with Exa. If we go back to Exa, and we go to

08:07 · the dashboard, and we go to Exa MCP. So the values that we wanna grab is gonna be this one here. See Exa, and then has this little bracket here and this bracket here. This is really really important because at the higher level which is MCP servers, that's your collection of all of your MCP servers. So we just get this value just like that. And you see that there's an additional thing that's added to my specific MCP server which is this thing here. It's the Exa API key equals

08:35 · and this is the part where you wanna grab that API key from Exa and then put it into this last section here. And so when you open your project, you just make sure to toggle on and this is green and that means you're pretty much good to go. And I'm gonna give the agent the same type of task to use the same MCP servers as well. I'm just gonna copy this exact

08:53 · actual prompt and put this inside of Cursor, so that we can kinda see them both kinda cook right here. So now, what we can do as well is we can put the agent into plan mode. So what happens in plan mode is no code gets written, but then we'll have all this analysis done and we'll have MCPs be able to be utilized inside of Cursor. And so Cursor is gonna do all the work for us, which is really nice. If I do command e, this is gonna pop up a dedicated

09:18 · specific window for our agents, so that we can do this type of planning. Now, we get a straight up dedicated window to watch the agent really work. And to me, this is like a really nice way to work as you can kinda see it's a little more spread out. And as we are gonna start modifying files or working with anything in the code base, we have what I feel like is a much more elegant view,

09:39 · all self contained in one type of thing. So to get to this agent window, you have to hit command e. And you may have to enable it in the settings as well. So as you can kinda see it's narrowing down the scope as it discovers information to make sure that everything is the latest information is up to date. And so this is a lot more efficient than web search after web search after web search. Sometimes they'll run a parallel tool call to do a lot of web search and then you'll start to flood

10:03 · the context window and a lot of times these, you know, agents will have to start pruning information. And sometimes you can have really relevant information pruned out. And that's why I prefer these two MCPs. One of my favorite things about Cursor is how fast it is at gathering context and information throughout my file system. Because if we go back to Claude Code right now, while it's still running in the terminal, it is still running all these different tasks. We're still on the research side and you can see all the different tool calls it's been doing to get all the relevant information that it needs for this task.

10:35 · And it takes a little while, right? So it's now auditing current code base for hard-coded values, create a comprehensive design token system in global.css. But this is actually kind of already giving us a plan here. So I have a comprehensive understanding. Let me update the to do list and create a detailed implementation plan. So it actually created a plan for us without me actually asking for it, which is really nice. Right? So this is my current state. My global CSS is already using Tailwind v4 correctly. However, there are opportunities to expand and unify the system. But as you vibe code stuff, the agents are gonna prefer what's in their training set, so they're gonna dump in a bunch of v3

11:07 · type of systems, but here's the gaps to address. So there's like missing semantic tokens, hard-coded values are scattered, magic numbers in components, and all this kind of stuff that we probably don't want. So there's a couple phases that it wants to do here. So it's like a unified design system starts here. Uh, phase two to extract the component tokens,

11:26 · HOOK · and then three is to fix the hard-coded values. And then four is to create a Tailwind config for advanced customization, which is gonna be really nice. So now, it's asking me, would you like me to proceed with implementing this plan? I recommend we start with phase one. I should be able to look at /context. And we're gonna get a visualization right now of how our context is being used. So as of now, with these tool calls, you can kinda see

11:50 · HOOK · it only used like less than a thousand tokens for all these different tool calls, which is amazing. So our MCP servers and tool calls only use a total of 2,800 tokens, which is only 1.4% of the context window. And you can see the value in just these two MCPs alone on a refactoring task which requires touching a lot of files, reading them in, and, you know, doing this type of comparison work. I'm super impressed with this type of system and I found extremely good consistency combining the two. And let's just go ahead and check and see what Cursor is doing. I mean, Cursor probably finished way earlier and you can see the token efficiency in Cursor. It used 98,000 tokens. So one of the cool things about Cursor's plan mode, which is what we didn't really compare here in Claude Code, is the fact that it's now asking me a couple more questions. Like, should we handle semantic scope of brand colors and then the cream background? But the last thing we wanna go ahead and get set up here is getting this set up

12:43 · probably in Codex and also getting this set up in Factory Droid. So for Codex, we're just gonna go ahead and go to what the instructions say for Codex. So in use ref here, they have like any MCP clients. You can say find your MCP client for use ref, and we can say Codex here, and they'll say Codex CLI. So in Codex CLI, they give us the actual documentation reference here. And so for here, it says Codex is configured in a config.toml. So if I go to terminal,

13:11 · and I just open up any tab, codex mcp list. And we can see I currently have Playwright installed. So I don't currently have the Exa one installed. What I can do is I usually just copy this here

13:26 · and then go back to my terminal.

13:30 · You can even do this in Cursor. It's probably just easier to just open up the terminal in Cursor. So just in here, just type in cursor, cursor and then hit this. And this is actually gonna take us exactly to where this specific file is in our file system. So this one says to enable ref, update your TOML to include this. We're just gonna copy this. We're gonna go back in Cursor. We're gonna paste.

13:52 · And we still need to put our API key here from the use ref. So since we already had it configured in Cursor, what I'm gonna go ahead and do is just scroll down to where it says tools MCP here on the side. And then I'm gonna go to ref and hit this little pencil. And so the pencil will already have ref listed with our API key. I'm gonna copy this API key and then in here in the config.toml where we're at, I'm just gonna go ahead and

14:14 · just delete this here and then paste it in here. I'm gonna go ahead and hit save and then now we should have it currently saved. So if I type in codex mcp list, we should now be able to see that the MCP server is currently configured in here. So use

14:30 · ref MCP to look up the latest on Tailwind v4 and see if there are any gaps in my implementation.

14:41 · Okay. So we're just gonna hit enter. And so now it's gonna go ahead and send that task off and kinda do some research using the OpenAI Codex version. It's actually you can see here ref dot ref research documentation. This is actually now using the MCP that's built in. And so we're gonna do the same thing for Exa, and then we're gonna go back into here where it says Exa MCP.

15:01 · And then in here, we're gonna just type in like Codex. I'm gonna copy this URL since this already works inside of Cursor. We're just gonna do the same thing and make it work inside of the config.toml for Codex. And what I'm gonna go ahead and do is just paste this key right into here. So this is exactly the same key we had inside of Cursor. And you can kinda see with that little question mark, that's a little parameter that we're gonna be passing in that shows the Exa API key. I'm gonna open up a new terminal window and say codex mcp list.

15:27 · And now we should be able to see the Exa MCP currently listed and status is enabled. If I paste this in, and then I should be able to get some more prompts or something back saying, hey, want to use this MCP and I just should be able to give us some approval. So for Factory Droid, if you're using Droid as an agent, what you wanna go ahead and do is kinda do something similar. I'm gonna show you real quick. So it's called cursor and then tilde

15:54 · /.factory, and it's gonna be under /mcp.json. So when I do that, what's gonna happen now is now we have this specific thing opened up inside of Cursor. This allows us to actually edit the file and save it all in one place, which is really really nice. So all you have to do is get your configuration from Cursor. And if you were getting it from Cursor, all you do is go in here, hit tools MCP,

16:18 · go into this section, and all you have to do is literally copy this from this blue ref to here to this including this little blue line in that. And then you go back over here where it says MCP factory JSON right up there. And then you just paste it right in there and you hit save. Once you have these two MCP servers, I think what's gonna happen for your AI coding is that it's significantly gonna get better. But you actually have to call them specifically saying use ref MCP

16:45 · and then use Exa MCP or use Exa code for MCP. And so the other pro tip that I wanna give you before we kinda close out the video is the fact that you want to make sure that you use them in plan mode. So in any type of planning mode, you can usually hit shift tab once or twice. You'll see that all of these agents have like a spec mode, a planning mode of some sort. Make sure you use those specific keywords so that it gathers up all that specific context you want before you start writing code. This is at first getting started. If you want to learn a little bit more, I do have a couple more spots that are open for my one three three seven intensive. And so that's the elite if you don't understand what that means.

17:25 · CTA · And I can sit down with you for thirty minutes for five days in a row, and we're gonna try to get either MCP server set up, your documentation workflow set up, maybe something that's kinda stopping you from getting the results that you want. And I've been doing this type of stuff for several years with AI coding, but as far as engineering, I used to work at Apple as an engineer and I've been doing it for over a decade. And I have a lot of industry experience shipping software,

17:50 · CTA · having software that affects billions of people every single day. And if you're interested in that, feel free to check out the link in the description obviously. And I wanna leave you with this last thing. This AI coding thing is just starting to take off and we've thrown the first pitch of many hundreds of pitches to go in this really long baseball game of AI.

18:10 · CTA · And I'm really excited for this era of AI coding because this is now unlocking a lot more capability. And the models of what we're doing today are just getting off the ground. And these type of tooling like the Exa MCP and the ref MCP, these are gonna be really essential to keep the language model training data, like the intelligence of the models here, and the training data that is currently missing, it's just gonna pull that right in and try to use this type of logic. These tools are really magical now because now they're available twenty four seven and they don't require bugging a senior engineer every single day. My name is Ray Fernando, and I'm really excited to be with you. And I'll see you in the next livestream. Peace out, y'all.

§ 05 · For Joe

Stop dumping docs. Fetch what the agent needs.

Context engineering playbook

The 35x token gap isn't about Claude Code being smarter than Cursor — it's about giving the agent a scalpel instead of a firehose.

Add ref.tools and Exa to your global Claude Code config once — they're available to every project automatically.
Call them explicitly in your prompt: 'use ref MCP' and 'use exa MCP' — the agent won't reach for them unprompted.
Always use plan mode (shift-tab) before letting the agent write code — gather context first, execute second.
The /context command in Claude Code visualizes how the context window is being used, including what the MCP tools consume. Use it to audit your own sessions.
Haiku 4.5 is the right model for large-codebase pattern-matching tasks: cheaper, fast, doesn't need intelligence — needs volume.
Don't commit API keys to public repos — the MCP config files store them in plaintext.
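
Two of the points above, naming the MCPs in the prompt and planning before coding, are easy to bake into a reusable prompt prefix. The helper below is a hypothetical convenience and not part of any tool. Its wording mirrors the phrasing Ray uses in the video ("use ref MCP", "use exa MCP"), and the default server names are assumptions that should match whatever names your MCP config registers.

```python
def build_prompt(task, servers=("ref", "exa"), plan_first=True):
    """Prefix a task with explicit MCP instructions and a planning request."""
    lines = [
        f"Use {name} MCP to read the relevant documentation first."
        for name in servers
    ]
    if plan_first:
        lines.append("Produce a plan before writing any code.")
    lines.append(task.strip())
    return "\n".join(lines)


print(build_prompt("Audit my codebase for hard-coded Tailwind v3 values."))
```

The planning line is only a textual nudge. Switching the agent into its actual plan mode (shift-tab in the video) still happens in the tool itself.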

§ 05 · For You

What this means if you're using AI coding tools.

If you vibe-code or use Claude/Cursor daily

Your AI assistant is probably burning through its memory reading docs it doesn't need — and getting dumber because of it.

Install ref.tools and Exa as MCP servers once — they let your AI fetch only the exact documentation it needs for each step.
Always start with plan mode before asking the AI to write code — it thinks before it acts instead of hallucinating from stale training data.
Tell the AI explicitly which tools to use: 'use ref MCP to look up the Tailwind v4 docs' — it won't do it on its own.
Check your context usage with /context in Claude Code. The two MCPs account for about 2,800 tokens in Ray's demo, so if MCP tool calls are eating tens of thousands of tokens, something's misconfigured.
The 35x efficiency gap Ray shows isn't magic — it's just not flooding the AI's working memory with irrelevant information.
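
The headline numbers are easy to check. The snippet below recomputes the gap from the two token counts quoted in the video. The 200,000-token window size is an assumption inferred from the quoted 1.4% figure, not something stated in the transcript.

```python
CLAUDE_CODE_TOKENS = 2_800   # MCP and tool-call usage reported by /context
CURSOR_TOKENS = 98_000       # usage Cursor reported for the same task
ASSUMED_WINDOW = 200_000     # assumption: implied by the quoted 1.4%

gap = CURSOR_TOKENS / CLAUDE_CODE_TOKENS
share_of_window = CLAUDE_CODE_TOKENS / ASSUMED_WINDOW
share_of_cursor = CLAUDE_CODE_TOKENS / CURSOR_TOKENS
saving = 1 - share_of_cursor

print(f"gap: {gap:.0f}x")
print(f"share of window: {share_of_window:.1%}")
print(f"share of Cursor's usage: {share_of_cursor:.1%}")
print(f"saving versus Cursor: {saving:.1%}")
```

Under those inputs the 35x gap and the 97% saving in the title both follow from the 2,800 versus 98,000 comparison, while the 1.4% figure is a share of the context window rather than of Cursor's usage.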

§ 06 · Frame Gallery

Visual moments.

Frames captured at 00:43, 02:14, 04:07, 08:05, 09:04, and 15:40.