The AI Automators · YouTube · 11:35

RAG Just Got Inverted. Here's The Stack That Replaces It.

How Redis Iris flips the retrieval model from pre-compiled artifacts to a live operational mirror — and why that split decides which architecture your agent actually needs.

Posted · May 25th 2026

Duration · 11:35

Format · Tutorial (educational)

Channel · The AI Automators

§ 01 · The Hook

The bait, then the rug-pull.

Every few months someone declares RAG is dead. The claims are usually overhyped and the replacements too simplistic. But Redis just announced something worth actually reading — an architecture called Iris that treats the retrieval problem not as a search problem but as a data infrastructure problem, and that distinction matters more than it sounds.

§ · Chapters

Where the time goes.

00:00 – 02:12

01 · Overview — why RAG fails in production

Sets up the core problem with a Redis CEO quote: stale state, fragmented memory, slow retrieval. Customer support bot example illustrates collapse when data spans 5+ systems.

02:12 – 03:06

02 · Four requirements for agent retrieval at scale

Navigable, fast, always up to date, self-improving. Redis criteria for what a production retrieval layer must deliver.

03:06 – 04:25

03 · What is Redis Iris

High-level stack: RDI syncs sources into Redis, Context Retriever exposes MCP/CLI tools to agents, Agent Memory persists sessions, LangCache deduplicates responses.

04:25 – 05:37

04 · Redis Data Integration (RDI)

Change data capture from Postgres, Oracle, Snowflake, MongoDB. Protects transactional systems from agentic request volume. Enables denormalized, agent-friendly modeling.

05:37 – 06:28

05 · Redis Context Retriever

Define entities, fields, relationships, and role-level access control. Redis auto-generates typed tools exposed to agents via MCP or CLI.

06:28 – 07:18

06 · Redis Agent Memory

Short-term session memory with configurable TTL plus long-term memory for promoted preferences and learned patterns.

07:18 – 08:01

07 · LangCache — semantic response caching

Caches LLM responses; short-circuits repeated similar queries. Warning: similarity thresholds are blunt and stale cache hits can be contextually wrong.

08:01 – 08:50

08 · Redis Search and Redis Flex

Single unified index across vector, structured, and unstructured data. Redis Flex is a new SSD tier for billion-vector-scale cost reduction.

08:50 – 09:04

09 · Not plug and play

Honest caveat: modular stack requiring data modeling and ongoing maintenance. Signing up does not solve retrieval.

09:04 – 10:54

10 · Comparison vs Pinecone Nexus

Build-time (Pinecone) pre-compiles knowledge artifacts for stable recurring-question domains. Runtime (Redis) pulls fresh context on demand for fast-changing operational data.

10:54 – 11:35

11 · No one-size-fits-all retrieval

Closing argument: match architecture to data volatility. Course CTA and cross-link to Pinecone Nexus video.

§ · Storyboard

Visual structure at a glance.

00:00 · hook · open
00:36 · promise · context: CEO quote
01:36 · value · why context engine
02:16 · value · 4 requirements
03:03 · value · Iris high-level
04:25 · value · RDI diagram
05:37 · value · context retriever tools
08:01 · value · Redis Search
09:04 · value · Pinecone compare
10:54 · value · agentic RAG landscape
11:14 · cta · course CTA
11:22 · cta · Pinecone cross-link

§ · Frameworks

Named ideas worth stealing.

02:12 list

Four Requirements for Agent Retrieval at Scale

Navigable
Fast
Always up to date
Self-improving

Redis framing of what a production-grade retrieval layer needs to deliver. Each requirement maps to a component in the Iris stack.

Steal for: Use as an evaluation checklist when auditing any RAG or agentic retrieval system you're building or buying.

09:09 model

Runtime vs Build-time Retrieval Split

Runtime (Redis Iris) — fresh data, live mirror, fast-changing environments
Build-time (Pinecone Nexus) — pre-compiled artifacts, stable knowledge, recurring queries

The central architectural decision frame. Picks the right pattern based on how frequently underlying data changes.

Steal for: Use as a decision tree when choosing a knowledge layer architecture for a new agent system.
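The split above can be turned into a rough first-pass check. This is a minimal sketch, not vendor guidance: the `Workload` fields, the `choose_retrieval` name, and the 60-minute cutoff are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical inputs; neither vendor publishes these as official criteria.
    change_interval_minutes: float  # typical time between updates to the source data
    recurring_questions: bool       # are most queries known, repeated questions?


def choose_retrieval(w: Workload, stale_after_minutes: float = 60.0) -> str:
    """Suggest a retrieval pattern based on how often the data changes."""
    # Fast-changing data: a pre-compiled artifact would go stale quickly.
    if w.change_interval_minutes < stale_after_minutes:
        return "runtime"
    # Stable data with known, recurring questions suits pre-compiled artifacts.
    if w.recurring_questions:
        return "build-time"
    # Stable data with open-ended questions: the video gives no clear rule.
    return "either"


orders = Workload(change_interval_minutes=5, recurring_questions=True)
manuals = Workload(change_interval_minutes=7 * 24 * 60, recurring_questions=True)
print(choose_retrieval(orders), choose_retrieval(manuals))
```

The cutoff is the part worth arguing about in a real project; the structure only encodes the video's point that data volatility comes first.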

§ · Quotables

Lines you could clip.

01:18

"The hardest problems in production AI are no longer solved by model choice. They show up at runtime, stale state, slow retrieval, fragmented memory, disconnected tools, and sessions that fail to compound."

Quotable CEO framing that captures the video's entire thesis in one sentence → TikTok hook

01:50

"Naive RAG is almost never going to work in these kind of use cases."

Confident, provocative, no hedging → IG reel cold open

08:55

"This is not plug and play, and it will require maintenance."

Rare honest disclaimer in a product review → newsletter pull-quote

10:05

"A precompiled artifact in a knowledge layer could be stale five minutes after it's created."

Concrete and punchy; illustrates the runtime vs build-time tradeoff better than any diagram → TikTok hook

§ · Resources Mentioned

Things they pointed at.

00:00 · product · Redis Iris

09:04 · channel · Pinecone Nexus video

01:56 · channel · Self-service agent video

02:10 · product · AI Architects course

06:50 · tool · Mem0

06:51 · tool · Letta

§ · CTA Breakdown

How they asked for the click.

11:14 product

"if you want to design a context and retrieval layer that will actually work in production, that's exactly what we cover in our agentic retrieval module inside the AI architects course in our community. Link in the description below."

Low-pressure, value-first setup. The course CTA comes after 10+ minutes of genuinely useful architecture content, making it credible. Cross-linked related video provides a natural next step.

§ 04 · The Script

Word for word.

Legend: HOOK = opening / re-engagement · CTA = the pitch · metaphor · analogy · story

00:00 HOOK There's a huge race in AI right now to define the next generation of RAG. While you've probably heard claims that RAG is dead way too many times by now, those are often overhyped and solutions are generally too simplistic. Many players in the industry are converging on the idea of a knowledge or context layer between your agent and its underlying data sources as a much better alternative to conventional RAG.

00:21 HOOK But there's not a lot of agreement about what those solutions actually look like, and they're generally not one-size-fits-all. And now Redis, one of the real pioneers in high-speed data retrieval, has just announced Iris, with an architecture that's really worth taking seriously if you're building AI agents that rely on large-scale, fast-changing data spread across many different data sources, because these are serious weaknesses of most AI retrieval systems.

00:45 And by the way, we have no affiliation with Redis whatsoever. We've in fact never done a sponsored video on this channel, but we can all take inspiration from Redis' architecture here without even having to use their stack directly. Let's start with some context.

00:58 Back a few months ago, the CEO of Redis was quoted saying, "I've seen fewer examples of real successful production agents than I would have imagined in terms of anything outside of engineering." And to be fair, AI agents have often underdelivered in production systems across the board, and there's a huge gap between a flashy demo and one that survives real-world data and real use cases.

01:18 He also wrote in this blog post, "The hardest problems in production AI are no longer solved by model choice. They show up at runtime, stale state, slow retrieval, fragmented memory, disconnected tools, and sessions that fail to compound." And the example he uses here is a customer support bot.

01:34 A customer might ask, "Why is my order late?" Think about everything the agent needs to actually answer that question, especially in bigger organizations, such as the customer database, the order system, the shipping provider, the ticketing tool, and the policy docs.

01:48 Naive RAG is almost never going to work in these kind of use cases, and it can still be quite a challenge for a lot of agentic RAG configurations. We've covered similar solutions to this on our channel before, such as giving your AI agent locked-down access to a read-only view on your database. And we go very deep into agentic retrieval strategies within our AI architects course linked in the description.

02:08 But let's look through exactly what Redis are offering here. Before we dig into it here, they have a good list here of requirements for agents to function at scale. Of course, everything mentioned here is a focus of their product offering, but it is still a good list to give some context upfront.

02:22 First off, the agent should be able to navigate throughout a large amount of data. It should be able to traverse relationships, understand entities, discover relevant context and so on.

02:31 Secondly, context should be retrievable quickly, which in most cases is a very true requirement.

02:37 Agentic RAG systems can often be very slow if they have quality retrieval strategies behind the hood and they may work through many loops to retrieve the correct information. Third, context that is always up to date. Agent retrieval pipelines are often too slow for anything near real time, and the data that your agent retrieved ten minutes ago might already be stale and out of date.

02:57 And the fourth is the self-improvement aspect, that most AI agents don't really remember interactions, information, and context as they should.

03:05 So what exactly is Redis Iris and how are they looking to solve those challenges and meet those requirements? First off, Iris is a stack of Redis services and not all of them are new. Iris has just been announced at the time of recording this video, so this is definitely not a hands-on tutorial or review of their service, but rather an explainer of the retrieval architecture they're using, which is quite different to many other AI context layer solutions in the industry.

03:30 At a very high level, you have the data in the source systems, Oracle, Postgres, MongoDB, and you have this Redis Data Integration

03:37 that continually captures changes and syncs them into Redis data structures. So now you've got an operational copy of the data within Redis.

03:47 Your agent can then interact with this data using the Redis Context Retriever, which makes a CLI and MCP tools available to the agent. We'll talk about that in a minute. Redis Agent Memory then tries to persist what it learns across sessions using a combination of both short-term and long-term memory.

04:02 And then Redis LangCache caches responses and then tries to short-circuit anything that's been answered before. So your agent never actually touches the operational data directly.

04:10 It interacts with the data in the Redis DB that's been synced by the Redis Data Integration via MCP or CLI that's made available by the Redis Context Retriever. So let's dig into those components because you probably have more questions than answers at this point.

04:24 Let's start from the start with this Redis Data Integration, which is currently in public preview. RDI implements a change data capture pattern to sync data from a source database such as Postgres or Oracle, Snowflake, or Mongo, and tracks that and updates the data into the Redis data structures.
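The change data capture pattern described here can be sketched in a few lines. This illustrates the general pattern only, not RDI's API: the event shape (`op`, `table`, `id`, `row`), the `apply_change` helper, and the plain dict standing in for Redis are all assumptions.

```python
from typing import Any


def apply_change(mirror: dict[str, dict[str, Any]], event: dict[str, Any]) -> None:
    """Apply one change event from the source database to the mirror."""
    key = f"{event['table']}:{event['id']}"  # e.g. "order:42"
    if event["op"] == "delete":
        mirror.pop(key, None)
    else:
        # Inserts and updates are both upserts: merge the changed fields.
        mirror.setdefault(key, {}).update(event["row"])


# Hypothetical stream of row-level changes captured from the source system.
events = [
    {"op": "insert", "table": "order", "id": 42,
     "row": {"status": "packed", "eta": "2026-05-27"}},
    {"op": "update", "table": "order", "id": 42, "row": {"status": "shipped"}},
    {"op": "insert", "table": "order", "id": 43, "row": {"status": "packed"}},
    {"op": "delete", "table": "order", "id": 43},
]

mirror: dict[str, dict[str, Any]] = {}
for event in events:
    apply_change(mirror, event)

print(mirror)
```

The agent only ever reads `mirror`; the source database sees the capture stream, not the agent's query volume.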

04:41 This is how Iris is covering the requirements that we mentioned earlier of being always up to date. RDI mirrors a fresh copy of your data for the agents to hit at high speed, and Redis are the experts at lightning-fast retrieval, so I wouldn't doubt them that much in that regard. And since they're making a copy of the data from the operational systems

05:00 means that the agent is not going to bombard the transactional systems with requests because hitting operational data directly could be quite an issue for busy agentic systems where agents could be making thousands of times more requests than a human would. It also means that the data can be modeled in a manner that's more efficient from both the speed and indexing perspective

05:19 and also in a more flattened, denormalized structure that will make it a lot easier for your agents to interact with via tools. Of course, this idea of copying operational data to a different source is not exactly new.

05:31 That approach is often used for analytics and caching, for example. Next, the Redis Context Retriever is the one that aims to deliver on the requirement for the agents to be able to navigate through your knowledge base. The idea here is that you define models of your business data, the entities, the fields, the relationships,

05:47 and these can then be executed via MCP or CLI via your agent. You can then define the data you want to give your agent access to along with role-level access control. So your entities could be product or customer or order, for example, and then you have tools.

06:00 So these are the tools that your agent will then be able to call based on that data. For example, find product by range, get customer by ID, search customers by text, filter by tags, filter product in stock. So you have a bunch of different operations such as filtering,

06:14 finding, getting, searching. So it's giving the agent tools to more easily access your data as it needs without trying to get your agents to join data across lots of different tables or across different data sources, which can be incredibly unreliable in agentic RAG. The Redis Agent Memory includes both short-term and long-term memory features.

06:33 For short-term memory, you can set a custom TTL, which will be very important for systems where the source data might change very, very frequently. And then they have long-term memory, which stores extracted past sessions, user preferences, learned patterns, and other relevant data.

06:46 And this is one of many memory solutions across the industry. For example, you have Mem0, Honcho, Zep, Graphiti.

06:52 We have dreaming-style features within Claude managed agents and OpenClaw, which I went through in a previous video, and much more. But for how the Redis memory works, first off, we have the short-term memory, which is a session memory, and that's very important to maintain the current conversation state and session history.

07:08 So the short-term memory will be stored temporarily, and certain elements, user preferences, patterns, and other relevant data may be promoted to the long-term memory.

07:17 Otherwise, it's just deleted as per the TTL policy. Then you have the LangCache service. So instead of calling your LLM for every single request,

07:26 you can use LangCache to check if a similar response has already been made previously, and if so, returns it instantly from the cache to save time and money. It sounds great.

07:35 Semantic caching could be very useful for your projects, but it's also a potential minefield where you can get similar past responses that are actually out of context. When searching the cache, you can search by similarity thresholds

07:46 and also search strategies either using exact search or semantic search. These can be pretty blunt instruments, so you really need to thoroughly evaluate systems that are using response caching like this.
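To see why a similarity threshold is a blunt instrument, here is a toy semantic cache. It is a sketch, not LangCache's API: the `SemanticCache` class is hypothetical, and a bag-of-words cosine similarity stands in for a real embedding model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: word counts only.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float):
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def put(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))

    def get(self, query: str):
        """Return the closest cached response if it clears the threshold."""
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None


asked = "why is my order 1001 late"
answer = "Order 1001 is delayed by weather."
other = "why is my order 2002 late"  # different order, nearly identical wording

loose, strict = SemanticCache(0.8), SemanticCache(0.9)
for cache in (loose, strict):
    cache.put(asked, answer)

print(loose.get(other))   # hit: the answer about order 1001, wrong for 2002
print(strict.get(other))  # miss: falls through to the LLM
```

The two queries differ in the one token that matters, yet score about 0.83, so the looser threshold serves a confidently wrong answer. Real embeddings behave differently in detail, but this is the failure mode to evaluate for.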

07:57 The data is queried within the system using Redis Search, and that can search vector, structured, and unstructured data all within one index. Here, you can see some Redis Search queries similar to SQL, but with its own syntax. There are lots of different types of queries from exact match, range, full text, geospatial,

08:15 vector, combined, and aggregation. Redis also claimed that you can easily scale to 1 billion vectors using their indexes, which is pretty huge.

08:23 And then they also have Redis Flex, which is a new SSD-based storage tier that they're offering. So you're not paying for every single thing to run in memory, which could really make a difference in terms of pricing. So here, Redis are very much acknowledging that you cannot just magically solve retrieval

08:38 with a very simple layer. It requires a modular stack. You need multiple services together and you need to be very cognizant of how you're using them.

08:45 CTA And it's very important to look past the marketing here. Retrieval here is certainly not just a solved problem by signing up for a Redis account. This is not plug and play, and it will require maintenance to make sure that the shape of your retrieval layer is up to date with the source operational data, and you need to model your source data along with relationships.

09:03 CTA Recently on our channel, Daniel covered Pinecone Nexus, which is another knowledge layer approach to your AI agents, but it's quite a different architecture. Pinecone's new product offering here goes to build time.

09:15 It precompiles typed knowledge artifacts, like those related to sales, finance, support, marketing, so the agent queries a pre-shaped answer instead of rederiving it at every single call. Whereas Redis here goes to runtime.

09:28 It doesn't try to precompute anything into a compiled knowledge layer. It makes the data structures fast and navigable so the agent pulls fresh context on demand as it's quickly changing. And the two options here split pretty cleanly on where they're strong.

09:42 Pinecone's is more likely to be strong where you have a large stable knowledge base with recurring known questions, contracts, compliance, manuals, and things like that where a precompiled artifact

09:55 is exactly right. When using Pinecone's knowledge engine or anything like Andrej Karpathy's wiki idea, anytime the source data changes, you need to recompile what's in the knowledge layer, whereas Redis' architecture here will be far more suitable to very fast-changing data environments. Because in those cases, a precompiled artifact in a knowledge layer could be stale five minutes after it's created.

10:17 When we're evaluating data retrieval solutions, we often have a use case in our mind from past experience or specific projects we've worked on, and there's certainly a segment of software professionals that are rolling their eyes when they see the Karpathy wiki idea or Pinecone Nexus idea of a compiled knowledge layer. They're likely to be thinking of use cases where the underlying data is changing very regularly.

10:40 And even though this kind of architecture can be pretty complex, it may be what's required to make a reliable working AI agent in production. When you take a step back, there really is no one-size-fits-all solution to retrieval.

10:53 CTA Conventional RAG and simpler agentic RAG solutions were often touted as the magic solution. But in reality, flashy demos often don't translate to reliable production systems. Digging deep into retrieval strategies has been our main focus on this channel and community for quite a long time.

11:10 CTA And if you want to design a context and retrieval layer that will actually work in production, that's exactly what we cover in our agentic retrieval module inside the AI architects course in our community. Link in the description below. And I'd also highly recommend you check out our recent video covering Pinecone Nexus, which Daniel went through on our channel, which uses quite a different architecture for their knowledge layer than the Redis Iris architecture covered in this video.

11:33 CTA Thanks for watching.


§ 05 · For Joe

Retrieval architecture, not model choice, determines production success.

WHAT TO LEARN

When AI agents fail in production the culprit is almost never the model — it is stale data, fragmented sources, and a retrieval layer designed for demos rather than real operational complexity.

The failure modes of production AI agents — stale context, slow retrieval, fragmented memory, sessions that reset — are infrastructure problems, not model problems.
Agents querying operational databases directly can generate thousands of times more requests than humans; a dedicated mirror layer is essential to prevent collateral damage to transactional systems.
Change data capture solves the freshness problem more reliably than scheduled batch exports or cron-based syncs, because it tracks row-level changes continuously.
Typed entity tools (find by ID, search by text, filter by tag) give agents a far more reliable interface to structured data than open-ended SQL generation or cross-table joins.
Semantic response caching can reduce LLM costs but is a potential failure point — a cached answer to a similar query can be contextually wrong in a new session; test similarity thresholds rigorously.
The choice between runtime retrieval (live mirror, on-demand) and build-time retrieval (pre-compiled knowledge artifacts) comes down to one question: how frequently does the underlying data change?
Pre-compiled knowledge artifacts are right for stable, recurring-question domains like compliance, manuals, and contracts; they become liabilities in fast-changing environments where they can be stale within minutes.
Short-term session memory with configurable TTL and promotion logic to long-term memory separates a useful production agent from one that re-derives context from scratch on every call.
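The last point can be sketched as a small in-process store. This is a hypothetical illustration, not the Redis Agent Memory API: the `AgentMemory` class, its method names, and the explicit `promote` call are assumptions, and a real system would decide what to promote by its own logic.

```python
import time


class AgentMemory:
    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock    # injectable so expiry can be shown without sleeping
        self.short_term = {}  # key -> (value, expires_at)
        self.long_term = {}   # key -> value, no expiry

    def remember(self, key: str, value: str) -> None:
        """Store a session fact that expires after the TTL."""
        self.short_term[key] = (value, self.clock() + self.ttl)

    def promote(self, key: str) -> bool:
        """Copy a still-live short-term item into long-term memory."""
        item = self.short_term.get(key)
        if item and item[1] > self.clock():
            self.long_term[key] = item[0]
            return True
        return False

    def recall(self, key: str):
        item = self.short_term.get(key)
        if item:
            if item[1] > self.clock():
                return item[0]
            del self.short_term[key]  # expired under the TTL policy
        return self.long_term.get(key)


now = [0.0]  # fake clock, in seconds
mem = AgentMemory(ttl_seconds=60, clock=lambda: now[0])
mem.remember("contact_preference", "email")
mem.remember("cart", "3 items")
mem.promote("contact_preference")  # a preference worth keeping across sessions

now[0] = 120.0  # the session has ended and the TTL has passed
print(mem.recall("contact_preference"), mem.recall("cart"))
```

After the TTL passes, only the promoted preference survives; the cart contents, which may well have changed in the source system, are dropped instead of being served stale.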

§ 06 · Frame Gallery

Visual moments.

00:12

04:41

09:16

09:59

11:04