WEBVTT

00:00:00.000 --> 00:00:36.010
There's a huge race in AI right now to define the next generation of RAG. While you've probably heard claims that RAG is dead way too many times by now, those are often overhyped and solutions are generally too simplistic. Many players in the industry are converging on the idea of a knowledge or context layer between your agent and its underlying data sources as a much better alternative to conventional RAG. But there's not a lot of agreement about what those solutions actually look like, and they're generally not one size fits all. And now Redis, one of the real pioneers in high-speed data retrieval, has just announced Iris with an architecture that's really worth taking seriously

00:00:36.010 --> 00:00:49.025
if you're building AI agents that rely on large scale fast changing data spread across many different data sources because these are serious weaknesses of most AI retrieval systems. And by the way, we have no affiliation with Redis whatsoever.

00:00:49.105 --> 00:01:42.765
We've in fact never done a sponsored video on this channel, but we can all take inspiration from Redis' architecture here without even having to use their stack directly. Let's start with some context. A few months ago, the CEO of Redis was quoted saying, "I've seen fewer examples of real successful production agents than I would have imagined in terms of anything outside of engineering." And to be fair, AI agents have often underdelivered in production systems across the board, and there's a huge gap between a flashy demo and one that survives real world data and real use cases. He also wrote in this blog post, "The hardest problems in production AI are no longer solved by model choice. They show up at runtime: stale state, slow retrieval, fragmented memory, disconnected tools, and sessions that fail to compound." And the example he uses here is a customer support bot. A customer might ask, "Why is my order late?" Think about everything the agent needs to actually answer that question, especially in bigger organizations,

00:01:43.005 --> 00:01:56.030
such as the customer database, the order system, the shipping provider, the ticketing tool, and the policy docs. Naive RAG is almost never going to work in these kinds of use cases, and it can still be quite a challenge for a lot of agentic RAG configurations.

00:01:56.030 --> 00:02:22.510
We've covered similar solutions to this on our channel before, such as giving your AI agent locked down access to a read-only view on your database. And we go very deep into agentic retrieval strategies within our AI architects course linked in the description. But let's look through exactly what Redis are offering here. Before we dig into it, they have a good list of requirements for agents to function at scale. Of course, everything mentioned here is a focus of their product offering, but it is still a good list to give some context upfront.

00:02:22.590 --> 00:02:28.270
First off, the agent should be able to navigate throughout a large amount of data. It should be able to traverse relationships,

00:02:28.270 --> 00:02:32.445
understand entities, discover relevant context and so on. Secondly,

00:02:32.445 --> 00:02:34.765
context should be retrievable quickly,

00:02:34.765 --> 00:02:37.565
which in most cases is a very true requirement.

00:02:37.725 --> 00:03:00.040
Agentic RAG systems can often be very slow if they have quality retrieval strategies under the hood, and they may work through many loops to retrieve the correct information. Third, context that is always up to date. Agent retrieval pipelines are often too slow for anything near real time, and the data that your agent retrieved ten minutes ago might already be stale and out of date. And the fourth is the self improvement aspect

00:03:00.235 --> 00:03:03.595
that most AI agents don't really remember interactions,

00:03:03.835 --> 00:03:33.095
information, and context as they should. So what exactly is Redis Iris and how are they looking to solve those challenges and meet those requirements? First off, Iris is a stack of Redis services and not all of them are new. Iris has just been announced at the time of recording this video, so this is definitely not a hands on tutorial or review of their service, but rather an explainer of the retrieval architecture they're using, which is quite different to many other AI context layer solutions in the industry. At a very high level, you have the data in the source systems,

00:03:33.335 --> 00:03:35.575
Oracle, Postgres, MongoDB,

00:03:35.655 --> 00:03:37.975
and you have this Redis data integration

00:03:37.975 --> 00:03:39.735
that continually captures

00:03:39.735 --> 00:04:04.635
changes and syncs them into Redis data structures. So now you've got an operational copy of the data within Redis. Your agent can then interact with this data using the Redis context retriever, which makes a CLI and MCP tools available to the agent. We'll talk about that in a minute. Redis agent memory then tries to persist what it learns across sessions using a combination of both short term and long term memory. And then Redis LangCache caches responses

00:04:04.635 --> 00:04:15.920
and then tries to short circuit anything that's been answered before. So your agent never actually touches the operational data directly. It interacts with the data in the Redis DB that's been synced by the Redis data integration

00:04:16.080 --> 00:04:37.175
via MCP or CLI that's made available by the Redis context retriever. So let's dig into those components because you probably have more questions than answers at this point. Let's start from the start with this Redis data integration, which is currently in public preview. RDI implements a change data capture pattern to sync data from a source database such as Postgres or Oracle, Snowflake, or Mongo,

00:04:37.610 --> 00:05:00.085
and tracks that and updates the data into the Redis data structures. This is how Iris is covering the requirements that we mentioned earlier of being always up to date. RDI mirrors a fresh copy of your data for the agents to hit at high speed, and Redis are the experts at lightning fast retrieval, so I wouldn't doubt them that much in that regard. And since they're making a copy of the data from the operational systems

00:05:00.165 --> 00:05:19.420
it means that the agent is not going to bombard the transactional systems with requests, because hitting operational data directly could be quite an issue for busy agentic systems where agents could be making thousands of times more requests than a human would. It also means that the data can be modeled in a manner that's more efficient from both the speed and indexing perspective

00:05:19.815 --> 00:05:22.455
and also in a more flattened denormalized

00:05:22.455 --> 00:05:46.710
structure that will make it a lot easier for your agents to interact with via tools. Of course, this idea of copying operational data to a different store is not exactly new. That approach is often used for analytics and caching, for example. Next, the Redis context retriever is the one that aims to deliver on the requirement for the agents to be able to navigate through your knowledge base. The idea here is that you define models of your business data, the entities, the fields, the relationships,

00:05:47.215 --> 00:05:51.215
and these can then be queried via MCP or CLI by your agent.
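
NOTE
Editor's sketch, not Redis code: the change data capture pattern described for Redis Data Integration can be illustrated with a plain in-memory dictionary standing in for the Redis copy. The ChangeEvent shape, the apply_change helper, and the "table:key" naming are all assumptions made for illustration; RDI's real event format and key layout may differ.
```python
from dataclasses import dataclass
from typing import Any, Optional
@dataclass
class ChangeEvent:
    """One change captured from a source table (hypothetical shape)."""
    op: str  # "insert", "update", or "delete"
    table: str
    key: str
    row: Optional[dict[str, Any]] = None
def apply_change(store: dict[str, dict[str, Any]], event: ChangeEvent) -> None:
    """Apply one change event to the replica, keyed as 'table:key'."""
    target = f"{event.table}:{event.key}"
    if event.op == "delete":
        store.pop(target, None)
    elif event.op in ("insert", "update"):
        # Merge fields so a partial update keeps the untouched columns.
        store.setdefault(target, {}).update(event.row or {})
    else:
        raise ValueError(f"unknown op: {event.op}")
# The replica stands in for the operational copy that the agent reads from.
replica: dict[str, dict[str, Any]] = {}
events = [
    ChangeEvent("insert", "order", "1001", {"status": "packed", "customer_id": "c7"}),
    ChangeEvent("update", "order", "1001", {"status": "shipped"}),
    ChangeEvent("insert", "order", "1002", {"status": "packed", "customer_id": "c9"}),
    ChangeEvent("delete", "order", "1002"),
]
for event in events:
    apply_change(replica, event)
```
The agent only ever reads from replica, so the source database sees the change stream but none of the agent's query load.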

00:05:51.375 --> 00:06:14.645
You can then define the data you want to give your agent access to along with role level access control. So your entities could be product or customer or order, for example, and then you have tools. So these are the tools that your agent will then be able to call based on that data. For example, find product by range, get customer by ID, search customers by text, filter by tags, filter product in stock. So you have a bunch of different operations such as filtering,

00:06:14.805 --> 00:06:35.640
finding, getting, searching. So it's giving the agent tools to more easily access your data as it needs without trying to get your agents to join data across lots of different tables or across different data sources, which can be incredibly unreliable in agentic RAG. The Redis agent memory includes both short term and long term memory features. For short term memory, you can set a custom TTL,

00:06:35.720 --> 00:06:50.855
which will be very important for systems where the source data might change very, very frequently. And then they have long term memory, which stores extracted past sessions, user preferences, learned patterns, and other relevant data. And this is one of many memory solutions across the industry. For example, you have Mem0,

00:06:50.935 --> 00:06:52.840
Honcho, Zep, Graphiti.
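
NOTE
Editor's sketch, not the Redis agent memory API: the split between TTL-bound short term memory and persistent long term memory can be illustrated with two dictionaries. The SessionMemory class, its method names, and the explicit promote step are assumptions for illustration; a real service would decide what to extract and keep on its own terms. The clock is injectable so expiry can be shown without waiting.
```python
import time
from typing import Any, Callable, Optional
class SessionMemory:
    """Short term entries expire after ttl seconds; promoted ones persist."""
    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self.short_term: dict[str, tuple[float, Any]] = {}
        self.long_term: dict[str, Any] = {}
    def remember(self, key: str, value: Any) -> None:
        """Store a value in short term memory with an expiry time."""
        self.short_term[key] = (self.clock() + self.ttl, value)
    def promote(self, key: str) -> None:
        """Copy a still-available entry into long term memory."""
        value = self.recall(key)
        if value is not None:
            self.long_term[key] = value
    def recall(self, key: str) -> Optional[Any]:
        """Return a live short term value, else fall back to long term memory."""
        entry = self.short_term.get(key)
        if entry is not None:
            expires_at, value = entry
            if self.clock() < expires_at:
                return value
            del self.short_term[key]  # expired, so drop it
        return self.long_term.get(key)
now = [0.0]  # fake clock so the example is deterministic
memory = SessionMemory(ttl=60, clock=lambda: now[0])
memory.remember("preferred_language", "German")
memory.remember("last_order_viewed", "1001")
memory.promote("preferred_language")
now[0] = 61.0  # move past the TTL
```
After the TTL passes, recall("preferred_language") still returns the promoted value, while recall("last_order_viewed") returns None.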

00:06:52.840 --> 00:07:10.455
We have dreaming style features within Claude Managed Agents and OpenClaw, which I went through in a previous video, and much more. But for how the Redis memory works, first off, we have the short term memory, which is a session memory, and that's very important to maintain the current conversation state and session history. So the short term memory will be stored temporarily,

00:07:10.775 --> 00:07:13.255
and certain elements, user preferences,

00:07:13.415 --> 00:07:25.890
patterns, and other relevant data may be promoted to the long term memory. Otherwise, it's just deleted as per the TTL policy. Then you have the LangCache service. So instead of calling your LLM for every single request,

00:07:26.130 --> 00:07:30.290
you can use LangCache to check if a similar request has already been answered previously,

00:07:30.755 --> 00:07:40.035
and if so, it returns the response instantly from the cache to save time and money. It sounds great. Semantic caching could be very useful for your projects, but it's also a potential minefield

00:07:40.035 --> 00:07:46.680
where you can get similar past responses that are actually out of context. When searching the cache, you can search by similarity thresholds

00:07:46.760 --> 00:07:48.600
and also search strategies

00:07:48.680 --> 00:08:15.890
either using exact search or semantic search. These can be pretty blunt instruments, so you really need to thoroughly evaluate systems that are using response caching like this. The data is queried within the system using Redis search, and that can search vector, structured, and unstructured data all within one index. Here, you can see some Redis search queries similar to SQL, but with its own syntax. There are lots of different types of queries from exact match, range, full text, geospatial,

00:08:15.890 --> 00:08:16.610
vector,

00:08:16.770 --> 00:08:18.290
combined, and aggregation.
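
NOTE
Editor's sketch, not the LangCache API: the similarity threshold behaviour discussed for semantic caching can be illustrated with a toy cache. The embed function here is a word-count stand-in for a real embedding model, and SemanticCache with its put and get methods is a hypothetical name chosen for illustration. The last lookups show both a useful hit and the out-of-context hit the video warns about.
```python
import math
from collections import Counter
from typing import Optional
def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())
def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
class SemanticCache:
    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []
    def put(self, prompt: str, response: str) -> None:
        """Store a response under the embedding of its prompt."""
        self.entries.append((embed(prompt), response))
    def get(self, prompt: str) -> Optional[str]:
        """Return the best cached response at or above the threshold, else None."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
cache = SemanticCache(threshold=0.75)
cache.put("why is my order late", "Your order is delayed by the carrier.")
hit = cache.get("why is my order late today")  # close paraphrase, served from cache
wrong_hit = cache.get("why is my refund late")  # different question, still above threshold
miss = cache.get("how do I reset my password")  # unrelated, falls through to the LLM
```
The refund question scores high enough to receive the cached order answer, which is why thresholds need to be evaluated against real traffic rather than picked by feel.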

00:08:18.450 --> 00:08:38.055
Redis also claims that you can easily scale to 1 billion vectors using their indexes, which is pretty huge. And then they also have Redis Flex, which is a new SSD based storage tier that they're offering. So you're not paying for every single thing to run in memory, which could really make a difference in terms of pricing. So here, Redis are very much acknowledging that you cannot just magically solve retrieval

00:08:38.300 --> 00:09:10.010
with a very simple layer. It requires a modular stack. You need multiple services together and you need to be very cognizant of how you're using them. And it's very important to look past the marketing here. Retrieval is certainly not a problem that's solved just by signing up for a Redis account. This is not plug and play, and it will require maintenance to make sure that the shape of your retrieval layer is up to date with the source operational data, and you need to model your source data along with relationships. Recently on our channel, Daniel covered Pinecone Nexus, which is another knowledge layer approach to your AI agents,

00:09:10.090 --> 00:09:11.930
but it's quite a different architecture.

00:09:12.170 --> 00:09:16.810
Pinecone's new product offering here goes to build time. It precompiles

00:09:16.810 --> 00:09:43.840
typed knowledge artifacts related to sales, finance, support, marketing, so the agent queries a pre-shaped answer instead of re-deriving it at every single call. Whereas Redis here goes to runtime. It doesn't try to precompute anything into a compiled knowledge layer. It makes the data structures fast and navigable so the agent pulls fresh context on demand as it's quickly changing. And the two options here split pretty cleanly on where they're strong. Pinecone's

00:09:43.840 --> 00:09:49.360
is more likely to be strong where you have a large stable knowledge base with recurring known questions,

00:09:49.680 --> 00:09:55.015
contracts, compliance, manuals, and things like that where a precompiled artifact

00:09:55.095 --> 00:10:11.280
is exactly right. When using Pinecone's knowledge engine or anything like Andrej Karpathy's wiki idea, anytime the source data changes, you need to recompile what's in the knowledge layer, whereas Redis' architecture here will be far more suitable to very fast changing data environments.

00:10:11.280 --> 00:10:39.970
Because in those cases, a precompiled artifact in a knowledge layer could be stale five minutes after it's created. When we're evaluating data retrieval solutions, we often have a use case in our mind from past experience or specific projects we've worked on, and there's certainly a segment of software professionals that are rolling their eyes when they see the Karpathy wiki idea or Pinecone Nexus idea of a compiled knowledge layer. They're likely to be thinking of use cases where the underlying data is changing very regularly.

00:10:40.050 --> 00:10:43.090
And even though this kind of architecture can be pretty complex,

00:10:43.330 --> 00:11:34.645
it may be what's required to make a reliable working AI agent in production. When you take a step back, there really is no one size fits all solution to retrieval. Conventional RAG and simpler agentic RAG solutions were often touted as the magic solution. But in reality, flashy demos often don't translate to reliable production systems. Digging deep into retrieval strategies has been our main focus on this channel and community for quite a long time. And if you want to design a context and retrieval layer that will actually work in production, that's exactly what we cover in our agentic retrieval module inside the AI architects course in our community. Link in the description below. And I'd also highly recommend you check out our recent video covering Pinecone Nexus, which Daniel went through on our channel, which uses quite a different architecture for their knowledge layer than the Redis Iris architecture covered in this video. Thanks for watching.
