
What is RAG in 2026?

I don't think RAG is dead so much as it is being refined into hybrid retrieval systems shaped by long-context models, coding agents, and the realities of large datasets.


The claim I keep seeing

I keep seeing a variant of the same statement: RAG is dead and irrelevant. Large context windows have killed it. Coding agents don't use it anymore.

I don't claim to be an expert on Retrieval-Augmented Generation, but I've tinkered enough in the past to be dangerous. These claims seemed wild to me, so I did some research to see whether they hold up, and I've come to some conclusions (and questions) I'd like to share.

My project

About a year ago now, I got into one of my focused frenzies and started prototyping a book recommendation, classification, and content warning app. At its core it was a ton of book data stuffed into a vector database, with fairly naive chunking and embedding of book titles, summaries, categories, and automated classifications.

It was naive and each record was individually sparse, but I had pulled a massive database of books into the system, and I was constantly surprised at how accurately links were drawn between books' themes, and at how well even ambiguous semantic search worked across the whole collection.
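To make the setup concrete, here's a minimal sketch of that index-and-search loop. Everything here is illustrative: the titles and summaries are invented, and the bag-of-words "embedding" is a stand-in for the real embedding model and vector database the project used.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: a bag-of-words count vector.
    # The actual project used learned dense embeddings instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[tok] * b[tok] for tok in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy "book database" of title -> summary keywords.
books = {
    "Dune": "desert planet politics prophecy spice empire",
    "The Hobbit": "dragon treasure journey dwarves mountain",
    "Foundation": "galactic empire collapse psychohistory politics",
}
index = {title: embed(summary) for title, summary in books.items()}

def search(query, k=2):
    # Rank every book by similarity to the query "embedding".
    q = embed(query)
    ranked = sorted(index, key=lambda t: cosine(q, index[t]), reverse=True)
    return ranked[:k]

print(search("empire and politics"))  # → ['Foundation', 'Dune']
```

Even this crude version captures the appeal: an ambiguous query with no exact title match still surfaces thematically related books.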

Eventually it went the way of most of my side projects (i.e. the way of the dinosaurs), but it was a fun learning project. It's also, I think, a great project to resurrect as I explore RAG in 2026.

The claim for RAG being dead

The primary claim I see for RAG being obsolete typically references the ever-growing context windows of frontier models. Previously I'd counter this point by noting how retrieval quality degrades as context grows, especially in the middle of the window, but Anthropic just released a generally available 1M-token context window for Opus 4.6 and Sonnet 4.6, with some very impressive claims about long-context retrieval and reasoning.

If the claims are true, we genuinely could be looking at massive improvements in a context-stuffing approach to semantic lookup and reasoning across bounded datasets. But still... how does that realistically scale to truly massive datasets, like the millions of books in my setup? Maybe someday, but you would need billions of tokens of context to hold the sort of data I want to work with.

Coding agents

Look no further than coding agents! RAG is officially dead in any true modern coding agent, and coding agents work in massive code bases with hundreds of thousands of files!

Well, okay. First off, I think we have to define RAG for the rest of this article. When I first started using AI for coding tasks, I had it develop a small game in THREE.js and would pass the whole codebase back and forth with the model. It would constantly forget parts of the code, and I had to start many new sessions until the source file was finally too large for it to do much with at all. That exact process would likely go much better with the latest frontier models, but it's still not realistic for a real codebase at current context sizes.

Any modern coding agent performs some amount of retrieval over the codebase to find the specific files or functions that match the current task. It's retrieving chunks of data, just not from a vector database based on embedding similarity. It uses tools like ls, cat, sed, grep, or any number of other commands in its environment to filter data it can then pass along to context. Yes, it's a simpler setup, requiring no embedding or literal chunking of data, and you can store the data however you want. But you're still generating content augmented with filtered knowledge retrieved from your dataset. AKA RAG, right?

For the sake of argument, let's say RAG only refers to the vector database setup of retrieval for the rest of this article.

But now let's look at the sort of data coding agents are looking at. Code is intrinsically very amenable to this new form of retrieval straight from the file system. You have:

  • exact names
  • exact syntax
  • strong structure
  • obvious file boundaries
  • a lot of grep-friendly data

Do we really need vector RAG?

Compare this with my book recommendation app. It's a completely different task and idea, right? You have very ambiguous themes and loose structure. You aren't searching for a variable name that has to be spelled exactly right for the codebase to even compile.

First, let's consider how you might not need vector RAG in this case. You might just be searching for an exact book title or character name, in which case you can get away without any generative AI at all: plain retrieval does the job. Even deeper searches, such as finding a book with certain themes and a main character who fits a certain type, could be handled with a smart filter, similar to how coding agents work these days. You'd need a well-defined metadata set for your books, but that could be auto-populated with AI. (Yes, I know, another can of worms that I have also tinkered with in my project. Let's assume it's easy for now.)
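Here's a sketch of that filter route. The metadata records are hypothetical, standing in for what an AI auto-population pass might produce:

```python
# Hypothetical records, as if auto-populated by an LLM tagging pass.
books = [
    {"title": "Dune", "themes": {"politics", "religion", "ecology"},
     "protagonist": "reluctant heir"},
    {"title": "The Hobbit", "themes": {"adventure", "greed"},
     "protagonist": "reluctant hero"},
    {"title": "Foundation", "themes": {"politics", "decline"},
     "protagonist": "scientist"},
]

def find_books(themes=(), protagonist_contains=None):
    # Plain filtering over structured metadata -- no embeddings involved.
    matches = []
    for book in books:
        if not set(themes) <= book["themes"]:
            continue  # must contain all requested themes
        if protagonist_contains and protagonist_contains not in book["protagonist"]:
            continue  # protagonist description must match
        matches.append(book["title"])
    return matches

print(find_books(themes=["politics"], protagonist_contains="reluctant"))  # → ['Dune']
```

Exactly like a coding agent's grep step, this is deterministic retrieval over structure; the generative model only comes in once you've decided what to put in context.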

That sounds suspiciously like embedding the concepts of a book into a latent space. But it's really not a vector database! It's not an embedding! Or maybe it's just an improved form of embedding for when you have a known type of object in your dataset (a book) and can predefine which metadata tags will be useful to extract from its content.

So now what about recommendations for similar books? Everything still holds here, right? We can find similar books from the populated metadata and from lexical searches over keywords in the summary or title. We can weight results by other signals, like how many similar readers also enjoyed a candidate book. We can even throw a reranker on at the end to intelligently sort the collected books before stuffing them into context to recommend to the user.
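That pipeline can be sketched as a weighted blend of those signals. All of the data and weights below are invented for illustration; a real system would tune them and likely hand the top results to a reranker model:

```python
def hybrid_score(candidate, seed, co_readers):
    # Blend the three signals described above; the weights are made up.
    theme_overlap = len(candidate["themes"] & seed["themes"])
    lexical = sum(word in candidate["summary"] for word in seed["summary"].split())
    social = co_readers.get((seed["title"], candidate["title"]), 0)
    return 2.0 * theme_overlap + 0.5 * lexical + 1.0 * social

def recommend(seed, candidates, co_readers):
    # Sort by blended score; a reranker could then re-sort the top N.
    ranked = sorted(candidates,
                    key=lambda c: hybrid_score(c, seed, co_readers),
                    reverse=True)
    return [c["title"] for c in ranked]

seed = {"title": "Dune", "themes": {"politics", "ecology"},
        "summary": "desert empire politics"}
candidates = [
    {"title": "Foundation", "themes": {"politics"},
     "summary": "galactic empire politics collapse"},
    {"title": "The Hobbit", "themes": {"adventure"},
     "summary": "dragon treasure mountain"},
]
co_readers = {("Dune", "Foundation"): 3}  # invented co-reader counts

print(recommend(seed, candidates, co_readers))  # → ['Foundation', 'The Hobbit']
```

Nothing in this scoring pass needs a vector database, yet it still produces a ranked, context-sized shortlist, which is the part of the job vector RAG usually gets credit for.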

So where does that leave us?

Honestly, I went into this thinking that vector RAG is obviously not dead and would be the most useful approach for my use case. Long context won't be able to realistically process massive datasets anytime soon. It has certain advantages that are hard to replicate with RAG, such as spotting omissions across a dataset, but it has no chance of outperforming (or undercutting the price of) RAG on large datasets. And it will likely always make sense to make the process more efficient, no matter how large and reliable your context is.

It sounds to me like modern RAG has just been refined (or redefined) into a hybrid approach. To be honest, I'm probably behind the curve here. My next step in research will be resurrecting my book recommendation project and trying a few more modern approaches: some relying more on embeddings, others on smart population and filtering of data, and others on a hybrid design, likely using agents to decide the optimal way to handle any given retrieval request. Improvements to vector RAG, to search and filtering, and to ever-growing long-context windows are all only going to strengthen this hybrid approach.
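One shape that hybrid could take is a small router in front of the different retrieval back-ends. This is purely a sketch of the idea with invented heuristics; a real version would probably let an agent or a trained classifier make the call:

```python
def route(query):
    # Naive heuristic router over the retrieval strategies discussed above.
    if query.startswith('"') and query.endswith('"'):
        return "exact-match"      # quoted title or character name: plain lookup
    if any(cue in query for cue in ("similar to", "like", "feels", "vibe")):
        return "vector"           # fuzzy semantic intent: embedding search
    return "metadata-filter"      # structured attributes: filter + lexical search

print(route('"The Hobbit"'))                         # → exact-match
print(route("books similar to Dune"))                # → vector
print(route("sci-fi with a scientist protagonist"))  # → metadata-filter
```

The interesting design question isn't the heuristics themselves but the principle: each request pays only for the cheapest retrieval strategy that can answer it.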

I've done some benchmarking with synthetic data across these various approaches, but I think I'm going to have to come up with a way to evaluate the true dataset, or at least something much closer to what I'd actually like to process.

Again, I'm just a tinkerer on this whole topic. I'd love to hear opinions and ideas I might not have thought of so far, and I'd especially love to hear what sort of tools the big players are actually using for memory systems. In the meantime, I'll resurrect my app, test a more modern, complete RAG system, and be back with findings.