RAG without embeddings and AI infra companies

At Hotseat, we’ve hit a major milestone: the relevance of quotes retrieved from our PDF repository has jumped from 18% to 72%.

The PDFs we’re dealing with range from tens to hundreds of pages each, filled with dense legal language and complex structures. Often, there’s a highly relevant example, counter-example, exception, or passage that answers a question, and finding it can be like discovering a gem. There are two main challenges:

  • Many candidate passages in the PDFs are only weakly relevant.
  • The strong candidates require a significant reasoning leap to connect them to the legal question at hand.

At Hotseat, most of our analysis is implemented in an agentic fashion, but until now, searching through the PDF repository used embedding-based retrieval. I’ve been a long-time skeptic of embeddings for RAG. Christian Griset wrote an excellent critique of embedding-based retrieval and coined the term semantic dissonance, which I think nicely captures one of the problems with using embeddings; I focused on some others in my post.
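To ground the contrast with what follows, here is embedding-based retrieval in miniature: embed every chunk once, embed the query, and rank chunks by cosine similarity. The model name and helper functions below are illustrative assumptions, not our actual stack; a production setup would use a vector database rather than in-memory numpy arrays.

```python
# A minimal sketch of embedding-based retrieval (not Hotseat's actual
# stack): embed chunks and query, rank chunks by cosine similarity.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    # Assumption: any embedding model works here; this one is illustrative.
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )
    return np.array([item.embedding for item in response.data])

def search(query: str, chunks: list[str], k: int = 5) -> list[str]:
    chunk_vecs = embed(chunks)
    query_vec = embed([query])[0]
    # Cosine similarity: normalized dot product between query and each chunk.
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]
```

The weakness lives entirely in `sims`: a single similarity score has to stand in for the reasoning leap from a legal question to an exception buried in dense prose.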

Can you build a useful RAG system with embeddings? Sure, you can, and lots of people do. However, I will bet that many enterprise AI apps still at the proof-of-concept or demo Friday stage will struggle to mature into satisfying products on the backbone of vanilla RAG with a vector database at its heart. Vector databases are simply not powerful enough, as our experience at Hotseat shows. We bit the bullet, ditched embeddings, rolled up our sleeves to build an in-house LLM-based retrieval system, and leaped to the level of product our users perceive as intelligent.
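I won’t unpack the full design here, but the basic shape of LLM-based retrieval is easy to sketch: instead of ranking by vector similarity, have a model read each candidate passage and grade its relevance to the question. Everything below (the prompt, the 0-10 scale, the model name, the `grade_passage` helper) is an illustrative assumption, not our actual implementation.

```python
# A minimal sketch of LLM-based retrieval: the model grades each candidate
# passage directly. Prompt, scale, and model are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def grade_passage(question: str, passage: str) -> int:
    """Ask the model how directly a passage answers a legal question."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model
        messages=[
            {
                "role": "system",
                "content": (
                    "You grade legal passages. Reply with a single integer "
                    "from 0 to 10: how directly does the passage answer "
                    "the question?"
                ),
            },
            {
                "role": "user",
                "content": f"Question: {question}\n\nPassage: {passage}",
            },
        ],
    )
    try:
        return int(response.choices[0].message.content.strip())
    except ValueError:
        return 0  # treat unparseable replies as irrelevant

def retrieve(question: str, passages: list[str], k: int = 5) -> list[str]:
    """Grade every passage and keep the k strongest candidates."""
    ranked = sorted(passages, key=lambda p: grade_passage(question, p), reverse=True)
    return ranked[:k]
```

The obvious trade-off is cost and latency: one model call per candidate instead of one cheap vector lookup, so a setup like this typically runs over a pre-filtered candidate set rather than the whole repository, with calls issued concurrently.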

Side note: before we moved to LLM-based retrieval, we tried ColBERT embeddings through RAGatouille. We observed a jump to around 35% relevance; still too low, and we didn’t have enough confidence that it was a path worth pursuing.
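For reference, an experiment like this is only a few lines through RAGatouille’s public API; the index name, document list, and query below are placeholders, not our actual setup.

```python
# Sketch of indexing and querying with ColBERT via RAGatouille.
from ragatouille import RAGPretrainedModel

# Placeholder corpus: the extracted text of each PDF as one string.
documents = ["full text of statute one...", "full text of statute two..."]

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

RAG.index(
    collection=documents,
    index_name="legal_pdfs",       # placeholder index name
    max_document_length=256,
    split_documents=True,          # let RAGatouille chunk long documents
)

results = RAG.search(query="Is there an exception for small employers?", k=10)
```

ColBERT’s late interaction scores query tokens against document tokens instead of collapsing everything into one vector, which plausibly explains the bump from 18% to around 35%; it still cannot make the reasoning leap the strong candidates require.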

AI Infrastructure Companies

My friends naturally asked whether we should ditch AI for legal and just focus on AI infrastructure since we’re onto something. I tend to think that the ideal path for AI infrastructure is akin to the story of AWS: it was born out of a deep understanding of the need for elastic compute, earned by running Amazon, one of the largest websites on the internet at the time.

You need to have that level of understanding of the problem space to forge the right abstractions. It’s hard to acquire this as an external entity; hence, cloud computing was born as a spin-off and not a separate startup. I think AI infrastructure companies are in a similar boat, and most will run out of runway before they get to the right abstractions.

For now, strong teams shipping breakout AI products will keep rolling their own infrastructure stacks, as they already do.

The LLM-based retrieval is the work of Hugo Dutka.

Deep Learning ∩ Applications. A recent pivot from a 'promising career' in systems programming (core team behind the Scala programming language). Pastime: Ambient Computing. Grzegorz Kossakowski on Twitter