Jyotishko Biswas, Head of AI – HP Finance
The advent of transformers and large language models (LLMs) has significantly improved the accuracy, relevance, and speed to market of AI applications. As the core technology behind LLMs, transformers enable them to predict and generate the next word (more precisely, the next token) by learning from large datasets containing billions of words. This results in significant improvements in accuracy, relevance, and coherence. However, LLMs still have shortcomings, and this is where retrieval-augmented generation (RAG) becomes essential.
How RAG Fills the Transformer Gaps
Transformers are limited by the data they are trained on. For example, a model trained on web data only up to 2022 cannot answer questions about events that occurred in 2024. Additionally, transformers can generate non-factual answers, called hallucinations, which compromise their reliability.
RAG is a technique in which an LLM is connected to an external, updatable database. It fills the knowledge gaps of transformers by providing domain-specific and up-to-date information to the LLM, allowing it to answer questions about recent events and significantly reducing hallucinations.
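To make the flow concrete, here is a minimal, self-contained sketch of standard RAG in Python. The `embed` and `generate` functions are toy placeholders (a real system would call an embedding model and an LLM); only the retrieve-then-generate structure is the point.

```python
# Minimal sketch of a standard RAG flow (illustrative placeholders, not a vendor API).
from typing import List
import math

def embed(text: str) -> List[float]:
    # Placeholder embedding: bag-of-characters vector. A real system would call an embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(prompt: str) -> str:
    # Placeholder for a call to the core LLM.
    return f"[LLM answer grounded in]: {prompt}"

corpus = [
    "Argentina won the 2022 FIFA World Cup.",
    "The capital of the United States is Washington, D.C.",
]
question = "Who won the 2022 FIFA World Cup?"
context = "\n".join(retrieve(question, corpus))
print(generate(f"Context:\n{context}\n\nQuestion: {question}"))
```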
Limitations of Standard Retrieval-Augmented Generation
Despite its advantages, standard RAG has its own limitations, described below.
1. It retrieves additional information even when the prompt is simple and retrieval is not necessary, resulting in higher computational and memory costs.
2. There is no relevance check, so the retrieved information may be irrelevant, degrading the quality of the LLM's output.
3. Only a fixed number of top-ranked documents are used, leaving out potentially useful information.
4. Similarity checks between the prompt and the retrieved documents are often insufficient; the usefulness of the retrieved documents matters more.
5. Vector databases struggle to capture complex, multi-relational information, which limits what can be retrieved from them.
6. Leakage of private and sensitive information in LLM results remains a concern.
RAG Advancements That Help Overcome These Limitations
Many technological advancements have been made in the last two to three years to overcome the challenges of standard RAG.
Self-RAG is one such advance. It addresses the question of whether retrieval is needed, the relevance of retrieved documents, and the quality of the LLM's output. It includes a critic LLM that determines whether retrieval is necessary based on the prompt. For simple prompts, such as “What is the capital of the United States?”, retrieval may not be necessary.
The critic LLM also evaluates the relevance of the retrieved documents, retaining only those that are relevant. This ensures that the main LLM works with pertinent information, producing more accurate and consistent results.
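A rough sketch of how such critic checks might look follows. Here `critic_llm` is a stand-in for any instruction-following model, and the YES/NO prompting protocol is an illustrative assumption rather than the exact Self-RAG interface.

```python
# Hypothetical critic checks: one call decides whether retrieval is needed at all,
# another grades each retrieved document for relevance.
def critic_llm(prompt: str) -> str:
    # Placeholder: a real system would call a small instruction-tuned model here.
    return "NO" if "capital of the United States" in prompt else "YES"

def needs_retrieval(question: str) -> bool:
    verdict = critic_llm(
        "Does answering this question require external documents? "
        f"Answer YES or NO.\nQuestion: {question}"
    )
    return verdict.strip().upper().startswith("YES")

def is_relevant(question: str, document: str) -> bool:
    verdict = critic_llm(
        "Is this document relevant to the question? Answer YES or NO.\n"
        f"Question: {question}\nDocument: {document}"
    )
    return verdict.strip().upper().startswith("YES")

print(needs_retrieval("What is the capital of the United States?"))  # False
```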
Additionally, unlike standard RAG, which retrieves information once per prompt, Self-RAG can perform multiple retrievals per prompt, ensuring that more relevant information is provided.
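The multi-round behavior can be sketched as a simple loop: retrieve, filter, and repeat until a critic judges the gathered context sufficient or a round limit is reached. The retriever, relevance filter, and sufficiency check below are all placeholders standing in for LLM calls.

```python
# Illustrative multi-round retrieval loop in the spirit of Self-RAG.
def retrieve_round(question: str, round_idx: int) -> list[str]:
    # Placeholder retriever; a real system would reformulate the query each round.
    return [f"document about '{question}' from round {round_idx}"]

def is_relevant(question: str, document: str) -> bool:
    # Placeholder relevance check (a critic LLM in a real pipeline).
    return question.split()[-1].strip("?.") in document

def is_sufficient(question: str, context: list[str]) -> bool:
    # Placeholder sufficiency check; here we simply stop once two documents are kept.
    return len(context) >= 2

def iterative_retrieve(question: str, max_rounds: int = 3) -> list[str]:
    context: list[str] = []
    for i in range(max_rounds):
        new_docs = retrieve_round(question, i)
        context.extend(d for d in new_docs if is_relevant(question, d))
        if is_sufficient(question, context):
            break
    return context

print(iterative_retrieve("Tell me about Cristiano Ronaldo"))
```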
Another advanced RAG approach is MetRAG, a retrieval-augmented generation framework enhanced by multi-layered thoughts, which addresses several of RAG's main challenges. This method uses an additional LLM to evaluate the usefulness of retrieved documents rather than relying solely on similarity.
Take the question “Tell me about the famous football player Cristiano Ronaldo.” Document D1 states: “Cristiano Ronaldo is a famous football player,” while document D2 states: “Cristiano Ronaldo is a Portuguese professional footballer who plays as a striker and captains Saudi Pro League club Al Nassr and the Portugal national team.”
A similarity check may rank D1 higher, but D2 contains more useful information. This shows that similarity does not always surface the most useful content; therefore, document usefulness is used to judge relevance.
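One way to picture this usefulness-based re-ranking is with an LLM-as-judge score. The judge prompt and 0-10 scale below are illustrative assumptions, not MetRAG's actual interface, and `judge_llm` is a placeholder.

```python
# Illustrative usefulness-based re-ranking: a judge LLM scores how useful each
# document is for answering the question, instead of ranking by similarity alone.
def judge_llm(prompt: str) -> str:
    # Placeholder judge: scores longer, more specific documents higher.
    doc = prompt.split("Document:", 1)[1]
    return str(min(10, len(doc.split()) // 5))

def usefulness(question: str, document: str) -> int:
    score = judge_llm(
        "On a scale of 0-10, how useful is this document for answering the question?\n"
        f"Question: {question}\nDocument: {document}"
    )
    return int(score)

question = "Tell me about the famous football player Cristiano Ronaldo."
d1 = "Cristiano Ronaldo is a famous football player."
d2 = ("Cristiano Ronaldo is a Portuguese professional footballer who plays as a "
      "striker and captains Saudi Pro League club Al Nassr and the Portugal national team.")
docs = sorted([d1, d2], key=lambda d: usefulness(question, d), reverse=True)
print(docs[0])  # D2 comes first despite D1 being more similar to the prompt wording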
Additionally, in MetRAG, another LLM summarizes the retrieved documents, avoiding information loss by ensuring that relevant details from lower-ranked documents are retained. This differs from standard RAG, where only the top-ranked retrieved documents are kept and the rest are discarded. Summarizing all retrieved documents yields a more accurate and comprehensive final result.
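A sketch of that summarization step is below. The `summarizer_llm` function is a placeholder for a real summarization call; the prompt wording is an assumption.

```python
# Sketch of the summarization step: condense everything retrieved so details from
# lower-ranked documents are not lost before the context reaches the core LLM.
def summarizer_llm(prompt: str) -> str:
    # Placeholder: a real call would return an abstractive summary.
    return "Summary: " + " ".join(prompt.splitlines()[-2:])

def summarize_context(question: str, documents: list[str]) -> str:
    joined = "\n".join(f"- {d}" for d in documents)
    return summarizer_llm(
        "Summarize the following documents, keeping every detail relevant to the question.\n"
        f"Question: {question}\nDocuments:\n{joined}"
    )

docs = [
    "Cristiano Ronaldo is a famous football player.",
    "He captains Al Nassr and the Portugal national team.",
]
print(summarize_context("Tell me about Cristiano Ronaldo", docs))
```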
Another advanced RAG approach uses knowledge graphs instead of vector databases to store external information. While vector databases struggle to handle complex, multi-relational data, knowledge graphs excel by storing information as entities and their relationships. For example, in the sentence “Argentina won the 2022 FIFA World Cup,” “Argentina” and “2022 FIFA World Cup” are entities and “won” is the relationship.
Storing external information in a knowledge graph instead of a vector database allows RAG to retrieve more relevant information. This leads the core LLM to produce more accurate, relevant, and consistent results.
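A toy in-memory version of this idea is shown below: facts are stored as (subject, relation, object) triples, and retrieval walks the graph by entity rather than matching vectors. This is a simplified sketch, not a production graph database.

```python
# Minimal knowledge-graph store for RAG: facts as triples, retrieval by entity.
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self) -> None:
        self.by_entity = defaultdict(list)

    def add(self, subject: str, relation: str, obj: str) -> None:
        triple = (subject, relation, obj)
        self.by_entity[subject].append(triple)
        self.by_entity[obj].append(triple)

    def facts_about(self, entity: str) -> list[str]:
        # Return every fact mentioning the entity, ready to be added to the prompt.
        return [f"{s} {r} {o}" for s, r, o in self.by_entity.get(entity, [])]

kg = KnowledgeGraph()
kg.add("Argentina", "won", "2022 FIFA World Cup")
kg.add("2022 FIFA World Cup", "was hosted by", "Qatar")
print(kg.facts_about("2022 FIFA World Cup"))
```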
Protecting Sensitive Data
One of the main challenges with LLMs is the potential leakage of sensitive information or its use in model training. To address this, RAG can be enhanced with checks that identify whether retrieved documents contain sensitive or private information.
Another LLM, specially trained to recognize sensitive and personal data, can be employed to examine the retrieved information. If a document contains sensitive information, it is either excluded or anonymized/pseudonymized.
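A hedged sketch of such a privacy check follows. A simple regex detector stands in for the PII-trained LLM described above; the patterns and redaction tokens are illustrative assumptions.

```python
# Privacy check on retrieved documents: flag sensitive content, then drop or
# pseudonymize it before it reaches the core LLM.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def contains_pii(document: str) -> bool:
    return bool(EMAIL.search(document) or PHONE.search(document))

def pseudonymize(document: str) -> str:
    document = EMAIL.sub("[EMAIL]", document)
    return PHONE.sub("[PHONE]", document)

def sanitize(documents: list[str], drop: bool = False) -> list[str]:
    cleaned = []
    for doc in documents:
        if contains_pii(doc):
            if drop:
                continue  # exclude the document entirely
            doc = pseudonymize(doc)  # or anonymize it instead
        cleaned.append(doc)
    return cleaned

docs = ["Patient reachable at jane.doe@example.com or 555-123-4567."]
print(sanitize(docs))  # ['Patient reachable at [EMAIL] or [PHONE].']
```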
Preventing sensitive information leaks is essential in the healthcare industry, which handles extremely sensitive patient data. As Suresh Martha notes in this article: “The pharmaceutical industry is undergoing a significant transformation, driven by the integration of artificial intelligence (AI) and newer generative AI (GenAI) into aspects of drug discovery, clinical trials and patient care. While these advances promise substantial benefits, from accelerating drug development to delivering more personalized medical treatments, the GenAI revolution raises ethical considerations around data protection, privacy and responsible use of the technology.”
Limitations of Advanced RAG Systems
Although advanced RAG systems overcome many of the challenges of standard RAG, they come with their own set of limitations.
1. The additional processing and increased latency introduced by the retrieval step may limit the use of RAG in low-latency applications.
2. The long context that results from adding retrieved documents to the prompt can restrict the use of LLMs with shorter context windows.
3. Although advanced RAG systems such as Self-RAG and knowledge graph-based RAG reduce the retrieval of irrelevant documents, further improvements are still needed.
Conclusion
Recent technological advances in RAG have improved various aspects of RAG-based applications while reducing computational and memory costs. Code to implement many of these techniques is available on GitHub, Hugging Face, and other repositories. However, despite these advances, gaps still exist and global research is underway to fill them.