An Intro to Large Language Models (LLMs)
From time to time, certain technologies capture the spotlight and become the central focus of conversations in the tech industry. Right now, Large Language Models (LLMs) are at the forefront of these discussions. Generative AI is a hot topic, giving rise to new job roles like prompt engineer and AI scientist. But what exactly are LLMs?
Language Models (LMs)
First, let’s understand what language models are. Language Models (LMs) are algorithms that predict the next word in a sequence based on the words that have already been seen. A common example of this concept in action is the predictive text feature on mobile phone keyboards. When you start typing on your phone, the keyboard suggests possible next words to help you complete your sentence. The image below illustrates this.
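To make the idea concrete, here is a minimal sketch of a next-word predictor. It is a toy bigram model that only counts which word follows which in a tiny made-up corpus. The function names (`train_bigram_model`, `suggest_next_words`) and the corpus are assumptions for illustration, and real keyboards and LLMs use far more sophisticated models.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def suggest_next_words(model, word, k=3):
    """Return up to k of the most frequent followers of `word`."""
    counts = model.get(word.lower(), Counter())
    return [candidate for candidate, _ in counts.most_common(k)]


# Hypothetical training text; a real model would see vastly more data.
corpus = "i am going home . i am going out . i am happy . you are going home"
model = train_bigram_model(corpus)

print(suggest_next_words(model, "am"))     # ['going', 'happy']
print(suggest_next_words(model, "going"))  # ['home', 'out']
```

The suggestions are simply the words that most often followed the typed word in the training text, which is the same "predict the next word from what has been seen" idea at a much smaller scale.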
Large Language Models (LLMs)
LLMs are advanced versions of LMs built with a very large number of parameters, often in the billions, and trained on enormous datasets. A well-known example of an LLM in action is ChatGPT, a chat application built on OpenAI's GPT models. There are many other LLMs as well, as shown in the image below.
To use an LLM, the user sends it a message (often called a prompt), and the LLM sends back a response, as illustrated in the image below.
This classic architecture means that the user receives a response based solely on the data on which the LLM was trained. Many organizations possess valuable data that is often private and, therefore, not included in the training of these publicly available LLMs. However, this private data is usually insufficient to build a high-quality LLM, not to mention the significant computational resources required for such an endeavor. Given these challenges, how can an organization with valuable but limited private data leverage this technology? The answer lies in adjusting the architecture. 🙂
Retrieval-Augmented Generation (RAG)
RAG is another buzzword you will come across. It is an adjusted architecture for using LLMs that allows users to include their own data (context) in the prompt.
You can think of the Generation in RAG as the LLM's part of the work. As we’ve seen, an LLM generates its response by repeatedly predicting the next word. However, we don’t want to rely solely on the data the LLM was trained on, so we augment the process with context retrieved from our knowledge base. That is RAG! It retrieves knowledge from our data and includes it in the prompt sent to the LLM to augment the process that generates the response, as shown in the image above.
When you ask ChatGPT to write you a sample cover letter, that’s the classic architecture. When you include your CV and ask it to write you a cover letter using the information in the CV, that’s RAG 😅
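The retrieve-then-augment flow can be sketched in a few lines. This is only an illustration under simplifying assumptions: the knowledge base is a hypothetical list of sentences, retrieval is plain word overlap (production systems typically use embedding-based similarity search instead), and the helper names `tokenize`, `retrieve`, and `build_prompt` are made up for this example. The final step, sending the prompt to an LLM, is left out because it depends on the provider you use.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, knowledge_base, k=1):
    """Return the k documents that share the most words with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        knowledge_base,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context_docs):
    """Augment the user's question with the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Hypothetical private data that a public LLM was never trained on.
knowledge_base = [
    "Refunds are processed within 14 days of a return request.",
    "Our support desk is open Monday to Friday, 9am to 5pm.",
    "Premium accounts include priority support and extra storage.",
]

question = "How long do refunds take?"
prompt = build_prompt(question, retrieve(question, knowledge_base))
print(prompt)  # this augmented prompt is what would be sent to the LLM
```

The LLM itself is unchanged. Only the prompt is different: it now carries the relevant piece of private data alongside the question, just like attaching your CV to the cover letter request.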
Summary
- Language Models (LMs) are algorithms that predict the next word in a sequence based on the words that have already been seen.
- LLMs are advanced versions of LMs built with a very large number of parameters, often in the billions, and trained on enormous datasets.
- Many organizations have valuable private data, but it is usually insufficient to build a high-quality LLM, and the computational costs involved are significant.
- With the Retrieval-Augmented Generation (RAG) architecture, organizations can use existing LLMs more effectively by including context retrieved from their knowledge base in the prompts.