Many Options — Open-Source LLMs

Uchechi Njoku
3 min read · Jul 8, 2024


In my introductory post about Large Language Models (LLMs), I mentioned OpenAI, which powers the well-known and widely used ChatGPT. While OpenAI offers an affordable service, it is not free, which can be a limitation for those learning on a budget or wanting to experiment extensively with prompts. As LLM technology evolves rapidly, several free alternatives have emerged. In this article, I will introduce you to some of these freely available LLMs and guide you on where to find them.

Hugging Face 🤗

Hugging Face is a comprehensive repository for various models, including Large Language Models (LLMs). For open-source LLMs, Hugging Face features two key leaderboards. The first is the Open LLM Leaderboard, which compares the performance of different LLMs with a focus on the quality and correctness of their responses. The second is the llm-perf-leaderboard, which emphasizes computational performance, evaluating how efficiently these models execute tasks.

Top ranking Open-Source LLMs on Open LLM Leaderboard
Top ranking Open-Source LLMs on llm-perf-leaderboard

Generally, LLMs require GPUs to run locally. If you have a powerful PC with a GPU, you’re all set. However, if you don’t have access to a GPU, here are two options for you:

  1. Cloud-Based Services: Utilize cloud platforms that offer GPU resources on a pay-as-you-go basis. Providers like Google Cloud, AWS, Azure, and Saturn Cloud have options for running LLMs without the need for local hardware investment.
  2. Free Online Platforms: Leverage free platforms such as Google Colab, which provides limited free GPU access, or Hugging Face's hosted Inference API, which allows you to interact with various LLMs without the need for local computation.

Another option for using LLMs without local GPU access is the Ollama library, which supports the various LLMs listed in its model library. Note the memory requirements: you should have at least 8 GB of RAM to run the 7B models, 16 GB for the 13B models, and 32 GB for the 33B models.
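The RAM guidance above can be turned into a quick pre-flight check before pulling a model. This is a hypothetical helper, not part of Ollama itself; the thresholds simply mirror the guidelines quoted above.

```python
# Rough RAM check before pulling a model with Ollama.
# Thresholds follow Ollama's guidance: 8 GB for 7B models,
# 16 GB for 13B models, 32 GB for 33B models.

MIN_RAM_GB = {"7b": 8, "13b": 16, "33b": 32}

def enough_ram(model_size: str, available_gb: float) -> bool:
    """Return True if available_gb meets the guideline for model_size."""
    return available_gb >= MIN_RAM_GB[model_size.lower()]

print(enough_ram("7b", 16))   # a 16 GB machine can run a 7B model
print(enough_ram("33b", 16))  # but not a 33B model
```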

The rest of this article lists three open-source LLM models that can be used with Ollama.

Model 1: Qwen2-7B-Instruct

The Qwen/Qwen2-7B-Instruct model was developed by Alibaba Cloud's Qwen team. It is an advanced open-source language model with 7 billion parameters, designed for various natural language processing tasks. It excels at generating human-like text, answering questions, and following instructions with high accuracy, making it a competitive option in the landscape of open-source LLMs. Its capabilities are strengthened through instruction fine-tuning on diverse datasets, keeping it effective across different applications.

Model 2: meta-llama/Meta-Llama-3-8B-Instruct

Meta-Llama-3-8B-Instruct is a large language model developed by Meta (formerly Facebook). This model is part of the Llama (Large Language Model Meta AI) series and is specifically tuned for instruction-following tasks. It contains 8 billion parameters, which enable it to generate coherent and contextually relevant responses to a wide range of prompts. Because it is trained to understand and execute instructions, it is well suited to question answering, dialogue systems, and other tasks requiring detailed, accurate responses.

Model 3: microsoft/Phi-3-small-128k-instruct

Phi-3-small-128k-instruct is a language model developed by Microsoft as part of its Phi series, which is designed for instruction-based tasks. With roughly 7 billion parameters, it is smaller than many of its counterparts but optimized for efficiency. It supports a context window of up to 128k tokens, making it suitable for tasks that require understanding and generating long contexts. The model is trained to follow instructions across various natural language processing tasks, striking a balance between capability and computational cost.
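Any of the three models above can be queried locally through Ollama's REST API. Below is a minimal sketch assuming an Ollama server is running on the default port (11434) and the model has already been pulled; the `qwen2:7b-instruct` tag is used as an example and should be swapped for whichever model you pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint.
    stream=False asks for a single complete response object."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST the prompt to a locally running Ollama server and
    return the generated text from the 'response' field."""
    body = json.dumps(build_generate_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running server and a pulled model):
# print(generate("qwen2:7b-instruct", "Explain what an LLM is in one sentence."))
```

The same payload shape works for all three models; only the `model` string changes.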

In the next part of this blog, I will test these models and transform one of these models into a simple application using Streamlit.


Uchechi Njoku

I am an early stage researcher in Data Engineering for Data Science, a polyglot and a traveler :-))