llama.cpp and Elixir

Published February 15, 2024 by Toran Billups

For the longest time I put programming language dogma ahead of technical pragmatism, but recently the need for synthetic data generation steered me toward llama.cpp. I assumed the minimal operational complexity I enjoy with Elixir was somehow at odds with llama.cpp, but together they provided enough harmony to help me level up.

My end goal here was to unlock access to bigger models like Mixtral-8x7B within Elixir so I could do data engineering with 24GB or less of VRAM. To my surprise, the happy path was simple and straightforward, and it supports Mistral, Mixtral, Gemma and just about any flavor of Llama 2.

The clone and make process is simple enough:

    git clone --depth=1 https://github.com/ggerganov/llama.cpp.git cpp
    cd cpp
    make clean && LLAMA_CUBLAS=1 make -j
  

Next, download a quantized GGUF of Mixtral-8x7B that works with llama.cpp.

    def download do
      url = "https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/resolve/main/mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf?download=true"
      dir = "/home/hello/cpp/models"
      path = "mistrial-instruct.gguf"
      full_path = Path.join([dir, path])
      File.mkdir_p!(dir)
      Req.get!(url, into: File.stream!(full_path))
    end
  

Now, shell out to llama.cpp from Elixir with System.cmd/2.

    def prompt(text) do
      ggml_exec = "/home/hello/cpp/main"
      
      System.cmd(ggml_exec, [
        "-ngl",
        "20",
        "-m",
        "/home/hello/cpp/models/mistrial-instruct.gguf",
        "-c",
        "2048",
        "--temp",
        "1.0",
        "--repeat_penalty",
        "1.1",
        "-n",
        "-1",
        "-p",
        "<s>[INST]#{text} [/INST]"
      ])
      |> case do
        {output, 0} ->
          [_prompt, completion] = String.split(output, "[/INST]", parts: 2)
          completion

        {output, status} ->
          raise "llama.cpp exited with status #{status}: #{output}"
      end
    end
  
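llama.cpp echoes the prompt on stdout ahead of the completion, which is why the pattern match on String.split works. A worked example of the extraction, using a made-up raw output string:

```elixir
# A hypothetical raw stdout capture: the echoed prompt, then the completion.
raw = "<s>[INST]Say hi [/INST] Hello there!"

# parts: 2 splits only on the first marker, in case "[/INST]" ever
# appears again inside the generated text.
[_prompt, completion] = String.split(raw, "[/INST]", parts: 2)

String.trim(completion)
# => "Hello there!"
```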

With this function you can prompt Mixtral-8x7B and generate synthetic data with ease!

    def generate do
      "data.json"
      |> get_json()
      |> Enum.map(fn data ->
        topic = data["topic"]
        friend = data["friend"]
        
        prompt = """
        Imagine you are Toran, write a text message from Toran to #{friend} about #{topic}.
        """
        
        results = prompt(prompt)
        result =
          if String.valid?(results) do
            results
          else
            nil
          end
          
        %{instruction: prompt, output: result}
      end)
      |> Enum.reject(&is_nil(&1.output))
      |> writejson("result.json")
    end
  
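The get_json/1 and writejson/2 helpers aren't shown in the post. A minimal sketch of what they might look like using the Jason library — the module name and exact shapes here are my assumption, not the original code:

```elixir
defmodule SyntheticData do
  # Hypothetical helpers matching the generate/0 pipeline above.
  # Assumes {:jason, "~> 1.4"} is in your deps.

  # Read a JSON file and decode it into Elixir terms.
  def get_json(path) do
    path
    |> File.read!()
    |> Jason.decode!()
  end

  # Encode rows to pretty-printed JSON and write them to disk.
  def writejson(rows, path) do
    File.write!(path, Jason.encode!(rows, pretty: true))
  end
end
```

Note that Jason encodes atom keys as strings, so re-reading the file yields string keys like data["topic"] — which matches how generate/0 accesses them.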

I've had great success with this simple approach because data engineering is a blend of extraction, cleaning and now prompting and function calling against other LLMs!

