Yesterday Google released Gemma - an open LLM that folks can run locally on their machines (similarly to llama2). I was wondering how easy it would be to run Gemma on my computer, chat with it and interact with it from a Go program.

Turns out - thanks to Ollama - it's extremely easy! Gemma was already added to Ollama, so all one has to do is run:

$ ollama run gemma

And wait for a few minutes while the model downloads. From this point on, my previous post about using Ollama locally in Go applies with pretty much no changes. Gemma becomes available through a local REST API, and can be accessed from Ollama-aware libraries like LangChainGo.
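
To get a sense of what this REST API looks like, here's a quick way to poke at it with curl; this assumes Ollama is serving on its default local port (11434), and uses its /api/generate endpoint:

$ curl http://localhost:11434/api/generate -d '{
    "model": "gemma",
    "prompt": "very briefly, why is the sky blue?",
    "stream": false
  }'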

I went ahead and added a --model flag to all my code samples from that post, and they can all run with --model gemma now. It all just works, due to the magic of standard interfaces:

  • Gemma is packaged in a standard format for inclusion in Ollama
  • Ollama then presents a standardized REST API for this model, just like it does for other compatible models
  • LangChainGo has an Ollama provider that lets us write code to interact with any model running through Ollama

So we can write code like:

package main

import (
  "context"
  "flag"
  "fmt"
  "log"

  "github.com/tmc/langchaingo/llms"
  "github.com/tmc/langchaingo/llms/ollama"
)

func main() {
  modelName := flag.String("model", "", "ollama model name")
  flag.Parse()

  // Create an LLM client backed by the local Ollama server, using the
  // model name passed via --model.
  llm, err := ollama.New(ollama.WithModel(*modelName))
  if err != nil {
    log.Fatal(err)
  }

  query := flag.Args()[0]
  ctx := context.Background()
  // GenerateFromSinglePrompt is a convenience helper for sending a single
  // text prompt and getting back a single completion string.
  completion, err := llms.GenerateFromSinglePrompt(ctx, llm, query)
  if err != nil {
    log.Fatal(err)
  }

  fmt.Println("Response:\n", completion)
}

And then run it as follows:

$ go run ollama-completion-arg.go --model gemma "what should be added to 91 to make -20?"
Response:
 The answer is -111.

91 + (-111) = -20
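
As an aside, GenerateFromSinglePrompt accepts optional call options that LangChainGo standardizes across its providers. For instance, to ask for a lower sampling temperature (the 0.2 below is just an arbitrary value for illustration), the call in the program above could become:

completion, err := llms.GenerateFromSinglePrompt(ctx, llm, query,
  llms.WithTemperature(0.2))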

Gemma seems relatively fast for a model running on a CPU. I find that the default 7B Gemma model, while much more capable than the default 7B llama2 according to published benchmarks, also runs about 30% faster on my machine.

Without LangChainGo

While LangChainGo offers a convenient API that's standardized across LLM providers, its use is by no means required for this sample. Ollama itself is written in Go, and its client API is available as a Go package that external programs can import as well. Here's an equivalent sample that doesn't require LangChainGo:

package main

import (
  "context"
  "flag"
  "fmt"
  "log"

  "github.com/jmorganca/ollama/api"
)

func main() {
  modelName := flag.String("model", "", "ollama model name")
  flag.Parse()

  // ClientFromEnvironment creates a client for the local Ollama server,
  // honoring the OLLAMA_HOST environment variable if it's set.
  client, err := api.ClientFromEnvironment()
  if err != nil {
    log.Fatal(err)
  }

  req := &api.GenerateRequest{
    Model:  *modelName,
    Prompt: flag.Args()[0],
    Stream: new(bool), // disable streaming
  }

  ctx := context.Background()
  var response string
  respFunc := func(resp api.GenerateResponse) error {
    // With streaming disabled, this callback is invoked once, with the
    // complete response.
    response = resp.Response
    return nil
  }

  err = client.Generate(ctx, req, respFunc)
  if err != nil {
    log.Fatal(err)
  }

  fmt.Println("Response:\n", response)
}
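
Since the Ollama API streams its responses by default, the same client can also print the completion incrementally as it arrives. Here's a sketch of a streaming variant; it assumes that leaving the Stream field unset keeps the default streaming behavior, and that the callback is then invoked once per generated chunk:

package main

import (
  "context"
  "flag"
  "fmt"
  "log"

  "github.com/jmorganca/ollama/api"
)

func main() {
  modelName := flag.String("model", "", "ollama model name")
  flag.Parse()

  client, err := api.ClientFromEnvironment()
  if err != nil {
    log.Fatal(err)
  }

  // Stream is left unset (nil), so the API's default - streaming - applies.
  req := &api.GenerateRequest{
    Model:  *modelName,
    Prompt: flag.Args()[0],
  }

  // The callback is invoked for each generated chunk; resp.Done reports
  // whether this chunk is the final one.
  respFunc := func(resp api.GenerateResponse) error {
    fmt.Print(resp.Response)
    if resp.Done {
      fmt.Println()
    }
    return nil
  }

  if err := client.Generate(context.Background(), req, respFunc); err != nil {
    log.Fatal(err)
  }
}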