lit-ollama
A drop-in replacement for Ollama, built on LitServe.
Features
- LitGPT model support: Load and serve any LitGPT-compatible model using the standard Ollama interface
- Ollama-compatible API: Full compatibility with the Ollama API specification, allowing you to use any Ollama client without modifications
- LitServe powered: Built on LitServe for high-performance model serving with auto-batching and GPU acceleration
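Because the server advertises full Ollama API compatibility, requests should follow the shapes defined in the public Ollama API spec. As a sketch, here is what a non-streaming `/api/generate` request body looks like under that assumption (the payload fields come from Ollama's spec, not from this project's code):

```python
import json

def build_generate_request(model: str, prompt: str) -> str:
    """Build an Ollama-style /api/generate request body (JSON string)."""
    payload = {
        "model": model,        # model name as the server knows it
        "prompt": prompt,      # the text to complete
        "stream": False,       # request one JSON response, not a stream
    }
    return json.dumps(payload)

body = build_generate_request("meta-llama/Llama-3.2-1B-Instruct", "Hello!")
print(body)
```

Any existing Ollama client builds this payload for you; the sketch only illustrates what crosses the wire.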
Installation
With pip:
python -m pip install lit-ollama
With uv:
uv add lit-ollama
How to use it
Run it like any other LitServe server:
import litserve as ls

from lit_ollama.server.api import LitOllamaAPI

# Replace "mock" with a real model name to serve an actual model
api = LitOllamaAPI("mock")
server = ls.LitServer(
    api,
    accelerator="auto",  # pick CPU/GPU automatically
    devices="auto",
    callbacks=None,
    middlewares=None,
)
server.run()
Start the server with a specific model:
python server.py --model "meta-llama/Llama-3.2-1B-Instruct"
Test the server with the bundled client:
python client.py
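If you prefer to roll your own client, a minimal stand-in might look like the sketch below. It assumes the server listens on `localhost:8000` (LitServe's default port) and exposes Ollama's `/api/chat` endpoint; both the address and the response shape are assumptions, not taken from this project's code:

```python
import json
import urllib.request

def make_chat_request(host: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an Ollama-style /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # single JSON response instead of a stream
    }
    return urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = make_chat_request("http://localhost:8000", "mock", "Say hi")
# To actually send it (requires a running server):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

Because the API is Ollama-compatible, an off-the-shelf Ollama client pointed at the same address should work just as well.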
Docs
uv run mkdocs build -f ./mkdocs.yml -d ./_build/
Update template
copier update --trust -A --vcs-ref=HEAD