Example Configurations

These examples show complete .env and models.yaml configurations for common self-hosted setups. Each example assumes you have already run engrammic selfhost and have a ~/.engrammic/ directory containing the compose file and a .env file.

For the full variable reference, see Configuration. For Ollama-specific wiring, see Running Models Locally.

Ollama Only (Fully Local)

No cloud API keys. Both embeddings and LLM synthesis run through a local Ollama instance. Synthesis is active as long as Ollama is reachable.

This setup gives you full storage and recall with no data leaving your machine. Synthesis quality depends on the local model you choose. Reranking is not available without a cloud provider.

# Required core
ENGRAMMIC_LICENSE_KEY=ENGR_your_key_here
POSTGRES_PASSWORD=your-secure-password

# Embeddings via Ollama
EMBEDDING_MODEL=ollama/nomic-embed-text
EMBEDDING_DIMENSIONS=768
OLLAMA_API_BASE=http://host.docker.internal:11434

# Reranking -- not available without a cloud provider
RERANKING__ENABLED=false

# LLM via Ollama (also needs models.yaml override)
OLLAMA_BASE_URL=http://host.docker.internal:11434

# Telemetry
TELEMETRY__ENABLED=false

# ~/.engrammic/config/models.yaml
default_tier: self_hosted
tiers:
  self_hosted:
    reasoning:
      provider: ollama
      model: llama3.2
    fast:
      provider: ollama
      model: llama3.2
    query_expander:
      provider: ollama
      model: llama3.2

Pull the required models before starting:

ollama pull nomic-embed-text
ollama pull llama3.2
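Before starting the stack, it can help to confirm that both models finished downloading. The sketch below parses the output of `ollama list`, assuming the model name is in the first column after a header row. The helper name `list_has_model` is made up for this example and is not part of Ollama or Engrammic.

```shell
# Hypothetical helper: succeeds if the `ollama list` output on stdin
# contains the given model name, ignoring any tag such as :latest.
list_has_model() {
  awk -v want="$1" '
    NR > 1 {                 # skip the header row
      name = $1
      sub(/:.*/, "", name)   # drop the tag
      if (name == want) found = 1
    }
    END { exit !found }
  '
}

# Usage (requires the ollama CLI):
#   ollama list | list_has_model nomic-embed-text && echo "embedding model ready"
#   ollama list | list_has_model llama3.2 && echo "LLM ready"
```

The helper compares names without tags, so it does not distinguish between different tags of the same model.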

Place the models.yaml override at ~/.engrammic/config/models.yaml, then start the stack:

docker compose up -d
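Because synthesis in this setup depends on Ollama being reachable, a quick probe after startup can save some debugging. This is a minimal sketch that assumes Ollama listens on its default port 11434 on the host and queries its `/api/tags` endpoint, which lists local models. `check_ollama` is an illustrative name. Containers reach Ollama through host.docker.internal, while a check run on the host itself uses localhost.

```shell
# Hypothetical helper: prints "reachable" if Ollama answers at the given
# base URL (default: http://localhost:11434), otherwise "unreachable".
check_ollama() {
  base_url="${1:-http://localhost:11434}"
  if curl -fsS --max-time 5 "$base_url/api/tags" >/dev/null 2>&1; then
    echo "reachable"
  else
    echo "unreachable"
    return 1
  fi
}

# Usage, from ~/.engrammic:
#   check_ollama         # probe Ollama on the host
#   docker compose ps    # confirm the containers are running
```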

OpenAI

Embeddings and LLM calls go through OpenAI. Embeddings use text-embedding-3-small; reranking uses Cohere, because OpenAI does not offer a reranking API.

# Required core
ENGRAMMIC_LICENSE_KEY=ENGR_your_key_here
POSTGRES_PASSWORD=your-secure-password

# Embeddings via OpenAI
EMBEDDING_MODEL=openai/text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
OPENAI_API_KEY=sk-...

# Reranking via Cohere
RERANKING__ENABLED=true
RERANKING__PROVIDER=cohere
RERANKING__MODEL=rerank-english-v3.0
COHERE_API_KEY=...

# Telemetry
TELEMETRY__ENABLED=false

# ~/.engrammic/config/models.yaml
default_tier: self_hosted
tiers:
  self_hosted:
    reasoning:
      provider: openai
      model: gpt-4o
    fast:
      provider: openai
      model: gpt-4o-mini
    query_expander:
      provider: openai
      model: gpt-4o-mini

If you do not have a Cohere key, set RERANKING__ENABLED=false. Recall still works but results are not reranked.
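The rule above can be checked mechanically before starting the stack. The sketch below reads the .env file and flags the case where Cohere reranking is enabled but no key is set. The helper names `env_get` and `check_reranking` are hypothetical, and the parsing assumes plain KEY=value lines without quoting.

```shell
# Hypothetical helper: print the last KEY=value assignment from an env file.
env_get() {
  sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Flags Cohere reranking that is enabled with an empty or missing key.
# It does not validate the key itself.
check_reranking() {
  env_file="${1:-$HOME/.engrammic/.env}"
  enabled="$(env_get RERANKING__ENABLED "$env_file")"
  provider="$(env_get RERANKING__PROVIDER "$env_file")"
  key="$(env_get COHERE_API_KEY "$env_file")"
  if [ "$enabled" = "true" ] && [ "$provider" = "cohere" ] && [ -z "$key" ]; then
    echo "reranking enabled but COHERE_API_KEY is empty"
    return 1
  fi
  echo "reranking config looks consistent"
}
```

Run `check_reranking` with no arguments to inspect ~/.engrammic/.env, or pass a path to another env file.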

Mixed (Local Embeddings, Cloud LLM)

Local embeddings keep your stored content on-device; the cloud LLM handles synthesis only. This is a practical balance between privacy and synthesis quality.

# Required core
ENGRAMMIC_LICENSE_KEY=ENGR_your_key_here
POSTGRES_PASSWORD=your-secure-password

# Embeddings via Ollama (local)
EMBEDDING_MODEL=ollama/nomic-embed-text
EMBEDDING_DIMENSIONS=768
OLLAMA_API_BASE=http://host.docker.internal:11434

# Reranking disabled (no reranking provider is configured in this setup)
RERANKING__ENABLED=false

# LLM via Anthropic (cloud)
ANTHROPIC_API_KEY=sk-ant-...

# Telemetry
TELEMETRY__ENABLED=false

# ~/.engrammic/config/models.yaml
default_tier: self_hosted
tiers:
  self_hosted:
    reasoning:
      provider: anthropic
      model: claude-sonnet-4-5
    fast:
      provider: anthropic
      model: claude-haiku-4-5
    query_expander:
      provider: anthropic
      model: claude-haiku-4-5

Pull the embedding model before starting:

ollama pull nomic-embed-text
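When switching between setups, for example from OpenAI embeddings to Ollama embeddings, it is easy to carry over a stale EMBEDDING_DIMENSIONS value. The sketch below compares the two variables using the model and dimension pairs that appear on this page. It assumes the dimension value is meant to match the model's output size, and the function names are illustrative only.

```shell
# Output sizes for the embedding models used on this page.
expected_dims() {
  case "$1" in
    ollama/nomic-embed-text)       echo 768 ;;
    openai/text-embedding-3-small) echo 1536 ;;
    vertex_ai/text-embedding-005)  echo 768 ;;
    *) return 1 ;;
  esac
}

# Hypothetical check: compare EMBEDDING_DIMENSIONS with the expected size.
check_embedding_dims() {
  env_file="${1:-$HOME/.engrammic/.env}"
  model="$(sed -n 's/^EMBEDDING_MODEL=//p' "$env_file" | tail -n 1)"
  dims="$(sed -n 's/^EMBEDDING_DIMENSIONS=//p' "$env_file" | tail -n 1)"
  if ! want="$(expected_dims "$model")"; then
    echo "unknown model: $model"
    return 2
  fi
  if [ "$dims" = "$want" ]; then
    echo "ok: $model with $dims dimensions"
  else
    echo "mismatch: $model expects $want, found $dims"
    return 1
  fi
}
```

Models not listed in `expected_dims` are reported as unknown rather than guessed.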

Vertex AI (GCP-Native)

A full GCP stack: embeddings and reranking run through Vertex AI, and the LLM is Gemini. This setup requires a GCP project with the Vertex AI API enabled and credentials available to the container.

# Required core
ENGRAMMIC_LICENSE_KEY=ENGR_your_key_here
POSTGRES_PASSWORD=your-secure-password

# Embeddings via Vertex AI
EMBEDDING_MODEL=vertex_ai/text-embedding-005
EMBEDDING_DIMENSIONS=768
VERTEX_PROJECT=my-gcp-project
VERTEX_LOCATION=us-central1

# Reranking via Vertex AI
RERANKING__ENABLED=true
RERANKING__PROVIDER=vertex
RERANKING__MODEL=semantic-ranker-default@latest

# Telemetry
TELEMETRY__ENABLED=false

# ~/.engrammic/config/models.yaml
default_tier: self_hosted
tiers:
  self_hosted:
    reasoning:
      provider: vertex_ai
      model: gemini-2.0-flash-001
    fast:
      provider: vertex_ai
      model: gemini-2.0-flash-lite-001
    query_expander:
      provider: vertex_ai
      model: gemini-2.0-flash-lite-001

Vertex AI authentication uses Application Default Credentials (ADC). You must mount a service account key or configure Workload Identity so the container can reach the Vertex API. The simplest local approach is to run gcloud auth application-default login on the host and mount ~/.config/gcloud into the container.
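Before mounting credentials into the container, it is worth confirming that they exist on the host. This sketch assumes a Linux or macOS host, where `gcloud auth application-default login` writes its file under ~/.config/gcloud, and it honors GOOGLE_APPLICATION_CREDENTIALS if that variable points at a service account key instead. The function names are hypothetical.

```shell
# Path where ADC credentials are expected: GOOGLE_APPLICATION_CREDENTIALS
# if set, otherwise the default file written by gcloud.
adc_path() {
  echo "${GOOGLE_APPLICATION_CREDENTIALS:-$HOME/.config/gcloud/application_default_credentials.json}"
}

# Hypothetical check: report whether the credentials file is readable.
check_adc() {
  f="$(adc_path)"
  if [ -r "$f" ]; then
    echo "found: $f"
  else
    echo "missing: $f"
    return 1
  fi
}
```

If `check_adc` reports a missing file, run gcloud auth application-default login on the host first, then mount the directory into the container.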
