Skip to main content

Local model deployment

Run inference on your machine ( Ollama, LM Studio, vLLM, … ) and point Mystilink Agent at the local OpenAI-compatible endpoint.

Local server (Ollama / LM Studio / …)
→ OpenAI-compatible HTTP API
Mystilink Agent (openai-compatible Provider)
ollama pull qwen2.5:7b
Base URLhttp://127.0.0.1:11434/v1
Model IDe.g. qwen2.5:7b (from ollama list)
API keyplaceholder, e.g. ollama

Create a custom Provider with API Family openai-compatible, then add a model and test connection.

Alternative: OpenAI template + override Base URL to http://127.0.0.1:11434/v1.

LM Studio

  1. Load a model → start Local Server (default port often 1234).
  2. Base URL: http://127.0.0.1:1234/v1
  3. Custom Provider → test → save.

vLLM / llama.cpp / text-generation-webui

ToolTypical Base URL
vLLMhttp://127.0.0.1:8000/v1
llama.cpp serverhttp://127.0.0.1:8080/v1
text-generation-webuihttp://127.0.0.1:5000/v1 (enable OpenAI API extension)

Remote GPU box

Use LAN IP, e.g. http://192.168.1.100:11434/v1. Do not expose unauthenticated servers to the public internet.

Tips

  • Enable auto fallback to a cloud official model for hard tasks.
  • Tool calling requires a model/backend that supports function calling.
  • First load can be slow; smaller quantizations need less RAM.

See also