Local model deployment
Run inference on your machine ( Ollama, LM Studio, vLLM, … ) and point Mystilink Agent at the local OpenAI-compatible endpoint.
Local server (Ollama / LM Studio / …)
→ OpenAI-compatible HTTP API
Mystilink Agent (openai-compatible Provider)
Ollama (recommended)
ollama pull qwen2.5:7b
| Base URL | http://127.0.0.1:11434/v1 |
| Model ID | e.g. qwen2.5:7b (from ollama list) |
| API key | placeholder, e.g. ollama |
Create a custom Provider with API Family openai-compatible, then add a model and test connection.
Alternative: OpenAI template + override Base URL to http://127.0.0.1:11434/v1.
LM Studio
- Load a model → start Local Server (default port often 1234).
- Base URL:
http://127.0.0.1:1234/v1 - Custom Provider → test → save.
vLLM / llama.cpp / text-generation-webui
| Tool | Typical Base URL |
|---|---|
| vLLM | http://127.0.0.1:8000/v1 |
| llama.cpp server | http://127.0.0.1:8080/v1 |
| text-generation-webui | http://127.0.0.1:5000/v1 (enable OpenAI API extension) |
Remote GPU box
Use LAN IP, e.g. http://192.168.1.100:11434/v1. Do not expose unauthenticated servers to the public internet.
Tips
- Enable auto fallback to a cloud official model for hard tasks.
- Tool calling requires a model/backend that supports function calling.
- First load can be slow; smaller quantizations need less RAM.