Usage
Inclavate runs three processes: the orchestrator (routes inference), one or more nodes (each serves a range of transformer layers), and the dashboard UI.
On Windows, use `dppan-orchestrator.exe` / `dppan.exe` in place of `./dppan-orchestrator` / `./dppan`.
Single machine
Open three terminals in the bundle directory.
1. Orchestrator
./dppan-orchestrator --port 8001 --admin-token mysecret
2. Node (joins with a model)
./dppan join --orchestrator http://localhost:8001 --model llama3.2:1b
3. Dashboard
./dppan ui --orchestrator http://localhost:8001 --port 3000
Open http://localhost:3000.
Multi-machine
Machine A runs the orchestrator; other machines join as nodes. Nodes need only outbound TCP to ports 8001 and 9001 on Machine A — no inbound ports, no public IP.
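Before joining from another machine, it can help to confirm that both ports are reachable. The sketch below is not part of the bundle: the function names `url_host`, `can_reach`, and `check_orchestrator` are made up for illustration, and it assumes a netcat build that supports `-z` and `-w`.

```shell
# Extract the host from an orchestrator URL such as http://192.168.1.10:8001.
# (Sketch only: does not handle IPv6 literals or URLs with credentials.)
url_host() (
  rest="${1#*://}"              # drop the scheme
  rest="${rest%%/*}"            # drop any path
  printf '%s\n' "${rest%%:*}"   # drop the port
)

# Return 0 if a TCP connection to HOST PORT succeeds within 3 seconds.
# Assumes a netcat that supports -z and -w (OpenBSD netcat, recent ncat).
can_reach() {
  nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# Check both ports that nodes need on Machine A.
check_orchestrator() (
  host="$(url_host "$1")"
  status=0
  for port in 8001 9001; do
    if can_reach "$host" "$port"; then
      echo "ok:   $host:$port"
    else
      echo "FAIL: $host:$port"
      status=1
    fi
  done
  exit "$status"
)
```

Running `check_orchestrator http://192.168.1.10:8001` on a node prints one line per port and returns non-zero if either connection fails.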
Machine A (orchestrator + UI)
./dppan-orchestrator --port 8001 --admin-token mysecret
./dppan ui --orchestrator http://localhost:8001 --port 3000
Machine B, C, … (use Machine A's LAN IP)
./dppan join --orchestrator http://192.168.1.10:8001 --model llama3.2:1b
Each node auto-detects its GPU, resolves the model via Ollama, and receives a layer range from the orchestrator automatically.
Chat (OpenAI-compatible)
curl -X POST http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"llama3.2:1b","messages":[{"role":"user","content":"Hello!"}],"max_tokens":200}'
See the API reference for streaming, sessions, and the node/metrics endpoints.
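For scripted use, the same request can be wrapped in a small helper so the prompt is passed as an argument. This is a minimal sketch, not something shipped with the bundle: `json_escape`, `chat_payload`, and `chat` are illustrative names, and the escaping covers only backslashes and double quotes.

```shell
# Escape backslashes and double quotes so the text can sit inside a JSON string.
# Newlines and other control characters are NOT handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body: chat_payload MODEL PROMPT [MAX_TOKENS]
chat_payload() (
  model="$1"
  prompt="$2"
  max_tokens="${3:-200}"
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"max_tokens":%s}' \
    "$(json_escape "$model")" "$(json_escape "$prompt")" "$max_tokens"
)

# POST the body to the orchestrator: chat BASE_URL MODEL PROMPT [MAX_TOKENS]
chat() (
  base_url="$1"
  shift
  curl -sS -X POST "$base_url/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "$@")"
)
```

For example, `chat http://localhost:8001 llama3.2:1b 'Hello!'` sends the same body as the curl command above. For prompts with newlines or other special characters, a real JSON tool is the safer choice.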
Common flags
dppan-orchestrator
| Flag | Env | Default | Description |
|---|---|---|---|
| `--port` | `PORT` | required | REST API port |
| `--admin-token` | `DPPAN_ADMIN_TOKEN` | none | Enables the Admin tab in the dashboard |
| `--models-config` | `DPPAN_MODELS_CONFIG` | ./config/models.toml | Model registry path |
| `--log-level` | `LOG_LEVEL` | info | trace / debug / info / warn / error |
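The Env column suggests the orchestrator can be configured through the environment instead of flags. The sketch below assumes each variable is read when its flag is omitted, which the table implies but does not state. `gen_token` is a hypothetical helper for producing a less guessable admin token than `mysecret`.

```shell
# Generate a 32-character hex token from the kernel's random source.
gen_token() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Usage (run in the bundle directory). Assumes the variables from the Env
# column are honoured when the matching flag is not given:
#   export PORT=8001
#   export DPPAN_ADMIN_TOKEN="$(gen_token)"
#   export LOG_LEVEL=debug
#   ./dppan-orchestrator
```

Keeping the token in an environment variable also keeps it out of the process list, where command-line flags are visible.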
dppan join
| Flag | Default | Description |
|---|---|---|
| `--orchestrator` | required | Orchestrator URL, e.g. http://localhost:8001 |
| `--model` | required | Ollama model name or path to a .gguf file |
| `--layers` | auto | Layer range to serve, e.g. 0-16 |
| `--log-level` | info | Verbosity |
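When pinning a node to an explicit range with `--layers`, a quick shape check can catch typos before the node starts. This is a sketch under assumptions: `valid_layers` is a hypothetical helper, the model path is a placeholder, and only the `START-END` form shown in the table is checked (whether the end is inclusive is not stated here).

```shell
# valid_layers RANGE: succeed only for START-END with START <= END, e.g. 0-16.
# This checks the shape of the value only, not what the orchestrator accepts.
valid_layers() {
  case "$1" in
    ""|*[!0-9-]*|-*|*-|*-*-*) return 1 ;;   # empty, stray characters, or misplaced '-'
    *-*) ;;                                  # exactly one '-' between two numbers
    *) return 1 ;;                           # a single number is not a range
  esac
  [ "${1%-*}" -le "${1#*-}" ]
}

# Usage (hypothetical model path; run in the bundle directory):
#   range="0-16"
#   valid_layers "$range" && ./dppan join \
#     --orchestrator http://192.168.1.10:8001 \
#     --model ./models/example.gguf \
#     --layers "$range"
```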
dppan ui
| Flag | Default | Description |
|---|---|---|
| `--orchestrator` | required | Orchestrator URL |
| `--port` | 3000 | Dashboard port |
| `--host` | 127.0.0.1 | Bind address; use 0.0.0.0 for LAN access |
GPU troubleshooting
- "CUDA error: the provided PTX was compiled with an unsupported toolchain": the NVIDIA driver is too old; update to 576.02 or newer.
- `cudart64_*.dll` / `libcudart.so` not found: install CUDA Toolkit 12.x. On Linux, add `/usr/local/cuda/lib64` to `LD_LIBRARY_PATH`.
- Falls back to CPU instead of GPU: run `nvidia-smi`; the GPU must be Turing or newer (GTX 10xx and older are unsupported).
- macOS GPU unused: use the `macos-arm64` build on Apple Silicon (Intel Macs run CPU-only).
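The Linux checks in the list above can be scripted. The following is a rough sketch rather than an official diagnostic: `version_ge`, `use_cuda_libs`, and `check_gpu` are illustrative names, the 576.02 threshold is taken from the list, and the version comparison assumes a `sort` that supports `-V`.

```shell
MIN_DRIVER="576.02"   # threshold quoted in the troubleshooting list

# version_ge A B: succeed when version A is at least version B.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prepend the CUDA library directory, avoiding a stray ':' when the variable is empty.
use_cuda_libs() {
  export LD_LIBRARY_PATH="/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Print GPU name and driver, then compare the driver against MIN_DRIVER.
check_gpu() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: no NVIDIA driver on PATH"
    return 1
  fi
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
  driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  if version_ge "$driver" "$MIN_DRIVER"; then
    echo "driver $driver is new enough"
  else
    echo "driver $driver is older than $MIN_DRIVER"
    return 1
  fi
}
```

Calling `use_cuda_libs` in the same shell before `./dppan join` makes the CUDA runtime visible to the node, and `check_gpu` reports whether the driver meets the stated minimum.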