Usage

Inclavate runs three processes: the orchestrator (routes inference), one or more nodes (each serves a range of transformer layers), and the dashboard UI.

On Windows, use dppan-orchestrator.exe and dppan.exe in place of ./dppan-orchestrator and ./dppan.

Single machine

Open three terminals in the bundle directory.

1 — Orchestrator

bash

./dppan-orchestrator --port 8001 --admin-token mysecret

2 — Node (joins with a model)

bash

./dppan join --orchestrator http://localhost:8001 --model llama3.2:1b

3 — Dashboard

bash

./dppan ui --orchestrator http://localhost:8001 --port 3000

Open http://localhost:3000.

Multi-machine

Machine A runs the orchestrator; other machines join as nodes. Nodes need only outbound TCP to ports 8001 and 9001 on Machine A — no inbound ports, no public IP.

Machine A — orchestrator + UI (run each command in its own terminal)

bash

./dppan-orchestrator --port 8001 --admin-token mysecret
./dppan ui --orchestrator http://localhost:8001 --port 3000

Machine B, C, … — use Machine A's LAN IP

bash

./dppan join --orchestrator http://192.168.1.10:8001 --model llama3.2:1b

Each node detects its GPU, resolves the model via Ollama (or loads the given .gguf file), and receives its layer range from the orchestrator unless --layers is set.
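Before joining, it can help to confirm from a node machine that the two orchestrator ports are reachable. The following is a minimal sketch and not part of Inclavate: `can_connect` and `check_orchestrator` are hypothetical helper names, and the sketch assumes bash (for `/dev/tcp`) and the coreutils `timeout` command are installed, which is not the case on a stock macOS system.

```shell
# can_connect HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if an outbound TCP connection opens within the timeout,
# non-zero otherwise (including when `timeout` or bash is missing).
can_connect() {
  local host="$1" port="$2" secs="${3:-3}"
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# check_orchestrator HOST
# Probes the two ports named in this section (8001 and 9001) and
# prints one line per port. Returns non-zero if either probe fails.
check_orchestrator() {
  local host="$1" port failed=0
  for port in 8001 9001; do
    if can_connect "$host" "$port"; then
      echo "ok: $host:$port reachable"
    else
      echo "FAIL: $host:$port unreachable"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, running `check_orchestrator 192.168.1.10` on Machine B prints one line per port; a `FAIL` line usually points to a firewall on Machine A or a wrong LAN IP.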

Chat (OpenAI-compatible)

bash

curl -X POST http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.2:1b","messages":[{"role":"user","content":"Hello!"}],"max_tokens":200}'

See the API reference for streaming, sessions, and the node/metrics endpoints.
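The same request can be wrapped in a small shell function so the prompt does not have to be hand-quoted into JSON each time. This is a sketch under stated assumptions: `json_escape`, `build_chat_payload`, and `chat` are hypothetical helper names, only the request fields shown in the curl example are used, and the escaping covers backslashes and double quotes only (not newlines or other control characters).

```shell
# json_escape STRING
# Escapes backslashes and double quotes so STRING can sit inside a JSON
# string literal. Control characters are NOT handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_chat_payload MODEL MESSAGE [MAX_TOKENS]
# Prints a request body with the same shape as the curl example above.
build_chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"max_tokens":%d}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "${3:-200}"
}

# chat BASE_URL MODEL MESSAGE [MAX_TOKENS]
# Sends the payload to the chat completions endpoint and prints the raw response.
chat() {
  local base="$1"
  shift
  curl -sS -X POST "$base/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(build_chat_payload "$@")"
}
```

A call such as `chat http://localhost:8001 llama3.2:1b 'Hello!'` sends the same body as the curl example. For prompts containing newlines or other control characters, a JSON-aware tool is a safer choice than this hand-rolled escaping.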

Common flags

`dppan-orchestrator`

| Flag | Env | Default | Description |
| --- | --- | --- | --- |
| `--port` | `PORT` | required | REST API port |
| `--admin-token` | `DPPAN_ADMIN_TOKEN` | none | Enables the Admin tab in the dashboard |
| `--models-config` | `DPPAN_MODELS_CONFIG` | `./config/models.toml` | Model registry path |
| `--log-level` | `LOG_LEVEL` | `info` | `trace` / `debug` / `info` / `warn` / `error` |
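The Env column suggests the orchestrator can be configured through the environment instead of flags, which keeps the admin token out of process listings such as `ps`. The sketch below assumes each variable is read when its flag is omitted; `load_orchestrator_env` is a hypothetical helper, and the fallback port 8001 is taken from the examples on this page, not from a documented default.

```shell
# load_orchestrator_env
# Exports the variables from the Env column, keeping any value that is
# already set. The 8001 fallback is an assumption borrowed from the
# examples on this page; --port itself has no default.
load_orchestrator_env() {
  export PORT="${PORT:-8001}"
  export LOG_LEVEL="${LOG_LEVEL:-info}"
  export DPPAN_MODELS_CONFIG="${DPPAN_MODELS_CONFIG:-./config/models.toml}"
  if [ -z "${DPPAN_ADMIN_TOKEN:-}" ]; then
    # The token is optional; without it the Admin tab is not enabled.
    echo "note: DPPAN_ADMIN_TOKEN is unset; the Admin tab stays disabled" >&2
  fi
}
```

With the token exported beforehand, `load_orchestrator_env && ./dppan-orchestrator` would then start the orchestrator without any flags on the command line, if the assumption above holds.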

`dppan join`

| Flag | Default | Description |
| --- | --- | --- |
| `--orchestrator` | required | Orchestrator URL, e.g. `http://localhost:8001` |
| `--model` | required | Ollama model name or path to a `.gguf` file |
| `--layers` | auto | Layer range to serve, e.g. `0-16` |
| `--log-level` | `info` | Verbosity |

`dppan ui`

| Flag | Default | Description |
| --- | --- | --- |
| `--orchestrator` | required | Orchestrator URL |
| `--port` | `3000` | Dashboard port |
| `--host` | `127.0.0.1` | Bind address — use `0.0.0.0` for LAN access |

GPU troubleshooting

"CUDA error: the provided PTX was compiled with an unsupported toolchain" — the NVIDIA driver is too old; update to 576.02 or newer.
cudart64_*.dll / libcudart.so not found — install CUDA Toolkit 12.x. On Linux, add /usr/local/cuda/lib64 to LD_LIBRARY_PATH.
Falls back to CPU instead of GPU — run nvidia-smi; the GPU must be Turing or newer (GTX 10xx and older are unsupported).
macOS GPU unused — use the macos-arm64 build on Apple Silicon (Intel Macs run CPU-only).
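The driver and library checks above can be scripted on Linux. This is a rough sketch, not an Inclavate tool: `version_ge` and `check_gpu` are hypothetical helper names, the comparison relies on `sort -V`, and the minimum version 576.02 is taken from the first item above.

```shell
# version_ge A B
# Succeeds when dotted version A is greater than or equal to B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_gpu
# Reports the NVIDIA driver version and whether the CUDA library
# directory is on LD_LIBRARY_PATH. Returns non-zero on a hard failure.
check_gpu() {
  local min="576.02" driver
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: no NVIDIA driver tools on PATH"
    return 1
  fi
  # Take the first GPU's driver version, e.g. "576.02".
  driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  if version_ge "$driver" "$min"; then
    echo "driver $driver ok (>= $min)"
  else
    echo "driver $driver is older than $min"
    return 1
  fi
  case ":${LD_LIBRARY_PATH:-}:" in
    *:/usr/local/cuda/lib64:*) echo "CUDA lib dir is on LD_LIBRARY_PATH" ;;
    *) echo "hint: add /usr/local/cuda/lib64 to LD_LIBRARY_PATH" ;;
  esac
}
```

Running `check_gpu` before `./dppan join` gives a quick indication of which of the items above applies. It does not check the GPU architecture, so a pre-Turing card still needs to be ruled out by reading the `nvidia-smi` output.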
