DPPAN REST API Reference
Base URL: http://<orchestrator-host>:<port> (default port: 8001)
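All responses are JSON unless noted. Timestamps are RFC 3339, except ts_ms fields, which are Unix epoch milliseconds. Layer indices are 0-based and ranges are half-open, [start_layer, end_layer): a node with start_layer 0 and end_layer 8 serves layers 0 through 7.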
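To make the half-open convention concrete, the hypothetical Python helper below sorts node ranges and sweeps them for gaps. It produces fields shaped like the GET /admin/chain/:model_id response. It is an illustration of the convention, not the orchestrator's own code.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open: (start_layer, end_layer), end excluded


def chain_coverage(n_layers: int, ranges: List[Range]) -> Dict[str, object]:
    """Hypothetical helper: report covered ranges and gaps for one model."""
    covered = sorted(ranges)
    gaps: List[List[int]] = []
    next_needed = 0  # lowest layer index not yet covered by the sweep
    for start, end in covered:
        if start > next_needed:
            gaps.append([next_needed, start])  # layers next_needed..start-1 have no node
        next_needed = max(next_needed, end)
    if next_needed < n_layers:
        gaps.append([next_needed, n_layers])  # tail of the model is uncovered
    return {
        "n_layers": n_layers,
        "covered": [list(r) for r in covered],
        "gaps": gaps,
        "ready": not gaps,
    }


# Two nodes with (0, 8) and (8, 16) cover all 16 layers without overlap.
print(chain_coverage(16, [(8, 16), (0, 8)]))
# A node ending at 6 leaves layers 6 and 7 uncovered.
print(chain_coverage(16, [(0, 6), (8, 16)])["gaps"])  # [[6, 8]]
```

Because end_layer is exclusive, adjacent ranges such as (0, 8) and (8, 16) share a boundary value without sharing a layer.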
Authentication
Admin endpoints require the X-Admin-Token header:
```
X-Admin-Token: <token>
```
The token is set with --admin-token when starting the orchestrator. If the orchestrator was started without --admin-token, all admin routes return 403.
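The following is a minimal Python sketch of attaching the header using only the standard library. The helper admin_request, the host, and the token are hypothetical placeholders. The function builds a urllib.request.Request without sending it. When the request is sent with urllib.request.urlopen, a 403 surfaces as urllib.error.HTTPError.

```python
import json
import urllib.request
from typing import Optional


def admin_request(base_url: str, path: str, token: str,
                  method: str = "GET",
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request that carries X-Admin-Token."""
    headers = {"X-Admin-Token": token}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url.rstrip("/") + path,
                                  data=data, headers=headers, method=method)


# Hypothetical host and token; pass the result to urllib.request.urlopen to send it.
verify = admin_request("http://localhost:8001", "/admin/verify", "my-token")
set_state = admin_request(
    "http://localhost:8001",
    "/admin/nodes/550e8400-e29b-41d4-a716-446655440000/state",
    "my-token", method="PATCH", payload={"state": "ready"})
```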
System
GET /health
Returns orchestrator liveness.
Response 200
```json
{ "ok": true, "version": "0.1.0" }
```
Models
GET /v1/models
OpenAI-compatible model list.
Response 200
```json
{
  "object": "list",
  "data": [
    { "id": "llama3.2:1b", "object": "model", "owned_by": "dppan" }
  ]
}
```
GET /api/models
Full model registry with layer/architecture metadata from config/models.toml.
Response 200
```json
[
  {
    "model_id": "llama3.2:1b",
    "n_layers": 16,
    "d_model": 2048,
    "gguf_sha256": "74701a8c..."
  }
]
```
GET /admin/chain/:model_id (admin)
Chain health for a specific model: which layer ranges are covered, which are missing (gaps), and whether the chain is complete (ready).
Response 200
```json
{
  "model_id": "llama3.2:1b",
  "n_layers": 16,
  "covered": [[0, 8], [8, 16]],
  "gaps": [],
  "ready": true
}
```
Chat Completions
POST /v1/chat/completions
OpenAI-compatible chat endpoint (non-streaming and SSE streaming).
Request body
```json
{
  "model": "llama3.2:1b",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is 2 + 2?" }
  ],
  "max_tokens": 200,
  "temperature": 0.8,
  "top_p": 0.95,
  "stream": false
}
```
| Field | Type | Default | Description |
|---|---|---|---|
| model | string | required | Model ID from the registry |
| messages | array | required | Conversation turns; each role is system, user, or assistant |
| max_tokens | integer | 512 | Maximum number of new tokens to generate |
| temperature | float | 0.8 | Sampling temperature (0 = greedy) |
| top_p | float | 0.95 | Nucleus sampling threshold p |
| stream | boolean | false | Enable SSE streaming |
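This sketch builds the request body on the client side, filling in the documented defaults. ChatRequest is a hypothetical name, and its role check only mirrors the table. The orchestrator still validates the request itself, returning 400 for malformed bodies and 422 for unknown models.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List

VALID_ROLES = {"system", "user", "assistant"}


@dataclass
class ChatRequest:
    """Hypothetical client-side model of the body; defaults mirror the table."""
    model: str
    messages: List[Dict[str, str]]
    max_tokens: int = 512
    temperature: float = 0.8  # 0 = greedy decoding
    top_p: float = 0.95
    stream: bool = False

    def to_json(self) -> bytes:
        for m in self.messages:
            if m.get("role") not in VALID_ROLES:
                raise ValueError(f"invalid role: {m.get('role')!r}")
        return json.dumps(asdict(self)).encode("utf-8")


body = ChatRequest("llama3.2:1b",
                   [{"role": "user", "content": "What is 2 + 2?"}]).to_json()
# POST body to /v1/chat/completions with Content-Type: application/json.
```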
Response 200 (non-streaming)
```json
{
  "id": "chatcmpl-<uuid>",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "4" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 1,
    "total_tokens": 13
  }
}
```
Response 200 (streaming, Content-Type: text/event-stream)
Each event is a data: line carrying one JSON chunk whose delta holds the next piece of content. Events are separated by a blank line, and the stream ends with data: [DONE].
```text
data: {"choices":[{"delta":{"content":"4"},"finish_reason":null}]}

data: [DONE]
```
Errors
| Code | Meaning |
|---|---|
| 503 | No Ready nodes covering all layers of the requested model (chain gap) |
| 422 | Model not in registry |
| 400 | Malformed request |
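When stream is true, the body arrives as server-sent events. The hypothetical generator below assumes chunks shaped like the streaming example. It reads data: lines, yields each delta's content, skips blank separator lines, and stops at [DONE].

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from chat completion SSE lines, stopping at [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank event separators, comments, other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"choices":[{"delta":{"content":"4"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))  # prints 4
```

To consume a live response, pass (raw.decode("utf-8") for raw in urllib.request.urlopen(req)). The response object iterates over raw byte lines.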
DELETE /sessions/:id
Close a generation session and free its KV cache on all nodes.
Response 204 (no body)
Nodes
GET /nodes
List all registered nodes.
Response 200 (array of NodeRecord)
```json
[
  {
    "node_id": "550e8400-e29b-41d4-a716-446655440000",
    "model_id": "llama3.2:1b",
    "http_url": "http://192.168.1.20:5001",
    "grpc_url": "http://192.168.1.20:5002",
    "start_layer": 0,
    "end_layer": 8,
    "state": "Ready",
    "vram_mb": 8192,
    "tokens_total": 1024,
    "requests_served": 12,
    "avg_latency_ms": 45.2,
    "uptime_s": 3600
  }
]
```
state values: Ready · Degraded · Dead · Joining
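This client-side sketch supports routing decisions. It parses the /nodes array into a small dataclass and finds the Ready node that holds a given layer, using the half-open range check. NodeRecord here keeps only five fields and, like ready_node_for_layer, is hypothetical. Whether Degraded nodes may still serve requests is not specified, so the sketch accepts only Ready.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeRecord:
    """Subset of the NodeRecord fields; the others are ignored here."""
    node_id: str
    model_id: str
    start_layer: int
    end_layer: int
    state: str

    def serves(self, layer: int) -> bool:
        return self.start_layer <= layer < self.end_layer  # end_layer is exclusive


def parse_nodes(payload: List[dict]) -> List[NodeRecord]:
    return [NodeRecord(n["node_id"], n["model_id"], n["start_layer"],
                       n["end_layer"], n["state"]) for n in payload]


def ready_node_for_layer(nodes: List[NodeRecord], model_id: str,
                         layer: int) -> Optional[NodeRecord]:
    """Return the first Ready node of model_id whose range contains layer."""
    for node in nodes:
        if node.model_id == model_id and node.state == "Ready" and node.serves(layer):
            return node
    return None
```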
GET /nodes/rankings
Nodes ranked by combined score (throughput, latency, VRAM).
Response 200
```json
{
  "rankings": [
    { "node_id": "...", "model_id": "llama3.2:1b", "score": 0.92, "rank": 1 }
  ]
}
```
POST /nodes
Explicit node registration with a caller-provided layer range. Used in legacy mode (--legacy).
Request body
```json
{
  "node_id": "550e8400-e29b-41d4-a716-446655440000",
  "model_id": "llama3.2:1b",
  "http_url": "http://192.168.1.20:5001",
  "grpc_url": "http://192.168.1.20:5002",
  "start_layer": 0,
  "end_layer": 8,
  "n_total_layers": 16,
  "d_model": 2048,
  "vram_mb": 8192,
  "hardware": { "gpu": "metal", "device_name": "Apple M4 Pro", "vram_mb": 32768 },
  "capabilities": [
    { "model_id": "llama3.2:1b", "n_layers": 16, "d_model": 2048 }
  ]
}
```
Response 200
```json
{ "node_id": "550e8400-e29b-41d4-a716-446655440000", "warnings": [] }
```
POST /nodes/join
Auto-registration: unless the request names a range in layers, the orchestrator assigns the layer range based on available VRAM.
Request body
```json
{
  "node_id": "550e8400-e29b-41d4-a716-446655440000",
  "model_id": "llama3.2:1b",
  "http_url": "http://192.168.1.20:5001",
  "grpc_url": "http://192.168.1.20:5002",
  "n_total_layers": 16,
  "d_model": 2048,
  "vram_mb": 8192,
  "layers": "0-8",
  "hardware": { "gpu": "metal", "device_name": "Apple M4 Pro", "vram_mb": 32768 },
  "capabilities": []
}
```
layers is optional: omit it to let the orchestrator assign the range.
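The layers syntax is shown only by example. Assume it is "start-end" with the same half-open meaning as start_layer and end_layer; the example "0-8" is answered with start_layer 0 and end_layer 8. Under that assumption, a node could validate the value before joining. Both helpers below are hypothetical.

```python
from typing import Optional, Tuple


def parse_layers(spec: str, n_total_layers: int) -> Tuple[int, int]:
    """Parse an assumed "start-end" spec into a half-open (start_layer, end_layer)."""
    start_s, sep, end_s = spec.partition("-")
    if not sep:
        raise ValueError(f"expected 'start-end', got {spec!r}")
    start, end = int(start_s), int(end_s)
    if not 0 <= start < end <= n_total_layers:
        raise ValueError(f"range {start}-{end} is outside 0..{n_total_layers}")
    return start, end


def join_body(base: dict, layers: Optional[str] = None) -> dict:
    """Build a /nodes/join body; without layers, the orchestrator assigns the range."""
    body = {k: v for k, v in base.items() if k != "layers"}
    if layers is not None:
        parse_layers(layers, body["n_total_layers"])  # fail fast on a bad range
        body["layers"] = layers
    return body
```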
Response 200
```json
{
  "node_id": "550e8400-e29b-41d4-a716-446655440000",
  "start_layer": 0,
  "end_layer": 8,
  "warnings": []
}
```
Errors
| Code | Meaning |
|---|---|
| 409 | node_id already registered |
| 422 | Architecture mismatch or unknown model |
| 503 | No layer range available (all layers covered) |
POST /nodes/:id/health
Node heartbeat: updates the node's metrics counters.
Request body (all fields optional)
```json
{
  "tokens_generated": 100,
  "requests_served": 5,
  "errors": 0,
  "avg_latency_ms": 42.0,
  "peak_vram_mb": 4096
}
```
Response 200
```json
{ "ok": true }
```
GET /nodes/:id/hardware
Hardware profile reported by the node.
Response 200
```json
{
  "gpu": "metal",
  "device_name": "Apple M4 Pro",
  "vram_mb": 32768
}
```
GET /nodes/:id/metrics
Per-node metrics time series, kept in a ring buffer of roughly the last 60 samples taken at 10 s intervals (about 10 minutes of history).
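The retention described above behaves like a fixed-capacity ring buffer. The sketch below uses collections.deque with maxlen=60 to show the effect. MetricsRing is a hypothetical name, not the orchestrator's type.

```python
import time
from collections import deque
from typing import Deque, Dict, Optional


class MetricsRing:
    """Sketch of a fixed-size sample buffer; the oldest sample is dropped when full."""

    def __init__(self, capacity: int = 60) -> None:
        self.samples: Deque[Dict[str, float]] = deque(maxlen=capacity)

    def record(self, tps: float, latency_ms: float, ts_ms: Optional[int] = None) -> None:
        if ts_ms is None:
            ts_ms = int(time.time() * 1000)  # Unix epoch milliseconds
        self.samples.append({"tps": tps, "latency_ms": latency_ms, "ts_ms": ts_ms})

    def to_response(self, node_id: str) -> dict:
        return {"node_id": node_id, "samples": list(self.samples)}
```

At one sample every 10 s, 60 slots hold 600 s of history. Recording a 61st sample evicts the oldest.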
Response 200
```json
{
  "node_id": "...",
  "samples": [
    { "tps": 45.2, "latency_ms": 22.1, "ts_ms": 1745123456000 }
  ]
}
```
DELETE /nodes/:id
Deregister a node (clean exit initiated by the node itself).
Response 204 (no body)
Metrics
GET /metrics/summary
Cluster-wide aggregate metrics.
Response 200
```json
{
  "total_nodes": 2,
  "ready_nodes": 2,
  "total_tokens": 50000,
  "total_requests": 200,
  "avg_latency_ms": 38.5
}
```
GET /metrics/model/:model_id
Per-model metrics.
Response 200
```json
{
  "model_id": "llama3.2:1b",
  "nodes": 2,
  "requests": 150,
  "tokens": 30000,
  "avg_latency_ms": 35.0
}
```
GET /metrics/history
Historical request and token counts for the last N minutes, in 1-minute buckets.
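This is a minimal sketch of the bucketing. It assumes each request is logged as (ts_ms, tokens) and that a bucket's ts_ms is the start of its minute in Unix epoch milliseconds; the example value 1745123400000 is an exact multiple of 60 000. How the orchestrator stores requests internally is not documented, and bucket_history is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

BUCKET_MS = 60_000  # one-minute buckets


def bucket_history(events: Iterable[Tuple[int, int]]) -> dict:
    """events: one (ts_ms, tokens) pair per request; returns minute buckets."""
    requests: Counter = Counter()
    tokens: Counter = Counter()
    for ts_ms, n_tokens in events:
        start = ts_ms - ts_ms % BUCKET_MS  # floor to the start of the minute
        requests[start] += 1
        tokens[start] += n_tokens
    return {"buckets": [{"ts_ms": b, "requests": requests[b], "tokens": tokens[b]}
                        for b in sorted(requests)]}
```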
Response 200
```json
{
  "buckets": [
    { "ts_ms": 1745123400000, "requests": 12, "tokens": 4800 }
  ]
}
```
Admin Endpoints
All admin routes require X-Admin-Token: <token>.
GET /admin/verify
Check token validity.
Response 200
```json
{ "ok": true }
```
Response 403 if the token is missing or invalid.
POST /admin/nodes/:id/reassign
Change the layer range a node is responsible for.
Request body
```json
{ "start_layer": 0, "end_layer": 8 }
```
Response 200
```json
{ "ok": true, "node_id": "...", "start_layer": 0, "end_layer": 8 }
```
PATCH /admin/nodes/:id/state
Override node state.
Request body
```json
{ "state": "ready" }
```
state values: "ready" · "idle"
Response 200
```json
{ "ok": true, "node_id": "...", "state": "Ready" }
```
DELETE /admin/nodes/:id/force
Force-deregister a node (admin override; the node is not notified).
Response 200
```json
{ "ok": true, "node_id": "..." }
```
Error Format
All errors return JSON:
```json
{ "error": "human-readable description" }
```
Common status codes:
| Code | Meaning |
|---|---|
| 400 | Bad request / missing field |
| 403 | Missing or invalid admin token |
| 404 | Node or resource not found |
| 409 | Conflict (e.g. duplicate node_id) |
| 422 | Capability mismatch / unknown model |
| 503 | No nodes available / layer gap |
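This minimal sketch turns the error format into a Python exception. DppanError and error_from_body are hypothetical. Treating only 503 as retryable is an assumption based on the table, since a layer gap can close when another node joins.

```python
import json


class DppanError(Exception):
    """Hypothetical client exception carrying the HTTP status and error text."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message

    @property
    def retryable(self) -> bool:
        # Assumption: a 503 chain gap may close once more nodes join.
        return self.status == 503


def error_from_body(status: int, body: bytes) -> DppanError:
    """Read {"error": ...}, falling back to raw text for non-JSON bodies."""
    try:
        message = json.loads(body.decode("utf-8"))["error"]
    except (ValueError, KeyError, TypeError):
        message = body.decode("utf-8", errors="replace") or "unknown error"
    return DppanError(status, message)


# With urllib:
#   except urllib.error.HTTPError as e:
#       raise error_from_body(e.code, e.read()) from e
```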