DPPAN REST API Reference

Base URL: http://<orchestrator-host>:<port> (default port: 8001)

All responses are JSON unless noted. Timestamps are RFC 3339. Layer ranges are 0-based and half-open: start_layer is inclusive and end_layer is exclusive, so start_layer 0 and end_layer 8 cover layers 0 through 7.


Authentication

Admin endpoints require the X-Admin-Token header:

X-Admin-Token: <token>

The token is set with --admin-token when starting the orchestrator.
Without a token configured, all admin routes return 403.
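
A minimal sketch of attaching the header from a Python client, using only the standard library. The helper name admin_request, the host, and the token value are illustrative assumptions and not part of the API.

```python
import urllib.request


def admin_request(base_url, path, token, method="GET"):
    """Build (but do not send) a request that carries the X-Admin-Token header."""
    req = urllib.request.Request(base_url.rstrip("/") + path, method=method)
    req.add_header("X-Admin-Token", token)
    return req


# Hypothetical host and token; send with urllib.request.urlopen(req).
req = admin_request("http://localhost:8001", "/admin/verify", "s3cret")
```

HTTP header names are case-insensitive, so it does not matter that urllib normalizes the capitalization internally.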


System

GET /health

Returns orchestrator liveness.

Response 200

json
{ "ok": true, "version": "0.1.0" }

Models

GET /v1/models

OpenAI-compatible model list.

Response 200

json
{
  "object": "list",
  "data": [
    { "id": "llama3.2:1b", "object": "model", "owned_by": "dppan" }
  ]
}

GET /api/models

Full model registry with layer/architecture metadata from config/models.toml.

Response 200

json
[
  {
    "model_id": "llama3.2:1b",
    "n_layers": 16,
    "d_model": 2048,
    "gguf_sha256": "74701a8c..."
  }
]

GET /admin/chain/:model_id (admin)

Chain health for a specific model — shows which layer ranges are covered.

Response 200

json
{
  "model_id": "llama3.2:1b",
  "n_layers": 16,
  "covered": [[0, 8], [8, 16]],
  "gaps": [],
  "ready": true
}
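
The relationship between covered and gaps can be sketched as follows. This is one way to derive gaps from half-open ranges on the client side; how the orchestrator computes them is not stated here, and the function name find_gaps is hypothetical.

```python
def find_gaps(n_layers, covered):
    """Return half-open [start, end) ranges within 0..n_layers that no range covers."""
    gaps = []
    cursor = 0  # first layer not yet known to be covered
    for start, end in sorted(covered):
        if start > cursor:
            gaps.append([cursor, start])
        cursor = max(cursor, end)
    if cursor < n_layers:
        gaps.append([cursor, n_layers])
    return gaps


print(find_gaps(16, [[0, 8], [8, 16]]))  # → []
print(find_gaps(16, [[0, 6], [8, 16]]))  # → [[6, 8]]
```

Under this reading, a chain would be ready exactly when the list of gaps is empty.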

Chat Completions

POST /v1/chat/completions

OpenAI-compatible chat endpoint (non-streaming and SSE streaming).

Request body

json
{
  "model": "llama3.2:1b",
  "messages": [
    { "role": "system",    "content": "You are a helpful assistant." },
    { "role": "user",      "content": "What is 2 + 2?" }
  ],
  "max_tokens":   200,
  "temperature":  0.8,
  "top_p":        0.95,
  "stream":       false
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | required | Model ID from the registry |
| messages | array | required | Conversation; role is system, user, or assistant |
| max_tokens | integer | 512 | Max new tokens to generate |
| temperature | float | 0.8 | Sampling temperature (0 = greedy) |
| top_p | float | 0.95 | Nucleus sampling p |
| stream | boolean | false | Enable SSE streaming |

Response 200 (non-streaming)

json
{
  "id": "chatcmpl-<uuid>",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "4" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 1,
    "total_tokens": 13
  }
}
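
The request fields and response shape above can be exercised with a short sketch. The helper names build_chat_request and extract_reply are hypothetical; the defaults mirror the documented ones, and the sample response is the one shown above with a placeholder id.

```python
import json


def build_chat_request(model, messages, max_tokens=512, temperature=0.8,
                       top_p=0.95, stream=False):
    """Serialize a request body, filling in the documented defaults."""
    return json.dumps({
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    })


def extract_reply(response_text):
    """Return (assistant text, total_tokens) from a non-streaming response."""
    body = json.loads(response_text)
    choice = body["choices"][0]
    return choice["message"]["content"], body["usage"]["total_tokens"]


request_body = build_chat_request(
    "llama3.2:1b", [{"role": "user", "content": "What is 2 + 2?"}]
)
sample_response = json.dumps({
    "id": "chatcmpl-example",
    "object": "chat.completion",
    "choices": [{"index": 0,
                 "message": {"role": "assistant", "content": "4"},
                 "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 1, "total_tokens": 13},
})
print(extract_reply(sample_response))  # → ('4', 13)
```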

Response 200 (streaming, Content-Type: text/event-stream)

Each event is a data: line carrying one JSON chunk. The stream ends with a literal [DONE] sentinel:

data: {"choices":[{"delta":{"content":"4"},"finish_reason":null}]}

data: [DONE]
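
A minimal sketch of consuming this stream, assuming the lines have already been read from the response. The function name iter_deltas is hypothetical, and only the fields shown in the event above are relied on.

```python
import json


def iter_deltas(lines):
    """Yield content fragments from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]


sample = [
    'data: {"choices":[{"delta":{"content":"4"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # → 4
```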

Errors

| Code | Meaning |
| --- | --- |
| 503 | No Ready nodes covering all layers of the requested model (chain gap) |
| 422 | Model not in registry |
| 400 | Malformed request |

DELETE /sessions/:id

Close a generation session and free KV cache on all nodes.

Response 204 — no body


Nodes

GET /nodes

List all registered nodes.

Response 200 — array of NodeRecord:

json
[
  {
    "node_id":    "550e8400-e29b-41d4-a716-446655440000",
    "model_id":   "llama3.2:1b",
    "http_url":   "http://192.168.1.20:5001",
    "grpc_url":   "http://192.168.1.20:5002",
    "start_layer": 0,
    "end_layer":   8,
    "state":      "Ready",
    "vram_mb":    8192,
    "tokens_total": 1024,
    "requests_served": 12,
    "avg_latency_ms": 45.2,
    "uptime_s":   3600
  }
]

state values: Ready · Degraded · Dead · Joining
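
As an illustration of working with NodeRecord entries, the sketch below picks out the layer ranges served by Ready nodes for one model. The function name ready_ranges is hypothetical, and the sample records carry only the fields the function reads.

```python
def ready_ranges(nodes, model_id):
    """Sorted (start_layer, end_layer) pairs for Ready nodes serving model_id."""
    return sorted(
        (n["start_layer"], n["end_layer"])
        for n in nodes
        if n["model_id"] == model_id and n["state"] == "Ready"
    )


nodes = [
    {"model_id": "llama3.2:1b", "start_layer": 8, "end_layer": 16, "state": "Ready"},
    {"model_id": "llama3.2:1b", "start_layer": 0, "end_layer": 8, "state": "Ready"},
    {"model_id": "llama3.2:1b", "start_layer": 0, "end_layer": 8, "state": "Dead"},
]
print(ready_ranges(nodes, "llama3.2:1b"))  # → [(0, 8), (8, 16)]
```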

GET /nodes/rankings

Nodes ranked by combined score (throughput, latency, VRAM).

Response 200

json
{
  "rankings": [
    { "node_id": "...", "model_id": "llama3.2:1b", "score": 0.92, "rank": 1 }
  ]
}

POST /nodes

Explicit node registration with caller-provided layer range. Used by legacy (--legacy) mode.

Request body

json
{
  "node_id":    "550e8400-e29b-41d4-a716-446655440000",
  "model_id":   "llama3.2:1b",
  "http_url":   "http://192.168.1.20:5001",
  "grpc_url":   "http://192.168.1.20:5002",
  "start_layer": 0,
  "end_layer":   8,
  "n_total_layers": 16,
  "d_model":    2048,
  "vram_mb":    8192,
  "hardware":   { "gpu": "metal", "device_name": "Apple M4 Pro", "vram_mb": 32768 },
  "capabilities": [
    { "model_id": "llama3.2:1b", "n_layers": 16, "d_model": 2048 }
  ]
}

Response 200

json
{ "node_id": "550e8400-e29b-41d4-a716-446655440000", "warnings": [] }

POST /nodes/join

Auto-registration: orchestrator assigns the layer range based on available VRAM.

Request body

json
{
  "node_id":   "550e8400-e29b-41d4-a716-446655440000",
  "model_id":  "llama3.2:1b",
  "http_url":  "http://192.168.1.20:5001",
  "grpc_url":  "http://192.168.1.20:5002",
  "n_total_layers": 16,
  "d_model":   2048,
  "vram_mb":   8192,
  "layers":    "0-8",
  "hardware":  { "gpu": "metal", "device_name": "Apple M4 Pro", "vram_mb": 32768 },
  "capabilities": []
}

layers is optional; omit it to let the orchestrator assign the range.

Response 200

json
{
  "node_id":    "550e8400-e29b-41d4-a716-446655440000",
  "start_layer": 0,
  "end_layer":   8,
  "warnings":   []
}

Errors

| Code | Meaning |
| --- | --- |
| 409 | node_id already registered |
| 422 | Architecture mismatch or unknown model |
| 503 | No layer range available (all layers covered) |
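
A small sketch of preparing the join body so that layers is sent only when the caller wants to pin a range. The helper name build_join_payload is hypothetical, and whether capabilities may be omitted is not stated here, so the sketch always includes it.

```python
def build_join_payload(node, layers=None):
    """Copy the node description and add "layers" only when a range is pinned."""
    payload = dict(node)
    payload.setdefault("capabilities", [])
    if layers is not None:
        payload["layers"] = layers
    return payload


node = {
    "node_id": "550e8400-e29b-41d4-a716-446655440000",
    "model_id": "llama3.2:1b",
    "http_url": "http://192.168.1.20:5001",
    "grpc_url": "http://192.168.1.20:5002",
    "n_total_layers": 16,
    "d_model": 2048,
    "vram_mb": 8192,
}
auto_assigned = build_join_payload(node)          # orchestrator picks the range
pinned = build_join_payload(node, layers="0-8")   # caller requests a range
```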

POST /nodes/:id/health

Node heartbeat — updates metrics counters.

Request body (all fields optional)

json
{
  "tokens_generated": 100,
  "requests_served":  5,
  "errors":           0,
  "avg_latency_ms":   42.0,
  "peak_vram_mb":     4096
}

Response 200

json
{ "ok": true }

GET /nodes/:id/hardware

Hardware profile reported by the node.

Response 200

json
{
  "gpu": "metal",
  "device_name": "Apple M4 Pro",
  "vram_mb": 32768
}

GET /nodes/:id/metrics

Per-node metrics time series, kept in a ring buffer of roughly the last 60 samples at 10 s intervals (about 10 minutes of history).

Response 200

json
{
  "node_id": "...",
  "samples": [
    { "tps": 45.2, "latency_ms": 22.1, "ts_ms": 1745123456000 }
  ]
}
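
The samples array can be reduced to a few headline numbers on the client. The sketch below is illustrative only: the function name summarize is hypothetical and the two sample values are made up for the example.

```python
def summarize(samples):
    """Mean tps, mean latency, and the time span covered by the samples."""
    if not samples:
        return None
    n = len(samples)
    timestamps = [s["ts_ms"] for s in samples]
    return {
        "mean_tps": sum(s["tps"] for s in samples) / n,
        "mean_latency_ms": sum(s["latency_ms"] for s in samples) / n,
        "span_s": (max(timestamps) - min(timestamps)) / 1000,
    }


samples = [
    {"tps": 40.0, "latency_ms": 20.0, "ts_ms": 1745123456000},
    {"tps": 50.0, "latency_ms": 30.0, "ts_ms": 1745123466000},
]
print(summarize(samples))
# → {'mean_tps': 45.0, 'mean_latency_ms': 25.0, 'span_s': 10.0}
```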

DELETE /nodes/:id

Deregister a node (node-initiated clean exit).

Response 204 — no body


Metrics

GET /metrics/summary

Cluster-wide aggregate metrics.

Response 200

json
{
  "total_nodes": 2,
  "ready_nodes": 2,
  "total_tokens": 50000,
  "total_requests": 200,
  "avg_latency_ms": 38.5
}

GET /metrics/model/:model_id

Per-model metrics.

Response 200

json
{
  "model_id": "llama3.2:1b",
  "nodes":    2,
  "requests": 150,
  "tokens":   30000,
  "avg_latency_ms": 35.0
}

GET /metrics/history

Historical request rate (last N minutes, 1-minute buckets).

Response 200

json
{
  "buckets": [
    { "ts_ms": 1745123400000, "requests": 12, "tokens": 4800 }
  ]
}

Admin Endpoints

All admin routes require X-Admin-Token: <token>.

GET /admin/verify

Check token validity.

Response 200

json
{ "ok": true }

Response 403 if the token is missing or invalid.

POST /admin/nodes/:id/reassign

Change the layer range a node is responsible for.

Request body

json
{ "start_layer": 0, "end_layer": 8 }

Response 200

json
{ "ok": true, "node_id": "...", "start_layer": 0, "end_layer": 8 }
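
Before sending a reassignment, a client can sanity-check the range against the model's n_layers. This is a client-side sketch under the half-open convention used throughout this reference; which of these checks the server itself enforces is not stated here, and the function name validate_range is hypothetical.

```python
def validate_range(start_layer, end_layer, n_layers):
    """Return a list of problems with a half-open [start_layer, end_layer) range."""
    problems = []
    if start_layer < 0:
        problems.append("start_layer must be >= 0")
    if end_layer > n_layers:
        problems.append(f"end_layer must be <= {n_layers}")
    if start_layer >= end_layer:
        problems.append("range must contain at least one layer")
    return problems


print(validate_range(0, 8, 16))   # → []
print(validate_range(8, 20, 16))  # → ['end_layer must be <= 16']
```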

PATCH /admin/nodes/:id/state

Override node state.

Request body

json
{ "state": "ready" }

state values: "ready" · "idle"

Response 200

json
{ "ok": true, "node_id": "...", "state": "Ready" }

DELETE /admin/nodes/:id/force

Force-deregister a node (admin override — does not notify the node).

Response 200

json
{ "ok": true, "node_id": "..." }

Error Format

All errors return JSON:

json
{ "error": "human-readable description" }

Common status codes:

| Code | Meaning |
| --- | --- |
| 400 | Bad request / missing field |
| 403 | Missing or invalid admin token |
| 404 | Node or resource not found |
| 409 | Conflict (e.g. duplicate node_id) |
| 422 | Capability mismatch / unknown model |
| 503 | No nodes available / layer gap |
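
A minimal sketch of turning an error response into a readable message. It assumes the body follows the error format above and falls back to the raw text when it does not, for example when a proxy in front of the orchestrator answers instead. The function name error_message is hypothetical.

```python
import json


def error_message(status, body_text):
    """Format an error response, tolerating bodies that are not the documented JSON."""
    try:
        detail = json.loads(body_text)["error"]
    except (ValueError, KeyError, TypeError):
        # Not JSON, not an object, or no "error" key: show the raw body instead.
        detail = body_text.strip() or "(empty body)"
    return f"HTTP {status}: {detail}"


print(error_message(503, '{"error": "no nodes available"}'))
# → HTTP 503: no nodes available
print(error_message(502, "Bad Gateway"))
# → HTTP 502: Bad Gateway
```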
