DPPAN REST API Reference

Base URL: http://<orchestrator-host>:<port> (default port: 8001)

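All responses are JSON unless noted. Timestamps are RFC 3339, except fields named ts_ms, which are Unix epoch milliseconds. Layer ranges are 0-based and half-open: start_layer is inclusive and end_layer is exclusive, so start_layer 0 with end_layer 8 covers layers 0 through 7.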

Authentication

Admin endpoints require the X-Admin-Token header:

X-Admin-Token: <token>

The token is set with --admin-token when starting the orchestrator.
Without a token configured, all admin routes return 403.
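
As a minimal sketch, an admin request could be built with Python's standard library as shown here. The host, the token value, and the helper name `admin_request` are assumptions for illustration; only the header name and the default port come from this reference.

```python
import urllib.request

BASE_URL = "http://localhost:8001"  # assumption: orchestrator on the default port


def admin_request(path: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request carrying the X-Admin-Token header."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"X-Admin-Token": token},
        method=method,
    )


# "example-token" is a placeholder for the value passed to --admin-token.
req = admin_request("/admin/verify", token="example-token")
```

Passing `req` to `urllib.request.urlopen` sends it; a 403 response surfaces as `urllib.error.HTTPError`.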

System

`GET /health`

Returns orchestrator liveness.

Response 200

json

{ "ok": true, "version": "0.1.0" }

Models

`GET /v1/models`

OpenAI-compatible model list.

Response 200

json

{
  "object": "list",
  "data": [
    { "id": "llama3.2:1b", "object": "model", "owned_by": "dppan" }
  ]
}

`GET /api/models`

Full model registry with layer/architecture metadata from config/models.toml.

Response 200

json

[
  {
    "model_id": "llama3.2:1b",
    "n_layers": 16,
    "d_model": 2048,
    "gguf_sha256": "74701a8c..."
  }
]

`GET /admin/chain/:model_id` (admin)

Chain health for a specific model — shows which layer ranges are covered.

Response 200

json

{
  "model_id": "llama3.2:1b",
  "n_layers": 16,
  "covered": [[0, 8], [8, 16]],
  "gaps": [],
  "ready": true
}
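
The relationship between `covered`, `gaps`, and `ready` can be illustrated on the client side. This is only a sketch: the orchestrator's own algorithm is not documented here, the function name `find_gaps` is hypothetical, and it assumes that `ready` simply means "no gaps".

```python
def find_gaps(n_layers: int, covered: list[list[int]]) -> list[list[int]]:
    """Return the half-open [start, end) layer ranges not covered by any node."""
    gaps = []
    cursor = 0  # first layer not yet known to be covered
    for start, end in sorted(covered):
        if start > cursor:
            gaps.append([cursor, start])
        cursor = max(cursor, end)
    if cursor < n_layers:
        gaps.append([cursor, n_layers])
    return gaps


print(find_gaps(16, [[0, 8], [8, 16]]))  # [] -> every layer is covered
print(find_gaps(16, [[0, 8]]))           # [[8, 16]] -> layers 8 through 15 are missing
```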

Chat Completions

`POST /v1/chat/completions`

OpenAI-compatible chat endpoint (non-streaming and SSE streaming).

Request body

json

{
  "model": "llama3.2:1b",
  "messages": [
    { "role": "system",    "content": "You are a helpful assistant." },
    { "role": "user",      "content": "What is 2 + 2?" }
  ],
  "max_tokens":   200,
  "temperature":  0.8,
  "top_p":        0.95,
  "stream":       false
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | string | required | Model ID from the registry |
| `messages` | array | required | Conversation; `role` is `system`, `user`, or `assistant` |
| `max_tokens` | integer | 512 | Max new tokens to generate |
| `temperature` | float | 0.8 | Sampling temperature (0 = greedy) |
| `top_p` | float | 0.95 | Nucleus sampling p |
| `stream` | boolean | `false` | Enable SSE streaming |
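
A client might assemble the request body as in the sketch below. The helper name `chat_request_body` and the client-side role check are assumptions; the field names and default values are the ones listed above.

```python
import json


def chat_request_body(model, messages, max_tokens=512, temperature=0.8,
                      top_p=0.95, stream=False) -> bytes:
    """Serialize a POST /v1/chat/completions body using the documented defaults."""
    allowed_roles = {"system", "user", "assistant"}
    for msg in messages:
        if msg["role"] not in allowed_roles:
            raise ValueError(f"unsupported role: {msg['role']!r}")
    payload = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }
    return json.dumps(payload).encode("utf-8")


body = chat_request_body(
    "llama3.2:1b",
    [{"role": "user", "content": "What is 2 + 2?"}],
    max_tokens=200,
)
```

Send `body` with a `Content-Type: application/json` header.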

Response 200 (non-streaming)

json

{
  "id": "chatcmpl-<uuid>",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "4" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 1,
    "total_tokens": 13
  }
}

Response 200 (streaming, Content-Type: text/event-stream)

Each event carries one JSON chunk on a data: line. The stream ends with a literal data: [DONE] line:

data: {"choices":[{"delta":{"content":"4"},"finish_reason":null}]}

data: [DONE]
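
A minimal sketch of consuming the stream: it takes an iterable of already-decoded text lines and joins the `delta.content` pieces until `[DONE]`. The function name `collect_stream` is hypothetical, and the chunk shape is assumed to match the event shown above.

```python
import json


def collect_stream(lines):
    """Concatenate delta.content from SSE 'data:' lines until [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


sample = [
    'data: {"choices":[{"delta":{"content":"4"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
print(collect_stream(sample))  # 4
```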

Errors

| Code | Meaning |
| --- | --- |
| `503` | No `Ready` nodes covering all layers of the requested model (chain gap) |
| `422` | Model not in registry |
| `400` | Malformed request |

`DELETE /sessions/:id`

Close a generation session and free KV cache on all nodes.

Response 204 — no body

Nodes

`GET /nodes`

List all registered nodes.

Response 200 — array of NodeRecord:

json

[
  {
    "node_id":    "550e8400-e29b-41d4-a716-446655440000",
    "model_id":   "llama3.2:1b",
    "http_url":   "http://192.168.1.20:5001",
    "grpc_url":   "http://192.168.1.20:5002",
    "start_layer": 0,
    "end_layer":   8,
    "state":      "Ready",
    "vram_mb":    8192,
    "tokens_total": 1024,
    "requests_served": 12,
    "avg_latency_ms": 45.2,
    "uptime_s":   3600
  }
]

state values: Ready · Degraded · Dead · Joining

`GET /nodes/rankings`

Nodes ranked by combined score (throughput, latency, VRAM).

Response 200

json

{
  "rankings": [
    { "node_id": "...", "model_id": "llama3.2:1b", "score": 0.92, "rank": 1 }
  ]
}

`POST /nodes`

Explicit node registration with caller-provided layer range. Used by legacy (--legacy) mode.

Request body

json

{
  "node_id":    "550e8400-e29b-41d4-a716-446655440000",
  "model_id":   "llama3.2:1b",
  "http_url":   "http://192.168.1.20:5001",
  "grpc_url":   "http://192.168.1.20:5002",
  "start_layer": 0,
  "end_layer":   8,
  "n_total_layers": 16,
  "d_model":    2048,
  "vram_mb":    8192,
  "hardware":   { "gpu": "metal", "device_name": "Apple M4 Pro", "vram_mb": 32768 },
  "capabilities": [
    { "model_id": "llama3.2:1b", "n_layers": 16, "d_model": 2048 }
  ]
}

Response 200

json

{ "node_id": "550e8400-e29b-41d4-a716-446655440000", "warnings": [] }

`POST /nodes/join`

Auto-registration: orchestrator assigns the layer range based on available VRAM.

Request body

json

{
  "node_id":   "550e8400-e29b-41d4-a716-446655440000",
  "model_id":  "llama3.2:1b",
  "http_url":  "http://192.168.1.20:5001",
  "grpc_url":  "http://192.168.1.20:5002",
  "n_total_layers": 16,
  "d_model":   2048,
  "vram_mb":   8192,
  "layers":    "0-8",
  "hardware":  { "gpu": "metal", "device_name": "Apple M4 Pro", "vram_mb": 32768 },
  "capabilities": []
}

`layers` is optional; omit it to let the orchestrator assign the range.

Response 200

json

{
  "node_id":    "550e8400-e29b-41d4-a716-446655440000",
  "start_layer": 0,
  "end_layer":   8,
  "warnings":   []
}

Errors

| Code | Meaning |
| --- | --- |
| `409` | `node_id` already registered |
| `422` | Architecture mismatch or unknown model |
| `503` | No layer range available (all layers covered) |
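
The optional `layers` field could be handled as in this sketch, which leaves the key out entirely unless the caller pins a range. The helper name `join_payload` is hypothetical; field names and sample values are taken from the request body above.

```python
def join_payload(node_id, model_id, http_url, grpc_url, n_total_layers,
                 d_model, vram_mb, hardware, layers=None):
    """Build a POST /nodes/join body; omit `layers` unless a range is pinned."""
    payload = {
        "node_id": node_id,
        "model_id": model_id,
        "http_url": http_url,
        "grpc_url": grpc_url,
        "n_total_layers": n_total_layers,
        "d_model": d_model,
        "vram_mb": vram_mb,
        "hardware": hardware,
        "capabilities": [],
    }
    if layers is not None:
        payload["layers"] = layers  # string form such as "0-8"
    return payload


hw = {"gpu": "metal", "device_name": "Apple M4 Pro", "vram_mb": 32768}
common = ("550e8400-e29b-41d4-a716-446655440000", "llama3.2:1b",
          "http://192.168.1.20:5001", "http://192.168.1.20:5002", 16, 2048, 8192, hw)

auto = join_payload(*common)                  # orchestrator picks the range
pinned = join_payload(*common, layers="0-8")  # caller requests a range
```

In the auto-assigned case, the node learns its range from `start_layer` and `end_layer` in the response.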

`POST /nodes/:id/health`

Node heartbeat — updates metrics counters.

Request body (all fields optional)

json

{
  "tokens_generated": 100,
  "requests_served":  5,
  "errors":           0,
  "avg_latency_ms":   42.0,
  "peak_vram_mb":     4096
}

Response 200

json

{ "ok": true }

`GET /nodes/:id/hardware`

Hardware profile reported by the node.

Response 200

json

{
  "gpu": "metal",
  "device_name": "Apple M4 Pro",
  "vram_mb": 32768
}

`GET /nodes/:id/metrics`

Per-node metrics timeseries (ring buffer, last ~60 samples at 10s intervals).

Response 200

json

{
  "node_id": "...",
  "samples": [
    { "tps": 45.2, "latency_ms": 22.1, "ts_ms": 1745123456000 }
  ]
}
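
At roughly 60 samples spaced 10 s apart, the buffer holds about ten minutes of history. The sketch below shows the ring-buffer behaviour with `collections.deque`; the capacity of exactly 60 and the synthetic sample values are assumptions, and the orchestrator's internal implementation is not documented here.

```python
from collections import deque
from statistics import mean

WINDOW = 60  # assumption: capacity taken from "last ~60 samples"

samples = deque(maxlen=WINDOW)  # oldest entries fall off automatically
for i in range(100):
    samples.append({
        "tps": float(i),
        "latency_ms": 20.0,
        "ts_ms": 1745123456000 + i * 10_000,  # one synthetic sample every 10 s
    })

print(len(samples))                     # 60
print(samples[0]["tps"])                # 40.0 (the first 40 samples were evicted)
print(mean(s["tps"] for s in samples))  # 69.5
```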

`DELETE /nodes/:id`

Deregister a node (node-initiated clean exit).

Response 204 — no body

Metrics

`GET /metrics/summary`

Cluster-wide aggregate metrics.

Response 200

json

{
  "total_nodes": 2,
  "ready_nodes": 2,
  "total_tokens": 50000,
  "total_requests": 200,
  "avg_latency_ms": 38.5
}

`GET /metrics/model/:model_id`

Per-model metrics.

Response 200

json

{
  "model_id": "llama3.2:1b",
  "nodes":    2,
  "requests": 150,
  "tokens":   30000,
  "avg_latency_ms": 35.0
}

`GET /metrics/history`

Historical request rate (last N minutes, 1-minute buckets).

Response 200

json

{
  "buckets": [
    { "ts_ms": 1745123400000, "requests": 12, "tokens": 4800 }
  ]
}
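
One way to picture the 1-minute buckets is to floor each request's timestamp to the start of its minute. This is an illustrative sketch only: the function name `bucketize` and the `(ts_ms, tokens)` event shape are assumptions, and the orchestrator's actual aggregation is not documented here.

```python
from collections import defaultdict

BUCKET_MS = 60_000  # one minute in milliseconds


def bucketize(events):
    """Group (ts_ms, tokens) request events into 1-minute buckets."""
    buckets = defaultdict(lambda: {"requests": 0, "tokens": 0})
    for ts_ms, tokens in events:
        start = ts_ms - ts_ms % BUCKET_MS  # floor to the start of the minute
        buckets[start]["requests"] += 1
        buckets[start]["tokens"] += tokens
    return [{"ts_ms": ts, **agg} for ts, agg in sorted(buckets.items())]


events = [(1745123400000, 400), (1745123456000, 100), (1745123460000, 50)]
for bucket in bucketize(events):
    print(bucket)
# {'ts_ms': 1745123400000, 'requests': 2, 'tokens': 500}
# {'ts_ms': 1745123460000, 'requests': 1, 'tokens': 50}
```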

Admin Endpoints

All admin routes require X-Admin-Token: <token>.

`GET /admin/verify`

Check token validity.

Response 200

json

{ "ok": true }

Response 403 if the token is missing or invalid.

`POST /admin/nodes/:id/reassign`

Change the layer range a node is responsible for.

Request body

json

{ "start_layer": 0, "end_layer": 8 }

Response 200

json

{ "ok": true, "node_id": "...", "start_layer": 0, "end_layer": 8 }

`PATCH /admin/nodes/:id/state`

Override node state.

Request body

json

{ "state": "ready" }

state values: "ready" · "idle"

Response 200

json

{ "ok": true, "node_id": "...", "state": "Ready" }

`DELETE /admin/nodes/:id/force`

Force-deregister a node (admin override — does not notify the node).

Response 200

json

{ "ok": true, "node_id": "..." }

Error Format

All errors return JSON:

json

{ "error": "human-readable description" }

Common status codes:

| Code | Meaning |
| --- | --- |
| `400` | Bad request / missing field |
| `403` | Missing or invalid admin token |
| `404` | Node or resource not found |
| `409` | Conflict (e.g. duplicate node_id) |
| `422` | Capability mismatch / unknown model |
| `503` | No nodes available / layer gap |
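
A client could turn an error response into a single log line along these lines. The helper name `describe_error` and the fallback for non-JSON bodies are assumptions; the status meanings are copied from the table above.

```python
import json

STATUS_MEANINGS = {
    400: "Bad request / missing field",
    403: "Missing or invalid admin token",
    404: "Node or resource not found",
    409: "Conflict",
    422: "Capability mismatch / unknown model",
    503: "No nodes available / layer gap",
}


def describe_error(status: int, body: bytes) -> str:
    """Combine the documented status meaning with the body's "error" text."""
    try:
        detail = json.loads(body)["error"]
    except (ValueError, KeyError, TypeError):
        # Body was not the documented {"error": ...} shape; show it raw.
        detail = body.decode("utf-8", errors="replace")
    meaning = STATUS_MEANINGS.get(status, "Unexpected status")
    return f"{status} {meaning}: {detail}"


print(describe_error(503, b'{"error": "no nodes cover layers 8..16"}'))
# 503 No nodes available / layer gap: no nodes cover layers 8..16
```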
