LLM Serving with Docker Containers
How to use Docker-based LLM serving engines (e.g. SGLang) as a worker node subprocess in OpenTela.
OpenTela's `--subprocess` flag lets you delegate the LLM serving process to any command, including a `docker run` invocation. This is useful when you prefer not to install CUDA dependencies directly on the host, or when you want to pin a specific image version of a serving engine like SGLang.
Prerequisites
- Docker installed on the worker machine (`docker --version`)
- NVIDIA Container Toolkit installed, so Docker can access GPUs (`nvidia-ctk runtime configure --runtime=docker`)
- A running head node (see Spin Up a Network)
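Before going further, it is worth confirming that Docker can actually see the GPUs. A quick sanity check (any CUDA base image works; the tag below is just an example):
```bash
# Should print the same GPU table as running nvidia-smi directly on the host.
# If this fails, recheck the NVIDIA Container Toolkit installation.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```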
How --subprocess works
When you pass `--subprocess "docker run ..."`, OpenTela launches that command as a child process, supervises it, and updates the worker's health-check state accordingly. The command string is split on whitespace; it is not passed through a shell, so shell quoting, pipes, and redirections do not work. Keep argument values free of spaces, or use a wrapper script (see Complex arguments and wrapper scripts below).
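To make the splitting rule concrete, here is a hypothetical pair of invocations (the volume path is made up for illustration). The first works because every whitespace-separated token is a complete argument; the second breaks because the space inside the path splits it into two tokens, and the quotes reach docker as literal characters instead of grouping anything:
```bash
# OK: each whitespace-separated token is exactly one argument.
./otela start --subprocess "docker run --rm --gpus all lmsysorg/sglang:latest" --service.port 30000

# Broken: '/models dir:/root/models' is split at the space into two tokens,
# and the single quotes are passed through literally, not interpreted.
./otela start --subprocess "docker run -v '/models dir:/root/models' lmsysorg/sglang:latest" --service.port 30000
```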
Step 1: Start the head node
If you haven't already, start a head node on a machine with a public IP address:
```bash
./otela start --mode standalone --public-addr {YOUR_IP_ADDR} --seed 0
```
Note the peer ID printed in the logs; you'll need it for the worker command below.
Step 2: Start a worker node with SGLang in Docker
```bash
./otela start \
--bootstrap.addr /ip4/{YOUR_HEAD_IP}/tcp/43905/p2p/{YOUR_HEAD_PEER_ID} \
--subprocess "docker run --rm --gpus all --network host -v /root/.cache/huggingface:/root/.cache/huggingface -e HF_TOKEN={YOUR_HF_TOKEN} lmsysorg/sglang:latest python3 -m sglang.launch_server --model-path Qwen/Qwen3-8B --port 30000 --host 0.0.0.0" \
--service.name llm \
--service.port 30000 \
  --seed 1
```
Key flags explained:
| Docker flag | Purpose |
|---|---|
| `--rm` | Remove the container when it exits, avoiding leftover stopped containers |
| `--gpus all` | Pass all host GPUs into the container (requires NVIDIA Container Toolkit) |
| `--network host` | Share the host network namespace so OpenTela can reach the container on localhost:30000 without explicit port mapping |
| `-v /root/.cache/huggingface:/root/.cache/huggingface` | Mount the Hugging Face model cache from the host to avoid re-downloading on each run |
| `-e HF_TOKEN=...` | Pass your Hugging Face token so the container can download gated models |
The SGLang server flags:
| SGLang flag | Purpose |
|---|---|
| `--model-path Qwen/Qwen3-8B` | Model to load (Hugging Face model ID or local path) |
| `--port 30000` | Port the server listens on inside the container; matches `--service.port` |
| `--host 0.0.0.0` | Bind on all interfaces so the server is reachable from the host |
OpenTela flags:
| otela flag | Purpose |
|---|---|
| `--subprocess "docker run ..."` | The command OpenTela will launch and supervise |
| `--service.name llm` | Service name used for routing; keep this as `llm` for LLM serving |
| `--service.port 30000` | Port OpenTela will proxy requests to (must match the SGLang port) |
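If the worker misbehaves, it is often easier to debug the container in isolation before handing it to OpenTela. A minimal sketch, assuming SGLang's health endpoint:
```bash
# Run the exact command from the --subprocess string by itself...
docker run --rm --gpus all --network host \
  -v /root/.cache/huggingface:/root/.cache/huggingface \
  -e HF_TOKEN={YOUR_HF_TOKEN} \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server --model-path Qwen/Qwen3-8B --port 30000 --host 0.0.0.0

# ...then, from another terminal on the same host, check that it answers.
curl http://localhost:30000/health
```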
Step 3: Verify the worker has registered
Once the SGLang server is ready (this takes a minute or two while the model loads), OpenTela registers the worker with the head node. Check the head node's CRDT table:
```bash
curl http://{YOUR_HEAD_IP}:8092/v1/dnt/table
```
You should see the worker's peer entry with `"service": [{"name": "llm", ...}]` and the model listed under `"identity_group"`.
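The raw table can be verbose. A small convenience, assuming `jq` is installed (the exact JSON layout beyond the fields quoted above is not guaranteed here):
```bash
# Pretty-print the table and show a little context around each llm entry.
curl -s http://{YOUR_HEAD_IP}:8092/v1/dnt/table | jq . | grep -B2 -A2 '"name": "llm"'
```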
Step 4: Send requests
Use the head node as a single entry point — OpenTela routes to the worker automatically:
```python
import openai
client = openai.OpenAI(
base_url="http://{YOUR_HEAD_IP}:8092/v1/service/llm/v1",
api_key="test-token"
)
response = client.chat.completions.create(
model="Qwen/Qwen3-8B",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response)
```
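If you'd rather test from the shell, the same request as a plain curl call (the route mirrors the `base_url` above, and the token matches the `api_key`):
```bash
curl http://{YOUR_HEAD_IP}:8092/v1/service/llm/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer test-token" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```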
Pinning a specific GPU
To assign a specific GPU (e.g., device 0) to a container instead of all GPUs, replace `--gpus all` with `--gpus device=0`:
```bash
--subprocess "docker run --rm --gpus device=0 --network host ... lmsysorg/sglang:latest python3 -m sglang.launch_server --model-path Qwen/Qwen3-8B --port 30000 --host 0.0.0.0"
```
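To confirm the pinning took effect, a quick check (again with an example CUDA image tag):
```bash
# Only GPU 0 should appear in the output.
docker run --rm --gpus device=0 nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```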
When running multiple workers on the same host (each on a different GPU), start a separate otela process per worker and assign each to a different GPU and port:
```bash
# Worker 0 on GPU 0, port 30000
./otela start --bootstrap.addr ... --subprocess "docker run --rm --gpus device=0 --network host ... python3 -m sglang.launch_server --model-path Qwen/Qwen3-8B --port 30000 --host 0.0.0.0" --service.name llm --service.port 30000 --tcpport 43905 --udpport 59820 --port 8092 --seed 1
# Worker 1 on GPU 1, port 30001
./otela start --bootstrap.addr ... --subprocess "docker run --rm --gpus device=1 --network host ... python3 -m sglang.launch_server --model-path Qwen/Qwen3-30B-A3B --port 30001 --host 0.0.0.0" --service.name llm --service.port 30001 --tcpport 43906 --udpport 59821 --port 8093 --seed 2
```
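The per-worker offsets are easy to get wrong by hand. A hypothetical launcher script, assuming the same flag layout as step 2, the same model on every GPU, and `HF_TOKEN` exported in the environment:
```bash
#!/bin/bash
# Launch one worker per GPU; every host-unique value (SGLang port,
# OpenTela ports, seed) is offset by the GPU index.
for GPU in 0 1; do
  PORT=$((30000 + GPU))
  ./otela start \
    --bootstrap.addr /ip4/{YOUR_HEAD_IP}/tcp/43905/p2p/{YOUR_HEAD_PEER_ID} \
    --subprocess "docker run --rm --gpus device=$GPU --network host -v /root/.cache/huggingface:/root/.cache/huggingface -e HF_TOKEN=$HF_TOKEN lmsysorg/sglang:latest python3 -m sglang.launch_server --model-path Qwen/Qwen3-8B --port $PORT --host 0.0.0.0" \
    --service.name llm \
    --service.port $PORT \
    --tcpport $((43905 + GPU)) \
    --udpport $((59820 + GPU)) \
    --port $((8092 + GPU)) \
    --seed $((1 + GPU)) &
done
wait  # keep the script in the foreground so both workers stay supervised
```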
Complex arguments and wrapper scripts
Because `--subprocess` is split on whitespace without shell interpretation, you cannot use spaces inside argument values or shell features like `&&`, pipes, or variable expansion. For anything more complex, write a small wrapper script and pass its path instead:
```bash
#!/bin/bash
docker run --rm \
--gpus all \
--network host \
-v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
-e "HF_TOKEN=$HF_TOKEN" \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path Qwen/Qwen3-8B \
--port 30000 \
--host 0.0.0.0 \
  --trust-remote-code
```
Make the script executable, then point the worker at it:
```bash
chmod +x start-sglang.sh
HF_TOKEN=your_token ./otela start \
--bootstrap.addr /ip4/{HEAD_IP}/tcp/43905/p2p/{HEAD_PEER_ID} \
--subprocess ./start-sglang.sh \
--service.name llm \
  --service.port 30000
```
The wrapper script is executed directly (no shell expansion of the path), so it must be executable and referenced by a path without spaces.
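As with the inline command, the wrapper can be tested on its own before involving OpenTela (again assuming SGLang's health endpoint):
```bash
HF_TOKEN=your_token ./start-sglang.sh   # should begin downloading/loading the model
curl http://localhost:30000/health      # from another terminal, once loading finishes
```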
Equivalent config file
If you prefer not to pass everything on the command line, you can set `subprocess` in the config file at `~/.config/opentela/cfg.yaml`:
```yaml
name: gpu-worker-docker
service:
  name: llm
  port: "30000"
subprocess: "./start-sglang.sh"
bootstrap:
  sources:
    - "https://bootstraps.opentela.ai/v1/dnt/bootstraps"
security:
  require_signed_binary: false
solana:
  skip_verification: true
```
Then run simply:
```bash
./otela start
```