# Proxy Performance Benchmark
Measure the latency and throughput overhead introduced by OpenTela's request routing layer.
OpenTela introduces a small amount of overhead when routing requests. To measure this, we provide a benchmarking script that tests the latency and throughput of requests sent directly to a worker node versus requests routed through an OpenTela dispatcher.
## Running the Benchmark
We provide a dedicated Docker Compose configuration for this benchmark in `local-demo/simulation`. It spins up an OpenTela head node and a single worker node running a lightweight `/v1/echo` service.
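For reference, the worker's echo endpoint simply returns whatever body it receives. The actual service in `local-demo/simulation` may be implemented differently; the following is a minimal stand-in using only the Python standard library, useful if you want to reproduce the baseline outside Docker:

```python
# Hypothetical sketch of a /v1/echo service: POST a body, get the same body back.
# The real worker service may differ; this only illustrates the endpoint's behavior.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/v1/echo":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # echo the request body back unchanged

    def log_message(self, *args):
        pass  # keep the console quiet while benchmarking


def serve(port: int) -> HTTPServer:
    """Start the echo server on a background thread and return it."""
    server = HTTPServer(("127.0.0.1", port), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```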
1. **Start the benchmark environment**

   ```bash
   cd local-demo/simulation
   docker compose -f docker-compose-benchmark.yml up -d
   ```

2. **Run the Python script**

   The script requires `aiohttp`. You can set up a virtual environment and run it:

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   pip install aiohttp
   python benchmark.py
   ```

3. **Tear down the environment**

   ```bash
   docker compose -f docker-compose-benchmark.yml down
   ```
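The measurement approach is straightforward: fire a fixed number of requests at a bounded concurrency and record per-request latency. The sketch below shows one way to structure such a loop; `benchmark.py`'s actual contents may differ, and the request itself is an injectable coroutine here so the harness runs without a live server:

```python
# Hypothetical sketch of a benchmark loop: N requests at a fixed concurrency,
# reporting throughput plus average and percentile latencies. The real script
# would pass an aiohttp request coroutine targeting the echo endpoint.
import asyncio
import time
from statistics import quantiles


async def run_benchmark(request, total: int = 1000, concurrency: int = 50) -> dict:
    sem = asyncio.Semaphore(concurrency)  # cap in-flight requests
    latencies: list[float] = []

    async def one():
        async with sem:
            t0 = time.perf_counter()
            await request()
            latencies.append(time.perf_counter() - t0)

    t_start = time.perf_counter()
    await asyncio.gather(*(one() for _ in range(total)))
    elapsed = time.perf_counter() - t_start

    pcts = quantiles(latencies, n=100)  # pcts[49] ~ P50, pcts[89] ~ P90
    return {
        "throughput_rps": total / elapsed,
        "avg_ms": sum(latencies) / len(latencies) * 1000,
        "p50_ms": pcts[49] * 1000,
        "p90_ms": pcts[89] * 1000,
    }
```

Run it once against the direct worker URL and once against the dispatcher URL, then compare the two result dictionaries to see the routing overhead.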
## Example Results
Testing in a standard local environment, at a concurrency of 50 with 1000 total requests, yields the following typical results:
| Metric | Direct (Baseline) | OpenTela Proxy |
|---|---|---|
| Throughput | ~2800 - 3600 req/s | ~1800 - 2000 req/s |
| P50 Latency | ~12 - 16 ms | ~12 - 14 ms |
| Average Latency | ~13 - 17 ms | ~26 - 30 ms |
| P90 Latency | ~17 - 19 ms | ~90 - 100 ms |
(Note: Results will vary by hardware. In general, expect the routing overhead to add ~10-15ms to the average request latency.)