xk6-llm | adamr.io

xk6-llm demo

Drives any OpenAI-compatible endpoint (vLLM, TGI, llama.cpp, hosted APIs) under realistic concurrency and reports the metrics that matter for serving: time-to-first-token, inter-token latency, time per output token, goodput under SLO, plus per-request cost and energy. Results land in Prometheus and Grafana for live dashboards.
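
To make the latency metrics concrete, here is a minimal sketch of how they are commonly derived from the arrival times of streamed tokens. It is not xk6-llm's code: the `RequestTrace` structure, the function names, and the SLO parameters are all hypothetical, and the definitions (per-output-token time averaged over the decode phase, goodput counted as SLO-satisfying requests per second) are common conventions that the tool may or may not follow exactly.

```python
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Timestamps in seconds for one streamed completion (hypothetical structure)."""
    sent_at: float            # when the request was sent
    token_times: list[float]  # arrival time of each output token, ascending


def ttft(trace: RequestTrace) -> float:
    """Time to first token: delay between sending and the first streamed token."""
    return trace.token_times[0] - trace.sent_at


def inter_token_latencies(trace: RequestTrace) -> list[float]:
    """Gaps between consecutive output tokens."""
    t = trace.token_times
    return [later - earlier for earlier, later in zip(t, t[1:])]


def tpot(trace: RequestTrace) -> float:
    """Average time per output token after the first one (decode phase only)."""
    n = len(trace.token_times)
    if n < 2:
        return 0.0  # assumption: a single-token response has no decode phase
    return (trace.token_times[-1] - trace.token_times[0]) / (n - 1)


def goodput(traces: list[RequestTrace], duration_s: float,
            ttft_slo_s: float, tpot_slo_s: float) -> float:
    """Requests per second that met both SLO thresholds (assumed definition)."""
    ok = sum(
        1 for tr in traces
        if ttft(tr) <= ttft_slo_s and tpot(tr) <= tpot_slo_s
    )
    return ok / duration_s


if __name__ == "__main__":
    traces = [
        RequestTrace(sent_at=0.0, token_times=[0.2, 0.25, 0.30, 0.35]),
        RequestTrace(sent_at=0.0, token_times=[1.0, 1.2, 1.4]),
    ]
    for tr in traces:
        print(ttft(tr), tpot(tr), inter_token_latencies(tr))
    print(goodput(traces, duration_s=2.0, ttft_slo_s=0.5, tpot_slo_s=0.1))
```

Under this definition, a request that streams quickly but starts late still fails the SLO, which is why goodput can fall well below raw throughput as concurrency rises.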