Nano vLLM with vLLM v1's request scheduling strategy and chunked prefill
Updated Jan 26, 2026 - Python
Lightweight LLM inference engine inspired by nano-vllm, with a radix-tree based prefix cache, tensor & pipeline parallelism, CUDA graph capture, an OpenAI-compatible API, async scheduling, and more.
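The headline features here are vLLM v1's scheduling strategy and chunked prefill: instead of separate prefill and decode phases, each engine step spends a single token budget across running decode requests and (possibly partial) chunks of waiting prompts. A minimal sketch of that idea follows; the names (`Request`, `schedule_step`, `TOKEN_BUDGET`) are illustrative and not this project's actual API.

```python
from dataclasses import dataclass

TOKEN_BUDGET = 8  # max tokens scheduled per engine step (illustrative)

@dataclass
class Request:
    rid: str
    prompt_len: int   # total prompt tokens
    computed: int = 0 # prompt tokens already prefilled

    @property
    def is_decoding(self) -> bool:
        return self.computed >= self.prompt_len

def schedule_step(requests):
    """One step in the vLLM-v1 style: one token budget covers both
    decode tokens and chunked prefill of long prompts."""
    budget = TOKEN_BUDGET
    plan = []  # (rid, tokens scheduled this step)
    # Decodes first: each running request generates one token.
    for r in requests:
        if r.is_decoding and budget > 0:
            plan.append((r.rid, 1))
            budget -= 1
    # Remaining budget prefills waiting prompts, split into chunks.
    for r in requests:
        if not r.is_decoding and budget > 0:
            chunk = min(r.prompt_len - r.computed, budget)
            r.computed += chunk
            budget -= chunk
            plan.append((r.rid, chunk))
    return plan

reqs = [Request("a", prompt_len=3, computed=3),  # already decoding
        Request("b", prompt_len=20)]             # long prompt, not yet prefilled
print(schedule_step(reqs))  # → [('a', 1), ('b', 7)]
print(schedule_step(reqs))  # → [('a', 1), ('b', 7)]
```

Because the long prompt of `b` is chunked rather than prefilled in one shot, request `a` keeps producing a token every step, which is what bounds inter-token latency under this scheduling strategy.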