A lightweight vLLM implementation built from scratch. Built on the foundation of Nano-vLLM, this project significantly re-engineers the core architecture to reproduce the vLLM v1 scheduler and introduce Chunked Prefill.
- 🚀 Fast inference - Online inference speed and offline throughput comparable to vLLM v1.
- 📖 Readable codebase - Clean implementation in ~1,200 lines of Python
- ⚡ Optimization suite - Paged Attention, Prefix Caching, Chunked Prefill, Tensor Parallelism, Torch Compilation, and a full reproduction of the vLLM v1 scheduling strategy (see the conceptual sketch below).
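The core idea behind chunked prefill and the v1-style scheduler is a single per-step token budget shared by prefill and decode work, so long prompts are split into chunks instead of stalling ongoing decodes. The sketch below is purely illustrative and is not this project's actual scheduler; all names (`TOKEN_BUDGET`, `Seq`, `schedule_step`) are hypothetical.

```python
# Illustrative only: a toy token-budget scheduler in the spirit of vLLM v1's
# chunked prefill. Names and structure are hypothetical, not this repo's code.
from dataclasses import dataclass

TOKEN_BUDGET = 1024  # analogous to --max-num-batched-tokens


@dataclass
class Seq:
    prompt_len: int       # total prompt tokens
    prefilled: int = 0    # prompt tokens already processed
    decoding: bool = False


def schedule_step(running: list[Seq], waiting: list[Seq]) -> list[tuple[Seq, int]]:
    """Pick (sequence, num_tokens) pairs for one model step within the budget."""
    budget = TOKEN_BUDGET
    batch: list[tuple[Seq, int]] = []
    # Decodes first: each running sequence needs exactly one token.
    for seq in running:
        if seq.decoding and budget > 0:
            batch.append((seq, 1))
            budget -= 1
    # Spend the remaining budget on prefill, chunking long prompts.
    for seq in running + waiting:
        if not seq.decoding and budget > 0:
            chunk = min(seq.prompt_len - seq.prefilled, budget)
            if chunk > 0:
                batch.append((seq, chunk))
                seq.prefilled += chunk
                budget -= chunk
                if seq.prefilled == seq.prompt_len:
                    seq.decoding = True  # prompt done; switch to decode next step
    return batch
```

In schedulers of this style, decode requests are typically served before prefill chunks so that long prompts do not starve ongoing generations; the budget plays the same role as the `--max-num-batched-tokens` flag used in the benchmark command below.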
See `example.py` (offline) and `serving_bench.py` (online) for usage. The API mirrors vLLM's interface, with minor differences in the `LLM.generate` method.
Offline usage example:

```python
from nanovllm import LLM, SamplingParams

llm = LLM("/YOUR/MODEL/PATH", enforce_eager=True, tensor_parallel_size=1, chunked_prefill=True)
sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
prompts = ["Hello, Nano-vLLM."]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0]["text"])
```

Online benchmarking:
```bash
python serving_bench.py \
    --model /path/to/Qwen3-14B/ \
    --request-rate 10 \
    --num-requests 1024 \
    --tensor-parallel-size 1 \
    --max-num-batched-tokens 1024 \
    --max-num-seqs 1024 \
    --random-input-len 128 \
    --random-output-len 100 \
    --chunked-prefill \
    --enforce-eager
```
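A note on `--request-rate`: serving benchmarks of this kind commonly pace requests with exponentially distributed inter-arrival times (a Poisson process) at the target rate. The sketch below illustrates that pattern only; it is an assumption, not the contents of `serving_bench.py`, and `send_request` is a hypothetical stand-in for the real client call.

```python
# Illustrative sketch of Poisson-paced request submission (hypothetical;
# not the actual serving_bench.py implementation).
import asyncio
import random


async def send_request(prompt: str) -> None:
    # Placeholder for the real HTTP/engine call made by the benchmark client.
    await asyncio.sleep(0.1)


async def run_benchmark(num_requests: int = 1024, request_rate: float = 10.0) -> None:
    tasks = []
    for i in range(num_requests):
        tasks.append(asyncio.create_task(send_request(f"prompt-{i}")))
        # Exponential inter-arrival time gives an average of `request_rate` requests/s.
        await asyncio.sleep(random.expovariate(request_rate))
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    asyncio.run(run_benchmark())
```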