English | 中文
A minimal, easy-to-read PyTorch re-implementation of Qwen3.5 vision-language models. Supports text+vision as well as dense and mixture-of-experts variants.
For Qwen3-VL implementation, see this branch.
For Qwen3 (text-only) and Qwen2.5 VL support, see this branch.
For DeepSeek R1, see this repo.
Join my Discord channel for more discussion!
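The mixture-of-experts variants mentioned above replace the dense feed-forward block with a set of routed experts, where each token is processed only by its top-k experts. A generic illustration of top-k token routing (a sketch only, not this repo's actual layer; the class name, sizes, and expert structure here are all made up):

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Illustrative top-k routed mixture-of-experts feed-forward layer."""

    def __init__(self, dim: int, hidden: int, num_experts: int = 4, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim); route each token to its top-k experts
        probs = self.router(x).softmax(dim=-1)            # (tokens, experts)
        weights, idx = torch.topk(probs, self.k, dim=-1)  # (tokens, k)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over top-k
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            hit = (idx == e)            # which of each token's k slots picked expert e
            rows = hit.any(dim=-1)      # tokens that routed to expert e at all
            if rows.any():
                w = (weights * hit).sum(dim=-1, keepdim=True)[rows]
                out[rows] += w * expert(x[rows])
        return out
```

The dense variants simply use a single feed-forward block in place of this routing.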
Create a virtual environment:

```shell
pip install uv
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
```

Launch the interactive chat:

```shell
python run.py
```

Note: Use `@relative/path/to/image.jpg` to reference images.
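Pulling `@path` references out of a chat message reduces to a small bit of string parsing. A hypothetical helper showing one way to do it (the function name, regex, and supported extensions are assumptions, not the repo's actual implementation):

```python
import re

def split_image_refs(user_input: str):
    """Extract @path image references from a chat message (hypothetical helper)."""
    pattern = r"@(\S+\.(?:jpg|jpeg|png|webp))"
    paths = re.findall(pattern, user_input)          # collect the referenced paths
    text = re.sub(pattern, "", user_input)           # strip the @path tokens
    return paths, " ".join(text.split())             # normalize leftover whitespace
```

For example, `split_image_refs("describe @test/data/cat.png in detail")` yields the path list and the remaining prompt text separately.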
Run one script that iterates over the model variants and, for the same image + prompt context, prints back-to-back per model:

- Hugging Face Transformers output
- Tiny-Qwen output
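Internally, such a comparison boils down to generating with both implementations and diffing the decoded strings. A hypothetical scoring helper (the script's actual reporting may differ) for positional token agreement:

```python
def token_match_rate(reference: str, candidate: str) -> float:
    """Fraction of whitespace-separated tokens that agree position-wise."""
    ref, cand = reference.split(), candidate.split()
    if not ref and not cand:
        return 1.0
    matches = sum(r == c for r, c in zip(ref, cand))
    # Divide by the longer sequence so extra/missing tokens count as mismatches
    return matches / max(len(ref), len(cand))
```

For instance, `token_match_rate("a cat on a mat", "a cat on the mat")` returns 0.8, since four of the five positions agree.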
```shell
python test/run_ultimate_compare.py
```

By default it runs:

- Qwen/Qwen3.5-0.8B
- Qwen/Qwen3.5-2B
- Qwen/Qwen3.5-4B
- Qwen/Qwen3.5-9B
- Qwen/Qwen3.5-27B
- Qwen/Qwen3.5-35B-A3B
Useful flags:

```shell
# subset of models
python test/run_ultimate_compare.py --models Qwen/Qwen3.5-2B Qwen/Qwen3.5-9B

# custom image/prompt/tokens
python test/run_ultimate_compare.py \
    --image-path test/data/test-img-1.jpg \
    --prompt "Describe this image accurately in 2-3 sentences." \
    --max-new-tokens 128 \
    --no-enable-thinking
```

Using the `Qwen3_5` class in code:
```python
from PIL import Image
from huggingface_hub import snapshot_download

from model.model import Qwen3_5
from model.processor import Processor

image = Image.open("test/data/test-img-1.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What's on this image?"},
        ],
    },
]

# Download the weights and load model + processor
model_name = "Qwen/Qwen3.5-27B"
weights = snapshot_download(repo_id=model_name, cache_dir=".cache")
model = Qwen3_5.from_pretrained(weights_path=weights, device_map="auto")
processor = Processor.from_pretrained(model_name)

device = next(model.parameters()).device
inputs = processor(messages, add_generation_prompt=True, device=device)

# One-shot generation
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.tokenizer.decode(output_ids[0].tolist()))

# Token-by-token streaming
print("Streaming output:", end=" ", flush=True)
for token_id in model.generate_stream(**inputs, max_new_tokens=64):
    print(processor.tokenizer.decode([token_id]), end="", flush=True)
print()
```
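`generate_stream` yields token ids one at a time, so the reply can also be assembled incrementally rather than printed. A self-contained sketch of that consumption pattern (stub generator and toy vocab standing in for the model and tokenizer; real BPE tokenizers may need to decode accumulated ids to render multi-byte characters correctly):

```python
def fake_stream(token_ids):
    # Stand-in for model.generate_stream: yields one token id at a time
    yield from token_ids

# Toy id->text table standing in for processor.tokenizer.decode
vocab = {1: "Hello", 2: ",", 3: " world", 4: "!"}

pieces = [vocab[tid] for tid in fake_stream([1, 2, 3, 4])]
reply = "".join(pieces)
print(reply)  # -> Hello, world!
```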