vLLM

vllm Public

A high-throughput and memory-efficient inference and serving engine for LLMs

llm-compressor Public

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 2.9k 429

recipes Public

Common recipes to run vLLM

Jupyter Notebook 489 165

speculators Public

A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM

Python 269 50

semantic-router Public

System-Level Intelligent Router for Mixture-of-Models at Cloud, Data Center, and Edge
