IST Austria Distributed Algorithms and Systems Lab

gptq Public

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 1.7k 128

sparsegpt Public

Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

Python 601 76

marlin Public

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Python 249 17

qmoe Public

Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".

Python 242 21

QUIK Public

Repository for the QUIK project, enabling the use of 4bit kernels for generative inference

OBC Public

Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".

Python 82 11

Provide feedback