@IST-DASLab

IST Austria Distributed Algorithms and Systems Lab

Popular repositories

  1. gptq Public

    Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

    Python, 1.7k stars, 128 forks

  2. sparsegpt Public

    Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

    Python, 601 stars, 76 forks

  3. marlin Public

    FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

    Python, 249 stars, 17 forks

  4. qmoe Public

    Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".

    Python, 242 stars, 21 forks

  5. QUIK Public

    Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference.

    C++, 151 stars, 10 forks

  6. OBC Public

    Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".

    Python, 82 stars, 11 forks

Repositories

Showing 10 of 36 repositories
  • gptq Public

    Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

    Python 1,656 Apache-2.0 128 18 1 Updated Mar 27, 2024
  • sparsegpt Public

    Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

    Python 601 Apache-2.0 76 11 2 Updated Mar 21, 2024
  • marlin Public

    FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

    Python 249 Apache-2.0 17 9 2 Updated Feb 29, 2024
  • RoSA Public
    Python 14 Apache-2.0 1 0 0 Updated Feb 13, 2024
  • peft-rosa Public

    A fork of the PEFT library, supporting Robust Adaptation (RoSA).

    Python 5 Apache-2.0 3 1 0 Updated Feb 12, 2024
  • spops Public
    C++ 1 Apache-2.0 0 0 0 Updated Feb 12, 2024
  • SparseFinetuning Public

    Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry.

    Python 34 Apache-2.0 6 0 0 Updated Jan 15, 2024
  • CAP Public

    Repository for Correlation Aware Prune (NeurIPS 2023) source and experimental code.

    Python 0 Apache-2.0 1 0 0 Updated Nov 29, 2023
  • QUIK Public

    Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference.

    C++ 151 Apache-2.0 10 3 0 Updated Nov 13, 2023
  • qmoe Public

    Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".

    Python 242 Apache-2.0 21 2 0 Updated Nov 4, 2023