A curated list for Efficient Large Language Models
A model compression toolkit designed for usability, comprehensiveness, and efficiency.
[ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
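Pruner-Zero searches for the pruning metric itself, so no fixed formula represents it exactly; the sketch below only illustrates the mechanics such a symbolic metric plugs into, using a hand-written |W| * |grad| score as a placeholder (an assumption, not the paper's evolved metric):

```python
import torch

def prune_by_metric(weight: torch.Tensor, grad: torch.Tensor,
                    sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the weights with the lowest metric scores.

    The metric here, |W| * |grad|, is a hand-written placeholder;
    Pruner-Zero evolves such symbolic expressions automatically.
    """
    score = weight.abs() * grad.abs()               # symbolic pruning metric
    k = max(1, int(score.numel() * sparsity))       # number of weights to drop
    threshold = score.flatten().kthvalue(k).values  # k-th smallest score
    mask = (score > threshold).to(weight.dtype)     # keep only high-score weights
    return weight * mask

# Example: prune 50% of a random linear layer's weights.
w = torch.randn(256, 256)
g = torch.randn(256, 256)  # gradient from a calibration batch (assumed)
pruned = prune_by_metric(w, g, sparsity=0.5)
print(f"sparsity: {(pruned == 0).float().mean():.2f}")
```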
D^2-MoE: Delta Decompression for MoE-based LLMs Compression
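The repo's exact decomposition is specific to the paper; as a rough illustration of the delta-compression idea, the sketch below stores each MoE expert as a shared base matrix plus an SVD-truncated low-rank delta (using the mean expert as the base is an assumption, not the D^2-MoE algorithm):

```python
import torch

def compress_experts(experts: list[torch.Tensor], rank: int = 8):
    """Store MoE expert weights as a shared base plus low-rank deltas."""
    base = torch.stack(experts).mean(dim=0)  # shared base weight (assumed: mean)
    factors = []
    for w in experts:
        # Truncated SVD of the delta keeps only the top-`rank` components.
        u, s, vh = torch.linalg.svd(w - base, full_matrices=False)
        factors.append((u[:, :rank] * s[:rank], vh[:rank]))
    return base, factors

def reconstruct(base: torch.Tensor, factor) -> torch.Tensor:
    a, b = factor
    return base + a @ b  # base + low-rank delta

experts = [torch.randn(128, 128) for _ in range(4)]
base, factors = compress_experts(experts, rank=16)
err = (reconstruct(base, factors[0]) - experts[0]).norm() / experts[0].norm()
print(f"relative reconstruction error: {err:.3f}")
```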
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The Truth is Rarely Pure and Never Simple.
LLM Inference on AWS Lambda
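The repo's stack isn't described here; one common pattern is packaging a GGUF model with llama-cpp-python in a Lambda container image and loading it outside the handler so warm invocations reuse it. A hedged sketch (the model path and request shape are assumptions):

```python
import json
from llama_cpp import Llama  # llama-cpp-python, a common choice for CPU inference

# Loading at module scope lets warm Lambda invocations reuse the model.
# The path is a placeholder; ship the GGUF file in the image or a layer.
llm = Llama(model_path="/opt/model.gguf", n_ctx=2048)

def handler(event, context):
    """Minimal AWS Lambda handler running one completion per request."""
    body = json.loads(event.get("body") or "{}")
    out = llm(body.get("prompt", ""),
              max_tokens=int(body.get("max_tokens", 128)))
    return {
        "statusCode": 200,
        "body": json.dumps({"completion": out["choices"][0]["text"]}),
    }
```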
[CAAI AIR'24] Minimize Quantization Output Error with Bias Compensation
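The general bias-compensation idea: after quantizing W to W_q, add E[(W - W_q) x] to the layer bias so the expected output error vanishes. A minimal sketch of that correction (the naive 4-bit uniform quantizer and random calibration set are assumptions, not the paper's setup):

```python
import torch

def bias_compensation(w: torch.Tensor, w_q: torch.Tensor,
                      calib_x: torch.Tensor) -> torch.Tensor:
    """Bias term that cancels the expected quantization output error.

    For y = x @ W^T + b, adding E[(W - W_q) x] to b makes the quantized
    layer's expected output match the full-precision one.
    """
    err = w - w_q                       # per-weight quantization error
    return calib_x.mean(dim=0) @ err.T  # E[x] @ (W - W_q)^T

# Toy example: naive 4-bit uniform quantization of a linear layer.
w = torch.randn(64, 32)
scale = w.abs().max() / 7
w_q = (w / scale).round().clamp(-8, 7) * scale
x = torch.randn(1000, 32)               # calibration inputs (assumed)
delta_b = bias_compensation(w, w_q, x)

y_fp = x @ w.T
y_q = x @ w_q.T + delta_b
print(f"mean output error after compensation: "
      f"{(y_fp - y_q).mean(dim=0).abs().max():.4f}")
```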
Interpretation code for analyzing the effects of LLM compression, for the paper "When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models".
A pure PyTorch implementation of Google DeepMind's paper "Language Modeling Is Compression", with no reliance on Haiku or JAX. Based on the original repository (https://github.com/google-deepmind/language_modeling_is_compression), this code reproduces the paper's key results.
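The paper's core observation is that arithmetic coding driven by an LM's next-token distribution achieves a code length of about -sum(log2 p(token)). A sketch that computes this bound with Hugging Face transformers (GPT-2 as the model is an assumption; no actual arithmetic coder is run):

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def lm_compressed_bits(text: str, model_name: str = "gpt2") -> float:
    """Estimate the compressed size of `text` under a causal LM.

    Arithmetic coding with the model's next-token distribution attains a
    code length of roughly -sum(log2 p(token)); this returns that bound.
    """
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Cross-entropy of each token given its prefix, in nats.
    nll = torch.nn.functional.cross_entropy(
        logits[0, :-1], ids[0, 1:], reduction="sum"
    )
    return nll.item() / math.log(2)  # total bits (first token not coded)

text = "Language modeling is compression. " * 8
bits = lm_compressed_bits(text)
print(f"{bits:.0f} bits vs {len(text.encode()) * 8} raw bits")
```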
Token Price Estimation for LLMs
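A minimal sketch of such an estimator, counting tokens with tiktoken and multiplying by per-million-token prices (the price table, model name, and tokenizer choice below are hypothetical placeholders, not real provider pricing):

```python
import tiktoken

# Hypothetical per-million-token prices in USD; real prices vary by provider.
PRICES = {"example-model": {"input": 2.50, "output": 10.00}}

def estimate_cost(prompt: str, expected_output_tokens: int,
                  model: str = "example-model") -> float:
    """Estimate a request's cost from token counts and per-token prices."""
    enc = tiktoken.get_encoding("cl100k_base")  # tokenizer choice is an assumption
    n_in = len(enc.encode(prompt))
    p = PRICES[model]
    return (n_in * p["input"] + expected_output_tokens * p["output"]) / 1e6

cost = estimate_cost("Summarize this document.", expected_output_tokens=200)
print(f"estimated cost: ${cost:.6f}")
```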
NYCU Edge AI Final Project Using SGLang