A curated list for Efficient Large Language Models
A model compression toolkit designed for usability, comprehensiveness, and efficiency.
[ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
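Pruner-Zero searches for the pruning metric itself, so no fixed formula represents it exactly; the sketch below only illustrates the mechanics such a symbolic metric plugs into, using a hand-written |W| * |grad| score as a placeholder (an assumption, not the paper's evolved metric):

```python
import torch

def prune_by_metric(weight: torch.Tensor, grad: torch.Tensor,
                    sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the weights with the lowest metric scores.

    The metric here, |W| * |grad|, is a hand-written placeholder;
    Pruner-Zero evolves such symbolic expressions automatically.
    """
    score = weight.abs() * grad.abs()               # symbolic pruning metric
    k = max(1, int(score.numel() * sparsity))       # number of weights to drop
    threshold = score.flatten().kthvalue(k).values  # k-th smallest score
    mask = (score > threshold).to(weight.dtype)     # keep only high-score weights
    return weight * mask

# Example: prune 50% of a random linear layer's weights.
w = torch.randn(256, 256)
g = torch.randn(256, 256)  # gradient from a calibration batch (assumed)
pruned = prune_by_metric(w, g, sparsity=0.5)
print(f"sparsity: {(pruned == 0).float().mean():.2f}")
```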
D^2-MoE: Delta Decompression for MoE-based LLMs Compression
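The repo's exact decomposition is specific to the paper; as a rough illustration of the delta-compression idea, the sketch below stores each MoE expert as a shared base matrix plus an SVD-truncated low-rank delta (using the mean expert as the base is an assumption, not the D^2-MoE algorithm):

```python
import torch

def compress_experts(experts: list[torch.Tensor], rank: int = 8):
    """Store MoE expert weights as a shared base plus low-rank deltas."""
    base = torch.stack(experts).mean(dim=0)  # shared base weight (assumed: mean)
    factors = []
    for w in experts:
        # Truncated SVD of the delta keeps only the top-`rank` components.
        u, s, vh = torch.linalg.svd(w - base, full_matrices=False)
        factors.append((u[:, :rank] * s[:rank], vh[:rank]))
    return base, factors

def reconstruct(base: torch.Tensor, factor) -> torch.Tensor:
    a, b = factor
    return base + a @ b  # base + low-rank delta

experts = [torch.randn(128, 128) for _ in range(4)]
base, factors = compress_experts(experts, rank=16)
err = (reconstruct(base, factors[0]) - experts[0]).norm() / experts[0].norm()
print(f"relative reconstruction error: {err:.3f}")
```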
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The Truth is Rarely Pure and Never Simple.
LLM Inference on AWS Lambda
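The repo's stack isn't described here; one common pattern is packaging a GGUF model with llama-cpp-python in a Lambda container image and loading it outside the handler so warm invocations reuse it. A hedged sketch (the model path and request shape are assumptions):

```python
import json
from llama_cpp import Llama  # llama-cpp-python, a common choice for CPU inference

# Loading at module scope lets warm Lambda invocations reuse the model.
# The path is a placeholder; ship the GGUF file in the image or a layer.
llm = Llama(model_path="/opt/model.gguf", n_ctx=2048)

def handler(event, context):
    """Minimal AWS Lambda handler running one completion per request."""
    body = json.loads(event.get("body") or "{}")
    out = llm(body.get("prompt", ""),
              max_tokens=int(body.get("max_tokens", 128)))
    return {
        "statusCode": 200,
        "body": json.dumps({"completion": out["choices"][0]["text"]}),
    }
```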
[CAAI AIR'24] Minimize Quantization Output Error with Bias Compensation
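The general bias-compensation idea: after quantizing W to W_q, add E[(W - W_q) x] to the layer bias so the expected output error vanishes. A minimal sketch of that correction (the naive 4-bit uniform quantizer and random calibration set are assumptions, not the paper's setup):

```python
import torch

def bias_compensation(w: torch.Tensor, w_q: torch.Tensor,
                      calib_x: torch.Tensor) -> torch.Tensor:
    """Bias term that cancels the expected quantization output error.

    For y = x @ W^T + b, adding E[(W - W_q) x] to b makes the quantized
    layer's expected output match the full-precision one.
    """
    err = w - w_q                       # per-weight quantization error
    return calib_x.mean(dim=0) @ err.T  # E[x] @ (W - W_q)^T

# Toy example: naive 4-bit uniform quantization of a linear layer.
w = torch.randn(64, 32)
scale = w.abs().max() / 7
w_q = (w / scale).round().clamp(-8, 7) * scale
x = torch.randn(1000, 32)               # calibration inputs (assumed)
delta_b = bias_compensation(w, w_q, x)

y_fp = x @ w.T
y_q = x @ w_q.T + delta_b
print(f"mean output error after compensation: "
      f"{(y_fp - y_q).mean(dim=0).abs().max():.4f}")
```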
Interpretation code for analyzing the effects of LLM compression, for the paper "When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models".
A pure PyTorch implementation of Google DeepMind's paper "Language Modeling Is Compression", with no reliance on Haiku or JAX. Based on the original repository (https://github.com/google-deepmind/language_modeling_is_compression), this code reproduces the paper's key results.
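The paper's core observation is that arithmetic coding driven by an LM's next-token distribution achieves a code length of about -sum(log2 p(token)). A sketch that computes this bound with Hugging Face transformers (GPT-2 as the model is an assumption; no actual arithmetic coder is run):

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def lm_compressed_bits(text: str, model_name: str = "gpt2") -> float:
    """Estimate the compressed size of `text` under a causal LM.

    Arithmetic coding with the model's next-token distribution attains a
    code length of roughly -sum(log2 p(token)); this returns that bound.
    """
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Cross-entropy of each token given its prefix, in nats.
    nll = torch.nn.functional.cross_entropy(
        logits[0, :-1], ids[0, 1:], reduction="sum"
    )
    return nll.item() / math.log(2)  # total bits (first token not coded)

text = "Language modeling is compression. " * 8
bits = lm_compressed_bits(text)
print(f"{bits:.0f} bits vs {len(text.encode()) * 8} raw bits")
```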
Token Price Estimation for LLMs
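A minimal sketch of such an estimator, counting tokens with tiktoken and multiplying by per-million-token prices (the price table, model name, and tokenizer choice below are hypothetical placeholders, not real provider pricing):

```python
import tiktoken

# Hypothetical per-million-token prices in USD; real prices vary by provider.
PRICES = {"example-model": {"input": 2.50, "output": 10.00}}

def estimate_cost(prompt: str, expected_output_tokens: int,
                  model: str = "example-model") -> float:
    """Estimate a request's cost from token counts and per-token prices."""
    enc = tiktoken.get_encoding("cl100k_base")  # tokenizer choice is an assumption
    n_in = len(enc.encode(prompt))
    p = PRICES[model]
    return (n_in * p["input"] + expected_output_tokens * p["output"]) / 1e6

cost = estimate_cost("Summarize this document.", expected_output_tokens=200)
print(f"estimated cost: ${cost:.6f}")
```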
NYCU Edge AI Final Project Using SGLang