-
Notifications
You must be signed in to change notification settings - Fork 13.9k
Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Handle the template for https://huggingface.co/WeiboAI/VibeThinker-1.5B (ORIGINAL FROM SAFETENSORS)
2º https://huggingface.co/mradermacher/VibeThinker-1.5B-GGUF
3º https://huggingface.co/MaziyarPanahi/VibeThinker-1.5B-GGUF
💡 Mathematical Reasoning: On the three major math benchmarks AIME24, AIME25, and HMMT25, its scores (80.3, 74.4, and 50.4, respectively) all surpass those of the initial DeepSeek R1 model, which has over 400 times the parameters (scores of 79.8, 70.0, and 41.7, respectively).
🌱 Code Generation: It achieved scores of 55.9 on LiveCodeBench v5 and 51.1 on v6. Its v6 score slightly leads Magistral Medium (50.3), underscoring its strong reasoning performance.
VibeThinker-1.5B's core innovation lies in the "Spectrum-to-Signal Principle" (SSP) training framework: it first explores solution diversity during the Supervised Fine-Tuning (SFT) stage, and then optimizes its policy to reinforce correct signals in the Reinforcement Learning (RL) stage. By systematically integrating these two phases, our approach establishes diversity as the central technical design principle, enabling VibeThinker-1.5B to achieve robust performance that surpasses conventional training paradigms.
Motivation
because this is overthink than chatting like simulating introspective and think itself by each explain.
Possible Implementation
No response