STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens (paper, via huggingface.co, 18 hours ago)