STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens

Paper, via huggingface