Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation

Authors: Dong Nie
Year: 2026
Topic: Reinforcement Learning