Vector Policy Optimization: Training for Diversity Improves Test-Time Search
Authors: Ryan Bahlous-Boldi, Isha Puri, Idan Shenfeld, Akarsh Kumar, Mehul Damani, Sebastian Risi, Omar Khattab, Zhang-Wei Hong, Pulkit Agrawal
Year: 2026
Type: Preprint
Tags: Reinforcement Learning, Sequential Decisions, Moderate