Survive or Collapse: The Asymmetric Roles of Data Gating and Reward Grounding in Self-Play RL | Steady Practice | SteadyPractice