4 articles in total
2026
Paper Notes [LLM-OR]: [SIRL] Solver-Informed RL-Grounding Large Language Models for Authentic Optimization M
2025
The Exploration Dilemma in LLM-RL
Paper Notes [LLM-RL]: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model
Paper Notes [LLM-RL]: [EndoRM] Generalist Reward Models-Found Inside Large Language Models