Kinnari

Tag: RLVR

12 notes with this tag.

March 5, 2026
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning
January 21, 2026
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
January 12, 2026
Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm
- LLM
- RLVR
January 1, 2026
The State Of LLMs 2025: Progress, Problems, and Predictions
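January 1, 2026
Can RL Actually Break Through the Base Model's Boundary? A Systematic Analysis of Reasoning Extrapolation, Stability, and Training Conditions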
December 31, 2025
FlowRL: Matching Reward Distributions for LLM Reasoning
December 31, 2025
GiGPO
December 30, 2025
RL's Razor: Why On-Policy Reinforcement Learning Forgets Less
- LLM
- RLVR
- SFT
December 20, 2025
Meta-RL Induces Exploration In Language Agents
December 13, 2025
Emergent Hierarchical Reasoning in LLMs Through Reinforcement Learning
December 9, 2025
RLVR Algorithm Comparison
- LLM
- RLVR
December 8, 2025
Thinker: Learning Fast and Slow
