Kinnari

Tag: LLM

15 notes with this tag.

Mar 05, 2026
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning
Mar 04, 2026
Reinforcement Learning via Self-Distillation
Jan 17, 2026
SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models
Jan 12, 2026
Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm
- LLM
- RLVR
Jan 01, 2026
The State Of LLMs 2025: Progress, Problems, and Predictions
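Jan 01, 2026
Can RL Really Break Through the Base Model's Boundary? A Systematic Analysis of Reasoning Extrapolation, Stability, and Training Conditions
Dec 31, 2025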
FlowRL: Matching Reward Distributions for LLM Reasoning
Dec 31, 2025
GiGPO
Dec 30, 2025
RL's Razor: Why On-Policy Reinforcement Learning Forgets Less
- LLM
- RLVR
- SFT
Dec 15, 2025
Nested Learning: The Illusion of Deep Learning Architecture
Dec 13, 2025
Emergent Hierarchical Reasoning in LLMs Through Reinforcement Learning
Dec 11, 2025
CWM: An Open-Weights LLM for Research on Code Generation with World Models
Dec 09, 2025
Comparison of RLVR Algorithms
- LLM
- RLVR
Dec 08, 2025
Thinker: Learning Fast and Slow
Jul 19, 2025
The Big LLM Architecture Comparison

Created with Quartz v4.5.2 © 2026

GitHub
ZhiHu