Firom's Blog
Tags:
#rl
Mar 15, 2026
LLMs as Human Preference Proxies
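As a compressed collection of a vast number of descriptions of human preferences, an LLM can already stand in for much of the work of "manually learning human preferences", such as designing reward functions in reinforcement learning.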
4 min read
Chinese
llm
rl