Tags: #rl

Mar 16, 2026

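First Experience Training with Isaac Lab and Exploring Reinforcement Learning

Thoughts and takeaways on reinforcement learning, reward functions, Bayesian optimization, and PPO, gathered while learning from scratch to train a robot dog in Isaac Lab.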

8 min read Chinese
- rl
- isaac lab
- robotics
- study notes
Mar 15, 2026

LLMs as Human Preference Proxies

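As a compressed collection of a vast number of descriptions of human preferences, an LLM can already replace much of the work of "manually learning human preferences," such as designing reward functions in reinforcement learning.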

4 min read Chinese
- llm
- rl
Mar 15, 2026

Compiled Behavior vs Runtime Simulation

A bilingual note on PPO, critics, and world models, and on why runtime simulation in robotics usually lives in the physics engine rather than in the policy.

9 min read bilingual
- rl
- ppo
- world-models
- robotics