Train reward models from human preference data while handling label noise and distribution shift.
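A common starting point for reward modeling from pairwise preferences is the Bradley-Terry loss, where the model is trained so that the "chosen" response scores higher than the "rejected" one. The sketch below is a minimal illustration under stated assumptions, not this bundle's method. It uses a hypothetical linear reward model over hand-made feature vectors. It handles label noise with label smoothing: the smoothing rate `eps` assumes each preference label may be flipped with probability `eps`. The helper names (`reward`, `smoothed_bt_loss`, `train`) and the synthetic data are illustrative only. It does not address distribution shift.

```python
import math
import random

# Hypothetical linear reward model: r(x) = w . x
def reward(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def smoothed_bt_loss(w, pairs, eps=0.1):
    """Mean Bradley-Terry negative log-likelihood with label smoothing.

    Assumption: each (chosen, rejected) label is wrong with probability eps,
    so the target probability that `chosen` wins is 1 - eps, not 1.
    """
    total = 0.0
    for chosen, rejected in pairs:
        p = sigmoid(reward(w, chosen) - reward(w, rejected))
        p = min(max(p, 1e-12), 1 - 1e-12)  # clamp to avoid log(0)
        total += -((1 - eps) * math.log(p) + eps * math.log(1 - p))
    return total / len(pairs)

def train(pairs, dim, eps=0.1, lr=0.5, epochs=200):
    """Full-batch gradient descent on the smoothed Bradley-Terry loss."""
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for chosen, rejected in pairs:
            p = sigmoid(reward(w, chosen) - reward(w, rejected))
            g = p - (1 - eps)  # d(loss)/d(margin) for the smoothed loss
            for i in range(dim):
                grad[i] += g * (chosen[i] - rejected[i])
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w

# Synthetic preferences from a hidden "true" reward, with 10% of labels flipped
random.seed(0)
true_w = [1.0, -0.5]
pairs = []
for _ in range(400):
    a = [random.gauss(0, 1) for _ in range(2)]
    b = [random.gauss(0, 1) for _ in range(2)]
    chosen, rejected = (a, b) if reward(true_w, a) > reward(true_w, b) else (b, a)
    if random.random() < 0.1:  # simulated annotator noise
        chosen, rejected = rejected, chosen
    pairs.append((chosen, rejected))

w = train(pairs, dim=2, eps=0.1)
print(w)  # should point roughly in the direction of true_w
```

Label smoothing keeps the loss from pushing margins toward infinity on pairs that may be mislabeled; the gradient `p - (1 - eps)` vanishes once the predicted win probability reaches `1 - eps`. In practice the linear model would be replaced by a neural network scoring full responses.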