End-to-end: collect preferences → train reward model → optimize policy
This is the published source version. Installing it creates a private draft in your workspace where you can edit, run experiments and self-improvement runs, and iterate without changing the published bundle.