Each skill bundle packages a reusable agent behavior — a prompt, supporting files, and evaluation criteria. Browse the public catalog, review the full source, then install a private copy you can edit and experiment with.
109 published bundles ready to inspect and install
Detect when RL training narrows capability (great on trained tasks, worse on everything else)
Quantify how much RL training on coding transfers to other domains, such as data analysis or writing
Build evals that test whether RL training on task A improved performance on related task B
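As a rough illustration of what the three bundles above measure, here is a minimal sketch that compares per-task scores before and after RL training. The function name `transfer_report`, the task names, and every score are hypothetical and invented for this example. They are not taken from any published bundle.

```python
from statistics import mean


def transfer_report(base, tuned, trained_tasks):
    """Compare per-task scores before and after RL training.

    base, tuned: dicts mapping task name -> score (hypothetical inputs).
    trained_tasks: set of task names the RL run optimized for.
    Assumes at least one trained task and one held-out task are present.
    """
    # Per-task change in score caused by training.
    deltas = {task: tuned[task] - base[task] for task in base}
    trained = [d for t, d in deltas.items() if t in trained_tasks]
    held_out = [d for t, d in deltas.items() if t not in trained_tasks]
    trained_gain = mean(trained)
    held_out_gain = mean(held_out)
    return {
        "trained_gain": trained_gain,
        "held_out_gain": held_out_gain,
        # "Narrowing": better on trained tasks, worse on everything else.
        "narrowed": trained_gain > 0 and held_out_gain < 0,
    }


# Illustrative scores only; these are not real evaluation results.
base = {"coding": 0.50, "data_analysis": 0.50, "writing": 0.75}
tuned = {"coding": 0.75, "data_analysis": 0.25, "writing": 0.50}
report = transfer_report(base, tuned, trained_tasks={"coding"})
```

A positive `held_out_gain` would instead suggest that training on task A transferred to related tasks. A real eval would also need enough samples per task to separate a genuine shift from noise.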