Each skill bundle packages a reusable agent behavior — a prompt, supporting files, and evaluation criteria. Browse the public catalog, review the full source, then install a private copy you can edit and experiment with.
109 published bundles ready to inspect and install
Score intermediate reasoning steps, not just final outcomes
Combine multiple reward signals (correctness, efficiency, style, safety) into a single scalar
Use a language model to score agent outputs against specifications or rubrics
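As an illustration of the multi-signal idea listed above, one common way to collapse several reward signals into a single scalar is a weighted average. The sketch below is not taken from any bundle in the catalog: the function name `combine_rewards`, the weight values, and the assumption that each signal is already normalized to the range 0 to 1 are all hypothetical.

```python
from typing import Mapping


def combine_rewards(signals: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Return the weighted average of named reward signals.

    Assumption (hypothetical, not from the catalog): every signal is
    already normalized to [0, 1], so the result also lies in [0, 1].
    """
    # Every weighted signal must actually be present in the input.
    missing = set(weights) - set(signals)
    if missing:
        raise KeyError(f"missing reward signals: {sorted(missing)}")

    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted sum, divided by the total weight so the scale stays fixed.
    weighted_sum = sum(weights[name] * signals[name] for name in weights)
    return weighted_sum / total_weight


# Illustrative weights only: correctness and safety count double.
example_weights = {"correctness": 2.0, "efficiency": 1.0,
                   "style": 1.0, "safety": 2.0}
example_signals = {"correctness": 1.0, "efficiency": 0.5,
                   "style": 0.5, "safety": 1.0}
score = combine_rewards(example_signals, example_weights)
```

A weighted average is only one option. It lets a high score on one signal offset a low score on another, which may be undesirable for safety, so a real bundle might instead gate on safety first and average the remaining signals.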