Innovations in Evaluating AI Agent Performance

The hard part of building AI agents is knowing whether a modification improved or degraded their ability to perform their tasks. The only way to know is to measure their success against real-world examples. Enter gym scenarios: model replicas of common UX patterns and computer-use actions, used to validate modifications to QA Wolf AI.

Meet our speakers

Caleb Masters

Host / Producer at QA Wolf

Yurij Mikhalevich

Staff Engineering Lead at QA Wolf
