AI Evaluation
How to evaluate the outputs of your AI models and pipelines
Scores and comments let you evaluate your AI's performance at any time.
Scoring an attempt
Each attempt can be scored on a scale of 1-10. The scoring system uses a color-coded visual indicator:
- Green: Scores 8-10 indicate excellent performance
- Yellow: Scores 4-7 indicate moderate performance
- Red: Scores 1-3 indicate poor performance
To score an attempt:
- Click any number from 1 to 10 in the score display
- The selected score will be highlighted and saved automatically
- You can change the score at any time by clicking a different number
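The scale and its color bands can be sketched as a small helper. This is an illustrative sketch only; the function name, the color strings, and the exact boundary handling are assumptions, not part of the product's API.

```python
def score_color(score: int) -> str:
    """Map a 1-10 attempt score to its color-coded indicator.

    Green  (8-10): excellent performance
    Yellow (4-7):  moderate performance
    Red    (1-3):  poor performance
    """
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score >= 8:
        return "green"
    if score >= 4:
        return "yellow"
    return "red"
```

Because scores can be changed at any time, re-scoring is simply calling the same helper with the new value; the latest score overwrites the previous one.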
Commenting on an attempt
Comments can be added to specific positions within your attempt outputs. This allows for precise feedback and annotations. To add a comment:
- Click the location in your output where you want to add the comment
- A comment box will appear at that position
- Enter your feedback in the text area
- Click "Add" to save your comment
Comments include:
- The exact position where they were placed
- The filename or context they're associated with
- The comment text, timestamp, and user who added it
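A record carrying the fields listed above might be modeled as follows. The class and field names here are assumptions chosen for illustration; the actual storage schema is not specified in this document.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Comment:
    """One comment anchored to a specific position in an attempt's output."""

    position: int  # offset in the output where the comment was placed
    filename: str  # file or context the comment is associated with
    text: str      # the feedback itself
    user: str      # who added the comment
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
```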
Tracking all comments across the challenge
Comments are stored and associated with specific attempts, allowing you to:
- Review feedback across multiple attempts
- Track improvements over time
- Identify common issues or patterns in your AI outputs
Click the "Attempts" button in the top right to see all the attempts for a challenge, along with all of the challenge's comments grouped by attempt.
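The grouped view described above amounts to bucketing comments by the attempt they belong to. A minimal sketch, assuming each comment is a dict with a hypothetical `attempt_id` key:

```python
from collections import defaultdict


def group_by_attempt(comments):
    """Group comment records by the attempt they are associated with."""
    grouped = defaultdict(list)
    for comment in comments:
        grouped[comment["attempt_id"]].append(comment)
    return dict(grouped)
```

Reviewing the buckets in attempt order is one way to track improvements over time and spot recurring issues across attempts.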