AI Evaluation

How to evaluate the outputs of your AI models and pipelines

Scores and comments help you evaluate your AI's performance at any time.

Scoring an attempt

Each attempt can be scored on a scale of 1-10. The scoring system uses a color-coded visual indicator (see the sketch after the list):

  • Green: Scores 8-10 indicate excellent performance
  • Yellow: Scores 4-7 indicate moderate performance
  • Red: Scores 1-3 indicate poor performance
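
As a quick reference for the thresholds above, here is a minimal TypeScript sketch; the `ScoreColor` type and `scoreColor` function are illustrative names, not part of any published API.

    type ScoreColor = "green" | "yellow" | "red";

    // Map a 1-10 score to its color band:
    // 8-10 green, 4-7 yellow, 1-3 red.
    function scoreColor(score: number): ScoreColor {
      if (!Number.isInteger(score) || score < 1 || score > 10) {
        throw new RangeError("score must be an integer from 1 to 10");
      }
      if (score >= 8) return "green";
      if (score >= 4) return "yellow";
      return "red";
    }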

To score an attempt:

  1. Click on any number from 1-10 in the score display
  2. The selected score will be highlighted and saved automatically (a sketch of this save call follows the steps)
  3. You can change the score at any time by clicking a different number
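
Step 2 can be pictured as a click handler that persists the selection immediately. This is a sketch only: the `PUT /attempts/:id/score` endpoint and the `saveScore` helper are assumptions, not a documented API.

    // Hypothetical endpoint and helper; adjust to your actual API.
    async function saveScore(attemptId: string, score: number): Promise<void> {
      const res = await fetch(`/attempts/${attemptId}/score`, {
        method: "PUT",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ score }),
      });
      if (!res.ok) throw new Error(`Failed to save score: ${res.status}`);
    }

Because each click overwrites the stored value, changing a score is just another call to the same helper.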

Commenting on an attempt

Comments can be anchored to specific positions within your attempt outputs, allowing precise, in-context feedback and annotations. To add a comment (sketched in code after the steps):

  1. Click the desired location in your output where you want to add the comment
  2. A comment box will appear at that position
  3. Enter your feedback in the text area
  4. Click "Add" to save your comment
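
Step 4 conceptually submits the comment text together with the clicked position. The sketch below assumes a hypothetical `POST /attempts/:id/comments` endpoint and illustrative field names (`position` as an offset, `filename` as the associated context).

    // Hypothetical endpoint and payload shape.
    async function addComment(
      attemptId: string,
      position: number,   // offset in the output where the user clicked
      filename: string,   // file or context the comment belongs to
      text: string,       // the feedback entered in the comment box
    ): Promise<void> {
      const res = await fetch(`/attempts/${attemptId}/comments`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ position, filename, text }),
      });
      if (!res.ok) throw new Error(`Failed to add comment: ${res.status}`);
    }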

Comments include (see the illustrative record shape after the list):

  • The exact position where they were placed
  • The filename or context they're associated with
  • The comment text, timestamp, and user who added it
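
Taken together, those fields suggest a record along these lines; the `AttemptComment` name and exact field names are assumptions for illustration.

    // Illustrative shape only; actual field names may differ.
    interface AttemptComment {
      attemptId: string;  // which attempt the comment belongs to
      position: number;   // exact position where it was placed
      filename: string;   // file or context it is associated with
      text: string;       // the comment body
      createdAt: string;  // timestamp, e.g. ISO-8601
      author: string;     // user who added the comment
    }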

Tracking all comments across the challenge

Comments are stored and associated with specific attempts, allowing you to:

  • Review feedback across multiple attempts
  • Track improvements over time
  • Identify common issues or patterns in your AI outputs

If you click the "Attempts" button at the top right, you can see all the attempts for a challenge, along with all of its comments grouped by attempt.
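
The grouped view can be reproduced over raw comment data with a simple reduction, reusing the illustrative `AttemptComment` shape from above.

    // Group a flat list of comments by attempt, as the
    // challenge-level view does.
    function groupByAttempt(
      comments: AttemptComment[],
    ): Map<string, AttemptComment[]> {
      const groups = new Map<string, AttemptComment[]>();
      for (const c of comments) {
        const bucket = groups.get(c.attemptId) ?? [];
        bucket.push(c);
        groups.set(c.attemptId, bucket);
      }
      return groups;
    }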
