Evaluation Wizard
The evaluation wizard walks you through configuring and launching an evaluation against your Target. The first choice you make determines how the rest of the wizard unfolds.
| Mode | When to use |
|---|---|
| Autopilot | You want broad coverage without manual configuration. Spectral generates all test ingredients from your Knowledge Base. |
| Custom | You want full control over every aspect of the evaluation. |
Autopilot
Select dimensions
Choose which dimensions to include. Your selection determines how Spectral generates the agents:
- Compliance: agents attempt to elicit Principle violations through indirect phrasing, social engineering, and escalating requests.
- Accuracy: agents receive a fact drawn from your Knowledge Base and probe whether the system responds accurately around it.
- Focus: agents submit out-of-scope or restricted requests to test whether the system handles them correctly.
Your selection also determines which dimensions Spectral examines after each conversation:
| | Completion | Accuracy | Compliance | Focus | Responsiveness |
|---|---|---|---|---|---|
| Accuracy selected | ✓ | ✓ | ✓ | | |
| Compliance selected | ✓ | ✓ | ✓ | ✓ | |
| Focus selected | ✓ | | | | |
Set Knowledge Base scope
Choose which documents from your Knowledge Base should inform the evaluation. Narrowing the selection focuses attacks on a specific area of your system; using the full Knowledge Base gives broader coverage.
Set depth
Control how extensively Spectral probes your system. Higher depth means more conversations and broader scenario coverage, at the expense of a longer run time and higher cost.
| Depth | Description |
|---|---|
| Quick Scan | Fast, surface-level check. Best for smoke tests and quick regressions. |
| Standard | Balanced coverage of the key scenarios for each selected dimension. |
| Thorough | In-depth evaluation with broad scenario coverage. |
| Deep Dive | Coming soon. |
Launch
Review your configuration and launch. Spectral then automatically generates the tasks, Principles, and personas that drive its agents, based on the mode you selected, your Target description, and your Knowledge Base.
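If you want to record the choices you made in the wizard (for example, alongside run notes), the Autopilot steps can be sketched as a small data structure. The following is a minimal illustration in Python, not a Spectral API: the names `AutopilotPlan`, `Dimension`, `Depth`, `kb_documents`, and the document name are all hypothetical. It also assumes that at least one dimension must be selected and that an empty document list stands for the full Knowledge Base.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Tuple


class Dimension(Enum):
    # The three selectable dimensions described in "Select dimensions".
    COMPLIANCE = "Compliance"
    ACCURACY = "Accuracy"
    FOCUS = "Focus"


class Depth(Enum):
    QUICK_SCAN = "Quick Scan"
    STANDARD = "Standard"
    THOROUGH = "Thorough"
    DEEP_DIVE = "Deep Dive"  # listed as "Coming soon", so rejected below


@dataclass(frozen=True)
class AutopilotPlan:
    """Hypothetical record of Autopilot wizard choices (not a Spectral API)."""

    dimensions: FrozenSet[Dimension]
    depth: Depth
    # Assumption: an empty tuple means "use the full Knowledge Base".
    kb_documents: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Assumption: the wizard requires at least one dimension.
        if not self.dimensions:
            raise ValueError("select at least one dimension")
        if self.depth is Depth.DEEP_DIVE:
            raise ValueError("Deep Dive is not available yet")

    def summary(self) -> str:
        # One-line summary of the choices, in wizard step order.
        dims = ", ".join(sorted(d.value for d in self.dimensions))
        if self.kb_documents:
            scope = f"{len(self.kb_documents)} document(s)"
        else:
            scope = "full Knowledge Base"
        return f"Autopilot | {dims} | {scope} | {self.depth.value}"


plan = AutopilotPlan(
    dimensions=frozenset({Dimension.ACCURACY, Dimension.FOCUS}),
    depth=Depth.STANDARD,
    kb_documents=("refund-policy.pdf",),  # hypothetical document name
)
print(plan.summary())
# Autopilot | Accuracy, Focus | 1 document(s) | Standard
```

Keeping the record immutable (`frozen=True`) means a saved plan always reflects what was actually launched.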