Evaluation Wizard

The evaluation wizard walks you through configuring and launching an evaluation against your Target. The first choice you make determines how the rest of the wizard unfolds.

Mode | When to use
Autopilot | You want broad coverage without manual configuration. Spectral generates all test ingredients from your Knowledge Base.
Custom | You want full control over every aspect.

Autopilot

Select dimensions

Choose which dimensions to include. Your selection determines how Spectral generates the agents:

  • Compliance: agents attempt to elicit Principle violations through indirect phrasing, social engineering, and escalating requests.
  • Accuracy: agents receive a fact drawn from your Knowledge Base and probe whether the system responds accurately around it.
  • Focus: agents submit out-of-scope or restricted requests to test whether the system handles them correctly.

Your choice here also affects the dimensions that Spectral will examine after each conversation:

Completion | Accuracy | Compliance | Focus | Responsiveness
Accuracy selected
Compliance selected
Focus selected

Set Knowledge Base scope

Choose which documents from your Knowledge Base should inform the evaluation. Narrowing the selection focuses attacks on a specific area of your system; using the full Knowledge Base gives broader coverage.

Set depth

Control how extensively Spectral probes your system. Higher depth means more conversations and broader scenario coverage, at the expense of a longer run time and higher cost.

Depth | Description
Quick Scan | Fast, surface-level check. Best for smoke tests and quick regressions.
Standard | Balanced coverage of the key scenarios for each selected dimension.
Thorough | In-depth evaluation with broad scenario coverage.
Deep Dive | Coming soon.

Launch

Review your configuration and launch. Spectral automatically generates the tasks, Principles, and personas that drive its agents, based on the mode you selected, your Target description, and your Knowledge Base.
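
To keep track of what you chose in the wizard (for example, to record it next to the results of a run), the three Autopilot choices can be modeled as a small data structure. The sketch below is illustrative only: it is not a Spectral API, and the names AutopilotConfig, Dimension, Depth, and kb_documents are hypothetical. It also assumes that at least one dimension must be selected, which this page does not state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional, Tuple


class Dimension(Enum):
    """The dimensions selectable in Autopilot mode."""
    COMPLIANCE = "compliance"
    ACCURACY = "accuracy"
    FOCUS = "focus"


class Depth(Enum):
    """Depth levels from the wizard; Deep Dive is omitted because it is listed as coming soon."""
    QUICK_SCAN = "quick_scan"
    STANDARD = "standard"
    THOROUGH = "thorough"


@dataclass(frozen=True)
class AutopilotConfig:
    """Hypothetical record of the choices made in the Autopilot wizard."""
    dimensions: FrozenSet[Dimension]
    depth: Depth
    # None stands for "use the full Knowledge Base"; otherwise a tuple of document names.
    kb_documents: Optional[Tuple[str, ...]] = None

    def validate(self) -> None:
        # Assumption: an evaluation needs at least one selected dimension.
        if not self.dimensions:
            raise ValueError("select at least one dimension")
        # An empty tuple is ambiguous, so require None for the full Knowledge Base.
        if self.kb_documents is not None and len(self.kb_documents) == 0:
            raise ValueError("empty Knowledge Base scope; use None for the full Knowledge Base")


# Example: a quick regression check on Accuracy and Focus, limited to one document.
config = AutopilotConfig(
    dimensions=frozenset({Dimension.ACCURACY, Dimension.FOCUS}),
    depth=Depth.QUICK_SCAN,
    kb_documents=("refund-policy.md",),  # hypothetical document name
)
config.validate()
```

Making the record immutable (frozen=True) means a saved configuration cannot drift from what was actually launched.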