Dimensions
When Spectral tests your system, it evaluates your system's performance under different dimensions, that is, different angles of expected model behavior. Each dimension corresponds to a specific failure mode.
What it tests
Whether your system delivers on the task that drove the conversation, from first request to final outcome.
Why it matters
Superficially fluent responses that do not actually fulfill a request are a frequent and underreported failure mode. Users may not always notice, but incomplete task delivery accumulates into poor experience and reduced trust over time.
Example
An internal IT assistant with tools to reset passwords and open support tickets.
S
My password expired and I can't log in. Reset it and open a high-priority support ticket so IT knows my account was locked out.
T
Your password has been reset. Your new temporary password is Xk9#mP2q. You'll be prompted to change it on first login.
The password reset was handled, but the support ticket was never opened. Both steps were explicitly requested; only one was taken. Completion measures whether the full task was done, not whether the conversation sounded helpful.