Annotations & Queues

Annotations turn traces or conversations into labeled data; queues organize that labeling work for reviewers.


What users create

  • Annotation Queue: A workspace for reviewers. You choose whether its items are traces or conversations and define the questions reviewers must answer.
  • Questions: The prompts reviewers see. Supported types: Freeform, Boolean, Multiple Choice, Single Choice, Numeric. Each question can carry helper text, a placeholder, a required/optional flag, a default value, and, for Numeric questions, min/max bounds.
  • Answers: Reviewer submissions tied to a queue item (trace or conversation), stored per question with the appropriate value type.
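The three objects above can be pictured as a small data model. The following is a minimal sketch only, not Arcane's actual schema: the class names, the enum values, and field names such as item_type, min_value, and max_value are hypothetical, chosen to mirror the description above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Union

# Value types an answer may hold: string, boolean, number, or a list of choices.
AnswerValue = Union[str, bool, float, list[str]]


class QuestionType(Enum):
    FREEFORM = "freeform"
    BOOLEAN = "boolean"
    MULTIPLE_CHOICE = "multiple_choice"
    SINGLE_CHOICE = "single_choice"
    NUMERIC = "numeric"


class ItemType(Enum):
    TRACE = "trace"
    CONVERSATION = "conversation"


@dataclass
class Question:
    text: str
    type: QuestionType
    helper: str = ""
    placeholder: str = ""
    options: list[str] = field(default_factory=list)  # used by choice types only
    required: bool = True
    default: Optional[AnswerValue] = None
    min_value: Optional[float] = None  # Numeric questions only (hypothetical name)
    max_value: Optional[float] = None  # Numeric questions only (hypothetical name)


@dataclass
class AnnotationQueue:
    name: str
    item_type: ItemType
    description: str = ""
    questions: list[Question] = field(default_factory=list)


@dataclass
class Answer:
    item_id: str        # the trace or conversation being reviewed
    question_text: str  # which question this answer belongs to
    value: AnswerValue


# Usage: a trace queue with one Boolean question and one reviewer answer.
queue = AnnotationQueue(
    name="factuality-review",
    item_type=ItemType.TRACE,
    questions=[Question(text="Is the answer factually correct?",
                        type=QuestionType.BOOLEAN)],
)
answer = Answer(item_id="trace-001",
                question_text=queue.questions[0].text,
                value=True)
```

Storing one Answer per question, rather than one blob per item, matches the "stored per question" wording above and keeps each value in its own type.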

Why it matters

  • Produces consistent human labels for evaluations and experiments.
  • Keeps review load organized (who labels what, and what’s left).
  • Enables regression checks after model/prompt changes using the same queues.

How it works in Arcane

  1. Create an Annotation Queue (Traces or Conversations).
  2. Add Questions to the queue’s template (choose type, options, requirements).
  3. Enqueue items (traces or conversations) into the queue.
  4. Reviewers answer the questions for each item; answers are saved as annotations.
  5. Use annotated items in evaluations/experiments or export them for offline analysis.
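To make the five steps concrete, here is a minimal in-memory sketch of the same flow. It does not call Arcane: InMemoryQueue and its methods (enqueue, answer, pending, export_json) are hypothetical names invented for illustration, and the item ids are made up.

```python
import json


class InMemoryQueue:
    """Hypothetical in-memory stand-in for an annotation queue; not Arcane's API."""

    def __init__(self, name, item_type, questions):
        # Step 1: create the queue as either a trace or a conversation queue.
        if item_type not in ("trace", "conversation"):
            raise ValueError("item_type must be 'trace' or 'conversation'")
        self.name = name
        self.item_type = item_type
        self.questions = list(questions)  # Step 2: the question template
        self.items = []                   # enqueued item ids, in order
        self.annotations = {}             # item id -> {question: answer}

    def enqueue(self, item_id):
        # Step 3: add an item once; repeated enqueues are ignored.
        if item_id not in self.items:
            self.items.append(item_id)

    def answer(self, item_id, question, value):
        # Step 4: save a reviewer's answer as an annotation on the item.
        if item_id not in self.items:
            raise KeyError(f"{item_id!r} is not in queue {self.name!r}")
        if question not in self.questions:
            raise KeyError(f"unknown question: {question!r}")
        self.annotations.setdefault(item_id, {})[question] = value

    def pending(self):
        # Items that still lack an answer to at least one question.
        return [item for item in self.items
                if any(q not in self.annotations.get(item, {})
                       for q in self.questions)]

    def export_json(self):
        # Step 5: export annotated items for offline analysis.
        rows = [{"item_id": item, "answers": self.annotations[item]}
                for item in self.items if item in self.annotations]
        return json.dumps(rows, indent=2)


question = "Is the answer factually correct?"
queue = InMemoryQueue("factuality-review", "trace", [question])
queue.enqueue("trace-001")
queue.enqueue("trace-002")
queue.answer("trace-001", question, True)

print(queue.pending())      # ['trace-002']
print(queue.export_json())
```

The pending() helper corresponds to the "what's left" view of a queue: an item counts as done only once every question in the template has an answer.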

Fields you’ll see

  • Queue: name, description, type (Traces or Conversations).
  • Question: text, helper, placeholder, type, options (for choice types), required, default, min/max (numeric).
  • Answers: captured per question in the matching value type (string, boolean, number, array for multi-choice).
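The link between a question's type and its answer's value type can be expressed as a small check. This is a sketch under assumptions: the question is a plain dict whose keys follow the field names listed above, the type strings ("freeform", "boolean", "numeric", "single_choice", "multiple_choice") are hypothetical, and Arcane's own validation rules may differ.

```python
def validate_answer(question: dict, value) -> list[str]:
    """Return a list of problems with `value`; an empty list means it is valid."""
    errors = []
    qtype = question["type"]

    # A missing answer is only a problem when the question is required.
    if value is None:
        if question.get("required", False):
            errors.append("answer is required")
        return errors

    if qtype == "freeform":
        if not isinstance(value, str):
            errors.append("expected a string")
    elif qtype == "boolean":
        if not isinstance(value, bool):
            errors.append("expected a boolean")
    elif qtype == "numeric":
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append("expected a number")
        else:
            lo, hi = question.get("min"), question.get("max")
            if lo is not None and value < lo:
                errors.append(f"must be >= {lo}")
            if hi is not None and value > hi:
                errors.append(f"must be <= {hi}")
    elif qtype == "single_choice":
        if not isinstance(value, str) or value not in question.get("options", []):
            errors.append("expected one of the listed options")
    elif qtype == "multiple_choice":
        options = question.get("options", [])
        if not isinstance(value, list) or any(v not in options for v in value):
            errors.append("expected a list of listed options")
    else:
        errors.append(f"unknown question type: {qtype}")
    return errors


rating = {"type": "numeric", "required": True, "min": 1, "max": 5}
print(validate_answer(rating, 4))     # []
print(validate_answer(rating, 7))     # ['must be <= 5']
print(validate_answer(rating, None))  # ['answer is required']
```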

Good practices

  • Give each question a single, clear intent (e.g., “Is the answer factually correct?”).
  • Prefer choice types for consistency; use freeform for reviewer notes.
  • Keep queue scopes focused (per feature or per release) to avoid stale items.
  • Reuse queues to compare before/after changes to prompts or models.
  • Pair queues with Scores and Evaluations to track quality over time.
  • Use Conversations queues when you want full-session review; use Traces queues for single-execution review.
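As an illustration of the before/after comparison suggested above, the sketch below compares Boolean labels collected for the same items before and after a prompt or model change. The function name compare_labels, the dict-of-labels input shape, and the item ids and label values are all hypothetical; they stand in for whatever you export from a queue and are not real results.

```python
def compare_labels(before: dict[str, bool], after: dict[str, bool]) -> dict:
    """Compare Boolean labels for items that were labeled in both runs."""
    shared = sorted(before.keys() & after.keys())
    if not shared:
        raise ValueError("no items were labeled in both runs")
    return {
        "items": len(shared),
        "pass_rate_before": sum(before[i] for i in shared) / len(shared),
        "pass_rate_after": sum(after[i] for i in shared) / len(shared),
        # Items that passed before the change and fail after it.
        "regressions": [i for i in shared if before[i] and not after[i]],
        # Items that failed before the change and pass after it.
        "fixes": [i for i in shared if not before[i] and after[i]],
    }


# Illustrative labels for "Is the answer factually correct?" (made-up values).
before = {"t1": True, "t2": False, "t3": True, "t4": True}
after = {"t1": True, "t2": True, "t3": False, "t4": True}

report = compare_labels(before, after)
print(report["regressions"])  # ['t3']
print(report["fixes"])        # ['t2']
```

Listing regressions and fixes per item, not only the aggregate pass rate, matters here: in this example the pass rate is unchanged at 0.75 even though one item regressed.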