Datasets
Datasets are first-class collections of items that you curate for evaluations and experiments. Items can come from traces or from any other source you import; a dataset does not have to be tied to tracing.
What they are
- Saved sets of items with a table-like structure: a header (column names) and rows (values).
- Managed in the Datasets page (create, import, edit, delete).
- Used directly by Evaluations and Experiments; no extra wiring needed.
Why it matters
- Stable, repeatable inputs for scoring and A/B comparisons.
- Lets you rerun the same set after prompt/model changes to spot regressions.
- Can feed annotation queues when you need labeled ground truth on a known set.
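To illustrate why a stable item set makes regressions easy to spot, here is a minimal sketch that compares per-item scores from two runs over the same dataset. It does not use Arcane's API: the `find_regressions` function, the item IDs, and the score values are all hypothetical and exist only for illustration.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return the IDs of items whose score dropped by more than `tolerance`."""
    regressions = []
    for item_id, base_score in baseline.items():
        # Compare only items scored in both runs. Because both runs use the
        # same dataset, a missing item points to a failed run, not a regression.
        if item_id not in candidate:
            continue
        if base_score - candidate[item_id] > tolerance:
            regressions.append(item_id)
    return sorted(regressions)


# Hypothetical scores keyed by item ID, one dict per run.
baseline_scores = {"item-1": 1.0, "item-2": 0.5, "item-3": 1.0}
candidate_scores = {"item-1": 1.0, "item-2": 0.75, "item-3": 0.0}

regressed = find_regressions(baseline_scores, candidate_scores)
print(regressed)  # ['item-3']
```

Because both runs score the same items, any drop can be attributed to the prompt or model change rather than to a different input sample.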
How it works in Arcane
- Create a dataset with a name and description. Items can be anything you import or select, including items from traces.
- Add items via import or the dataset builder (you can pull from trace searches, but it’s optional).
- (Optional) Send items to an annotation queue for labeling.
- Run Evaluations or Experiments against the dataset; reuse it for consistent baselines.
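The last step, running variants against one dataset, can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not how Arcane runs Evaluations or Experiments: the `evaluate` function, the `input` and `expected` column names, the exact-match metric, and the two stand-in variants are all hypothetical.

```python
from typing import Callable, Sequence


def evaluate(
    header: Sequence[str],
    rows: Sequence[Sequence[str]],
    generate: Callable[[str], str],
    input_column: str = "input",        # hypothetical column name
    expected_column: str = "expected",  # hypothetical column name
) -> float:
    """Return the fraction of rows where generate(input) matches expected."""
    in_idx = header.index(input_column)
    exp_idx = header.index(expected_column)
    if not rows:
        return 0.0
    hits = sum(
        1 for row in rows
        if generate(row[in_idx]).strip() == row[exp_idx].strip()
    )
    return hits / len(rows)


# A tiny dataset: a header plus rows aligned to it.
header = ["input", "expected"]
rows = [["2+2", "4"], ["capital of France", "Paris"]]


# Two stand-in "variants"; a real run would call a model with a prompt version.
def variant_a(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "")


def variant_b(prompt: str) -> str:
    return {"2+2": "4"}.get(prompt, "")


score_a = evaluate(header, rows, variant_a)
score_b = evaluate(header, rows, variant_b)
print(score_a, score_b)  # 1.0 0.5
```

Both variants see exactly the same rows, so the difference in score reflects the variants themselves and not the inputs.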
Fields you’ll see
- Name, description.
- Header: column names.
- Rows: values aligned to the header.
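The header-and-rows shape can be modeled as a small data structure. The sketch below is an assumption for illustration only: the `Dataset` class, the `dataset_from_csv` helper, and the sample columns are hypothetical, and CSV is used only as a convenient example format, not as a statement of what Arcane's import accepts.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Dataset:
    name: str
    description: str
    header: List[str]      # column names
    rows: List[List[str]]  # values aligned to the header

    def __post_init__(self) -> None:
        # Every row must supply exactly one value per column.
        for i, row in enumerate(self.rows):
            if len(row) != len(self.header):
                raise ValueError(
                    f"row {i} has {len(row)} values, expected {len(self.header)}"
                )

    def records(self) -> List[Dict[str, str]]:
        """Return each row as a dict keyed by column name."""
        return [dict(zip(self.header, row)) for row in self.rows]


def dataset_from_csv(name: str, description: str, csv_text: str) -> Dataset:
    """Build a Dataset from CSV text whose first line is the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    return Dataset(name, description, header, rows)


csv_text = "question,expected_answer\nWhat is 2+2?,4\nCapital of France?,Paris\n"
ds = dataset_from_csv("smoke-test", "Two sanity-check questions", csv_text)
print(ds.records()[0])
```

Checking alignment when the dataset is built means a short or overlong row fails early, before it can skew a later evaluation.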
Related concepts
- Annotations & Queues: label traces and conversations, then derive ground-truth datasets from the labels.
- Scores & Evaluations: run metrics against a dataset to track quality.
- Experiments: run prompt versions against a dataset and compare variants on identical inputs, so wins and losses are judged fairly. See Experiments.