
Experiments

Experiments run a prompt version against a dataset to test performance. You create an experiment, run it, and view results per dataset row. See Prompts, Models & Scores and Datasets for the underlying concepts, and Evaluation & Experimentation for how experiments relate to evaluations.


Prerequisites

  • Prompts — Create and version a prompt with a model configuration (AI Models tab). See Prompts.
  • Datasets — Create a dataset with columns that match your prompt inputs. See Datasets. You can build datasets from traces using Dataset Mode.

Experiments list

Click Experiments in the sidebar to open the Experiments Management page, which lists all experiments in the project.


Search and sort

Control | What it does
--- | ---
Search | Filter experiments by name, description, prompt version ID, or dataset ID.
Sort | Sort by Name, Description, Prompt Version, Dataset, or Created Date, in ascending or descending order.
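To make the search and sort behavior concrete, here is a minimal sketch of filtering and ordering a list of experiments. It assumes a case-insensitive substring match across the four searchable fields; the `Experiment` record, its field names, and the `search` / `sort_experiments` helpers are hypothetical and not the product's actual schema or API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Experiment:
    # Hypothetical record shape; field names are illustrative only.
    name: str
    description: str
    prompt_version_id: str
    dataset_id: str
    created_at: datetime


# Fields the search box is described as matching against.
SEARCH_FIELDS = ("name", "description", "prompt_version_id", "dataset_id")

# One sort key per sortable column.
SORT_KEYS = {
    "Name": lambda e: e.name.lower(),
    "Description": lambda e: e.description.lower(),
    "Prompt Version": lambda e: e.prompt_version_id,
    "Dataset": lambda e: e.dataset_id,
    "Created Date": lambda e: e.created_at,
}


def search(experiments, query):
    """Keep experiments where any searchable field contains the query (case-insensitive)."""
    q = query.strip().lower()
    if not q:
        return list(experiments)
    return [
        e for e in experiments
        if any(q in getattr(e, f).lower() for f in SEARCH_FIELDS)
    ]


def sort_experiments(experiments, column, descending=False):
    """Return a new list ordered by the chosen column."""
    return sorted(experiments, key=SORT_KEYS[column], reverse=descending)
```

For example, `search(experiments, "ds_b")` would return only experiments run against a dataset whose ID contains `ds_b`, and the result can then be passed to `sort_experiments(..., "Created Date", descending=True)` to show the newest first.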

Experiment cards

Each card shows:

  • Name and Description (if there is no description, the card shows the truncated prompt version ID, dataset ID, and results count instead)
  • Details — open the experiment detail page
  • Re-run — re-execute the experiment against the dataset
  • Delete — remove the experiment and all its results

New experiment

Click New experiment to open the creation form, where you set the name, prompt version, dataset, and input mappings.


Create experiment


Basic info

Field | What it does
--- | ---
Name | Required. Label for the experiment.
Description | Optional. Helper text for your team.

Prompt and dataset

Field | What it does
--- | ---
Prompt | Required. Select the prompt to test.
Prompt Version | Required. Appears after you select a prompt. Pick the version to run (e.g. v0, v1).
Dataset | Required. Select the dataset to run against.

Prompt input mappings

When your prompt uses variables (e.g. Mustache {{query}} or F-String {query}), the form shows Prompt Input Mappings. Map each variable to a dataset column:

Mapping | What it does
--- | ---
Variable → Dataset field | Map each prompt variable to a dataset column. Select a column, or select "Other" to supply a custom value.

Variables are detected from the prompt template. See Prompts — Template format for Mustache and F-String syntax. Map them so each row supplies the correct inputs when the experiment runs.
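As an illustration of how variables can be picked out of a template, the sketch below handles both syntaxes. The Mustache branch is a simplification that only recognizes plain `{{name}}` tags (sections, partials, and comments are ignored); the F-String branch uses Python's standard `string.Formatter().parse`, which also treats `{{` and `}}` as literal braces. The `detect_variables` function is hypothetical and not the product's implementation.

```python
import re
from string import Formatter

# Simplified Mustache variable tag: {{name}} or {{ name }} (dotted names allowed).
MUSTACHE_VAR = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def detect_variables(template: str, fmt: str) -> list[str]:
    """Return the unique variable names in a template, in order of first appearance."""
    if fmt == "mustache":
        names = MUSTACHE_VAR.findall(template)
    elif fmt == "f-string":
        # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
        # field_name is None for pure literal chunks such as escaped braces.
        names = [field for _, field, _, _ in Formatter().parse(template) if field]
    else:
        raise ValueError(f"unknown template format: {fmt!r}")
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(names))
```

Each detected name would then get one row in the Prompt Input Mappings form.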

Click Create Experiment to save. The experiment runs automatically against the dataset.
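Conceptually, a run resolves the mappings for every dataset row, renders the prompt, and records one output per row. The sketch below shows that loop under stated assumptions: the mapping shape (`{"column": ...}` for a dataset column, `{"value": ...}` for an "Other" custom value), the F-String rendering via `str.format_map`, and the `call_model` callback are all hypothetical stand-ins, not the platform's internals.

```python
from typing import Callable

# Hypothetical mapping shape: each prompt variable points either at a dataset
# column ({"column": ...}) or at a fixed custom value ({"value": ...}).
Mapping = dict[str, dict[str, str]]


def resolve_inputs(row: dict, mappings: Mapping) -> dict:
    """Build the variable values for one dataset row."""
    inputs = {}
    for var, source in mappings.items():
        if "column" in source:
            inputs[var] = row[source["column"]]
        else:
            inputs[var] = source["value"]
    return inputs


def run_experiment(template: str, rows: list[dict], mappings: Mapping,
                   call_model: Callable[[str], str]) -> list[dict]:
    """Render the prompt for each row, call the model, and keep inputs plus output."""
    results = []
    for row in rows:
        prompt = template.format_map(resolve_inputs(row, mappings))
        # Copy the row so the original dataset record is not modified.
        results.append({**row, "experiment_result": call_model(prompt)})
    return results
```

Each returned dictionary corresponds to one row in the Results tab: the dataset columns plus the generated output.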


Experiment detail

When you open an experiment, you see two tabs: Details and Results.

The page header provides these actions:

  • Back to Experiments — return to the list
  • Re-run — re-execute the experiment
  • Delete — remove the experiment and all results

Details tab

Shows the experiment configuration:

  • Name and Description
  • Prompt — link to the prompt and version (e.g. RAG Prompt • Version: v1)
  • Dataset — link to the dataset
  • Prompt Input Mappings — variable → dataset column mappings (e.g. query → query, retrieved_contexts → retrieved_contexts)
  • Created and Updated timestamps

Results tab

The Results tab contains:

  • Search results — filter rows
  • Table — dataset columns (query, retrieved_contexts, reference, etc.) plus Experiment Result (the model output per row)
  • Copy to clipboard — copy cell values
  • Pagination — navigate through result rows

Each row shows the dataset input values and the generated output for that row.


Re-run experiment

Use Re-run to execute the experiment again against the same dataset. This is useful when:

  • The prompt or dataset has changed
  • You want to refresh results after model/config updates

Re-running creates a new experiment (the original is unchanged). The operation may take some time for large datasets.
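The "new experiment, original unchanged" behavior can be modeled as copying the configuration into a fresh record. This is a minimal sketch assuming an immutable record type; `ExperimentRecord`, its fields, and `rerun` are hypothetical names used only for illustration.

```python
import uuid
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class ExperimentRecord:
    # Hypothetical, immutable experiment record.
    id: str
    name: str
    prompt_version_id: str
    dataset_id: str
    mappings: tuple  # (variable, column) pairs
    results: tuple = ()
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def rerun(original: ExperimentRecord) -> ExperimentRecord:
    """Create a new experiment with the same configuration and empty results.

    dataclasses.replace builds a new instance, so the original record,
    including its results, stays untouched.
    """
    return replace(
        original,
        id=str(uuid.uuid4()),
        results=(),
        created_at=datetime.now(timezone.utc),
    )
```

The new record's results would then be filled in by running it against the dataset again.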


When to use

  • Test a prompt on a curated dataset before deploying.
  • Compare prompt versions — create separate experiments for different versions on the same dataset.
  • Batch runs — run a prompt over many inputs and inspect outputs per row.
  • Feed evaluations — use experiment outputs in Evaluations to score quality with Scores.
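When comparing prompt versions, it helps to line up the outputs of two experiments row by row. The sketch below assumes both result lists are in the same dataset row order and use the row shape from the Results tab (dataset columns plus an `experiment_result` field); `compare_versions` and the `output_a` / `output_b` / `same` keys are hypothetical.

```python
def compare_versions(results_a, results_b, input_keys, output_key="experiment_result"):
    """Pair outputs from two experiments on the same dataset, row by row."""
    if len(results_a) != len(results_b):
        raise ValueError("experiments must cover the same dataset rows")
    table = []
    for a, b in zip(results_a, results_b):
        # Guard against comparing rows that do not share the same inputs.
        if any(a[k] != b[k] for k in input_keys):
            raise ValueError("rows are not aligned")
        row = {k: a[k] for k in input_keys}
        row["output_a"] = a[output_key]
        row["output_b"] = b[output_key]
        row["same"] = a[output_key] == b[output_key]
        table.append(row)
    return table
```

Exact string equality is only a rough signal; scoring the outputs in Evaluations gives a more meaningful comparison.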