Architecture
This page describes the high-level architecture of the Arcane platform. Arcane connects on top of your existing trace infrastructure — it does not manage trace storage or ingest trace data.
Overview
Trace storage and trace data flows remain under your control: your application sends traces via OpenTelemetry to your own backend (Tempo, Jaeger, ClickHouse, or a custom API), and Arcane never manages or ingests them. Arcane connects on top, querying your trace backend through configured datasources and storing only its own metadata (projects, evaluations, users) in its database.
Your trace pipeline (separate from Arcane)
Your application emits traces to your own storage. Arcane is not part of this flow.
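Because this pipeline is plain OpenTelemetry, instrumenting it needs nothing Arcane-specific. A minimal Node.js configuration sketch might look like the following; the service name and the `tempo:4318` OTLP endpoint are placeholders for your own setup, and Arcane appears nowhere in this code:

```typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

// Export traces to your own backend's OTLP endpoint. Tempo, Jaeger, and the
// OpenTelemetry Collector all accept OTLP over HTTP.
const sdk = new NodeSDK({
  serviceName: 'my-app', // placeholder service name
  traceExporter: new OTLPTraceExporter({
    url: 'http://tempo:4318/v1/traces', // placeholder endpoint
  }),
});

sdk.start();
```

Arcane later reads from whatever backend this exporter targets; the emit path itself is entirely yours.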
Arcane platform
The worker receives the information it needs (scorer definitions, prompts) in the job payload. If it needs more data, it queries the backend over REST. The backend can likewise invoke the worker via REST for on-demand prompts.
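As a sketch, the payload described above might carry fields like these. Every name here is an illustrative assumption, not Arcane's actual schema:

```typescript
// Hypothetical evaluation job payload; all field names are assumptions.
interface EvaluationJob {
  jobId: string;
  projectId: string;
  scorers: string[];               // scorer identifiers to run
  prompts: Record<string, string>; // prompt templates keyed by name
  backendUrl: string;              // where the worker fetches missing data
}

const job: EvaluationJob = {
  jobId: 'job-001',
  projectId: 'proj-42',
  scorers: ['relevance', 'toxicity'],
  prompts: { relevance: 'Rate the relevance of: {{output}}' },
  backendUrl: 'http://arcane-backend:3000',
};

// The worker runs from the payload alone; only missing data triggers a
// REST call back to the backend (never to the trace storage).
console.log(job.scorers.length); // 2
```

Shipping the data in the payload keeps the worker stateless and keeps trace-backend credentials out of it entirely.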
Arcane connects on top
Only the backend queries your trace storage via configured datasources. The worker never touches trace backends — it receives data in jobs or fetches from the backend when needed.
- Frontend — React-based UI for exploring traces, conversations, evaluations, and managing configuration
- Backend — NestJS API that orchestrates data flow, authentication, and business logic
- Worker — Background processing for evaluations, dataset operations, and async tasks
- Trace backends — Arcane connects to your existing storage: ClickHouse, Tempo, Jaeger, or a custom API
Trace pipeline vs Arcane
| Aspect | Trace pipeline | Arcane |
|---|---|---|
| Managed by | You | You (when self-hosted) |
| Trace ingestion | Your app → OpenTelemetry → your trace backend | None — Arcane never ingests traces |
| Trace storage | Your Tempo / Jaeger / ClickHouse / custom API | None — Arcane queries yours |
| Stored in Arcane | — | Metadata only: projects, evaluations, datasets, users |
| Connection | — | Arcane connects on top via datasource config |
Components
| Component | Role |
|---|---|
| Frontend | User interface, trace viewing, conversation replay, evaluation UI |
| Backend | REST API, auth, project/org management, datasource integration (queries your trace backend), invokes worker via REST for on-demand prompts |
| Worker | Batch evaluations, dataset operations; receives job data (scorers, prompts), queries backend when needed; invoked by backend via REST for on-demand prompts |
| Message broker | RabbitMQ or Kafka for job queues |
| Database | PostgreSQL for Arcane metadata only (users, projects, evaluations) |
| Trace storage | Your existing backend — only the Arcane backend queries it, never the worker |
Data Flow
- Traces (your pipeline) — Your application emits traces via OpenTelemetry to your trace backend. This flow is entirely separate from Arcane.
- Query (Arcane on top) — The Arcane backend queries your trace backend through a configured datasource. No trace data is copied into Arcane. The worker never queries trace backends.
- Metadata — Projects, evaluations, datasets, and users are stored in Arcane's PostgreSQL.
- Workers — Evaluation runs and dataset operations are queued via RabbitMQ/Kafka. The worker receives necessary data in the job payload; if it needs more, it queries the backend. The backend can invoke the worker via REST for on-demand prompts.
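The query step above can be sketched as a small dispatch over datasource kinds. The type and helper names are illustrative assumptions, though the Tempo and Jaeger paths follow their public HTTP APIs (`GET /api/traces/<traceId>`); a ClickHouse datasource would go through a SQL client instead of a URL and is omitted here:

```typescript
// Hedged sketch of datasource dispatch in the backend; names are illustrative.
type DatasourceKind = 'tempo' | 'jaeger' | 'custom';

interface DatasourceConfig {
  kind: DatasourceKind;
  baseUrl: string;
}

// Build the read-only lookup URL for a single trace. Nothing is copied or
// written; the backend only reads from your storage on demand.
function traceLookupUrl(ds: DatasourceConfig, traceId: string): string {
  switch (ds.kind) {
    case 'tempo':
    case 'jaeger':
      // Both expose GET /api/traces/<traceId> in their HTTP APIs.
      return `${ds.baseUrl}/api/traces/${traceId}`;
    case 'custom':
      return `${ds.baseUrl}/traces/${traceId}`; // shape depends on your API
  }
}

console.log(traceLookupUrl({ kind: 'tempo', baseUrl: 'http://tempo:3200' }, 'abc123'));
// http://tempo:3200/api/traces/abc123
```

Because only the backend holds datasource configuration, trace-backend access stays in one place; the worker never needs these credentials.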
Why we don't ingest trace data
Arcane connects to your existing trace storage instead of ingesting and storing traces itself.
- You already have trace storage — Tempo, Jaeger, and ClickHouse are built for trace workloads. Duplicating that data into Arcane would add cost, latency, and operational burden without benefit.
- You control retention and compliance — Trace data stays in your infrastructure under your retention policies and data governance.
- Simpler deployment — No additional ingestion pipeline, no extra storage to size. Arcane plugs into what you already run.
- Works with your stack — Whether you ship OpenTelemetry traces to Tempo, Jaeger, or a custom backend, Arcane reads from it. Your trace architecture stays yours.
Why RabbitMQ and Kafka
Arcane supports both RabbitMQ and Kafka as message brokers for job queues.
- Widely adopted — Both are mature, production-ready systems with broad ecosystem and tooling support.
- Deployment flexibility — Many teams already run RabbitMQ or Kafka. Arcane can use what you have instead of introducing another broker.
- RabbitMQ — Well suited for moderate throughput, simpler operations, and smaller deployments. Message acknowledgments and routing fit evaluation and dataset job patterns.
- Kafka — Better for high-volume, high-throughput workloads and teams with existing Kafka infrastructure. Replay and partitioning support scale-out workers.
- Choice per environment — You can run RabbitMQ for development and Kafka in production, or the opposite, depending on your setup.
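One way to make that per-environment choice cheap is to hide the broker behind a small queue interface. The interface and class below are an illustrative sketch, not Arcane's API; a real deployment would back the same interface with a RabbitMQ or Kafka client library instead of the in-memory stand-in shown here:

```typescript
// Hedged sketch: a broker-agnostic queue interface lets RabbitMQ and Kafka
// be swapped per environment without touching job-producing code.
interface JobQueue {
  publish(queue: string, payload: unknown): void;
  consume(queue: string, handler: (payload: unknown) => void): void;
}

// In-memory stand-in, useful for local development and tests.
class InMemoryQueue implements JobQueue {
  private handlers = new Map<string, Array<(payload: unknown) => void>>();

  publish(queue: string, payload: unknown): void {
    for (const handler of this.handlers.get(queue) ?? []) {
      handler(payload);
    }
  }

  consume(queue: string, handler: (payload: unknown) => void): void {
    const list = this.handlers.get(queue) ?? [];
    list.push(handler);
    this.handlers.set(queue, list);
  }
}

// Usage: the backend enqueues an evaluation run; the worker consumes it.
const queue = new InMemoryQueue();
const received: unknown[] = [];
queue.consume('evaluations', (payload) => received.push(payload));
queue.publish('evaluations', { jobId: 'job-001', scorers: ['relevance'] });
console.log(received.length); // 1
```

The same pattern is why running RabbitMQ in development and Kafka in production (or the reverse) stays a configuration change rather than a code change.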
Deployment
For deployment options, see Deployment.