Architecture

This page describes the high-level architecture of the Arcane platform. Arcane connects on top of your existing trace infrastructure — it does not manage trace storage or ingest trace data.

Overview

Trace storage and trace data flows remain under your control. Your application sends traces via OpenTelemetry to your own backend (Tempo, Jaeger, ClickHouse, or a custom API). Arcane connects on top: it queries that backend through configured datasources and stores only metadata (projects, evaluations, users) in its own database.

Your trace pipeline (separate from Arcane)

Your application emits traces to your own storage. Arcane is not part of this flow.
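As an illustration, a minimal OpenTelemetry Node setup that exports traces straight to your own backend might look like the sketch below. The endpoint URL is an example, and this flow never touches Arcane:

```typescript
// Illustrative only: export traces to your own OTLP/HTTP endpoint (e.g. Tempo).
// The URL below is an example, not an Arcane endpoint.
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    // Your trace backend's OTLP endpoint -- Arcane is not part of this flow.
    url: "http://tempo.internal:4318/v1/traces",
  }),
});

sdk.start();
```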

Arcane platform

The worker receives the data it needs (scorer configurations, prompts) in the job payload. If it needs more, it queries the backend. The backend can also invoke the worker via REST for on-demand prompts.

Arcane connects on top

Only the backend queries your trace storage via configured datasources. The worker never touches trace backends — it receives data in jobs or fetches from the backend when needed.
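As a sketch of this pattern, the backend's datasource layer could be modeled as a small interface with one implementation per trace backend. The names here (`TraceDatasource`, `StubDatasource`) are hypothetical, not Arcane's actual API:

```typescript
// Hypothetical sketch of the backend's datasource abstraction.
// A real implementation per backend (Tempo, Jaeger, ClickHouse, custom API)
// would translate getTrace() into that backend's query API.

interface TraceSpan {
  traceId: string;
  name: string;
  durationMs: number;
}

interface TraceDatasource {
  // Traces are queried on demand; nothing is copied into Arcane's database.
  getTrace(traceId: string): TraceSpan[];
}

// Stub standing in for a real Tempo/Jaeger/ClickHouse client.
class StubDatasource implements TraceDatasource {
  getTrace(traceId: string): TraceSpan[] {
    return [{ traceId, name: "root-span", durationMs: 12 }];
  }
}

const ds: TraceDatasource = new StubDatasource();
const spans = ds.getTrace("abc123");
```

Because only the backend holds datasource implementations, the worker needs no credentials for, or network path to, your trace storage.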

  • Frontend — React-based UI for exploring traces, conversations, evaluations, and managing configuration
  • Backend — NestJS API that orchestrates data flow, authentication, and business logic
  • Worker — Background processing for evaluations, dataset operations, and async tasks
  • Trace backends — Arcane connects to your existing storage: ClickHouse, Tempo, Jaeger, or a custom API

Trace pipeline vs Arcane

  • Managed by — Trace pipeline: you. Arcane: you (when self-hosted).
  • Trace ingestion — Trace pipeline: your app → OpenTelemetry → your trace backend. Arcane: none; Arcane never ingests traces.
  • Trace storage — Trace pipeline: your Tempo / Jaeger / ClickHouse / custom API. Arcane: none; Arcane queries yours.
  • Stored in Arcane — Metadata only: projects, evaluations, datasets, users.
  • Connection — Arcane connects on top via datasource config.

Components

  • Frontend — User interface, trace viewing, conversation replay, evaluation UI
  • Backend — REST API, auth, project/org management, datasource integration (queries your trace backend); invokes the worker via REST for on-demand prompts
  • Worker — Batch evaluations and dataset operations; receives job data (scorers, prompts) and queries the backend when needed; invoked by the backend via REST for on-demand prompts
  • Message broker — RabbitMQ or Kafka for job queues
  • Database — PostgreSQL for Arcane metadata only (users, projects, evaluations)
  • Trace storage — Your existing backend; only the Arcane backend queries it, never the worker

Data Flow

  1. Traces (your pipeline) — Your application emits traces via OpenTelemetry to your trace backend. This flow is entirely separate from Arcane.
  2. Query (Arcane on top) — The Arcane backend queries your trace backend through a configured datasource. No trace data is copied into Arcane. The worker never queries trace backends.
  3. Metadata — Projects, evaluations, datasets, and users are stored in Arcane's PostgreSQL.
  4. Workers — Evaluation runs and dataset operations are queued via RabbitMQ/Kafka. The worker receives the data it needs in the job payload; if it needs more, it queries the backend. The backend can also invoke the worker via REST for on-demand prompts.
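Step 4 can be sketched as follows, assuming a hypothetical job shape and backend client (the names are illustrative, not Arcane's real interfaces): the worker prefers data shipped in the payload and falls back to the backend only when something is missing.

```typescript
// Illustrative worker-side job handling (all names hypothetical).

interface EvaluationJob {
  evaluationId: string;
  scorers: string[];   // scorer configs shipped in the job payload
  prompts?: string[];  // prompts may be included or fetched on demand
}

// Stub for a backend REST client; a real worker would issue HTTP calls here.
const backendClient = {
  fetchPrompts(evaluationId: string): string[] {
    return [`prompt-for-${evaluationId}`];
  },
};

function handleJob(job: EvaluationJob): { scorers: string[]; prompts: string[] } {
  // Prefer data delivered in the payload; query the backend only if missing.
  const prompts = job.prompts ?? backendClient.fetchPrompts(job.evaluationId);
  return { scorers: job.scorers, prompts };
}

const result = handleJob({ evaluationId: "eval-1", scorers: ["accuracy"] });
```

Note that the fallback goes to the Arcane backend, never to your trace storage.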

Why we don't ingest trace data

Arcane connects to your existing trace storage instead of ingesting and storing traces itself.

  • You already have trace storage — Tempo, Jaeger, and ClickHouse are built for trace workloads. Duplicating that data into Arcane would add cost, latency, and operational burden without benefit.
  • You control retention and compliance — Trace data stays in your infrastructure under your retention policies and data governance.
  • Simpler deployment — No additional ingestion pipeline, no extra storage to size. Arcane plugs into what you already run.
  • Works with your stack — Whether you use OpenTelemetry to Tempo, Jaeger, or a custom backend, Arcane reads from it. Your trace architecture stays yours.

Why RabbitMQ and Kafka

Arcane supports both RabbitMQ and Kafka as message brokers for job queues.

  • Widely adopted — Both are mature, production-ready systems with broad ecosystem and tooling support.
  • Deployment flexibility — Many teams already run RabbitMQ or Kafka. Arcane can use what you have instead of introducing another broker.
  • RabbitMQ — Well suited for moderate throughput, simpler operations, and smaller deployments. Message acknowledgments and routing fit evaluation and dataset job patterns.
  • Kafka — Better for high-volume, high-throughput workloads and teams with existing Kafka infrastructure. Replay and partitioning support scale-out workers.
  • Choice per environment — You can run RabbitMQ in development and Kafka in production, or vice versa, depending on your setup.
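Supporting both brokers typically means isolating them behind a small queue abstraction. The sketch below (all names hypothetical) uses an in-memory stand-in to show the publish/consume shape that either a RabbitMQ or a Kafka client could implement:

```typescript
// Hypothetical broker-agnostic queue interface; RabbitMQ and Kafka adapters
// would each implement it. InMemoryQueue is a test stand-in, not a real broker.

interface JobQueue {
  publish(topic: string, payload: string): void;
  consume(topic: string, handler: (payload: string) => void): void;
}

class InMemoryQueue implements JobQueue {
  private handlers = new Map<string, ((p: string) => void)[]>();

  consume(topic: string, handler: (payload: string) => void): void {
    const list = this.handlers.get(topic) ?? [];
    list.push(handler);
    this.handlers.set(topic, list);
  }

  publish(topic: string, payload: string): void {
    // Deliver to every registered handler for the topic.
    for (const h of this.handlers.get(topic) ?? []) h(payload);
  }
}

const queue: JobQueue = new InMemoryQueue();
const received: string[] = [];
queue.consume("evaluations", (p) => received.push(p));
queue.publish("evaluations", JSON.stringify({ evaluationId: "eval-1" }));
```

Keeping job producers and consumers coded against the interface is what lets each environment pick its broker without touching evaluation logic.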

Deployment

For deployment options, see Deployment.