Architecture
This page describes the high-level architecture of the Arcane platform. Arcane connects on top of your existing trace infrastructure — it does not manage trace storage or ingest trace data.
Overview
Trace storage and trace data flows remain under your control: your application sends traces via OpenTelemetry to your own backend (Tempo, Jaeger, ClickHouse, or a custom API), and Arcane never manages or ingests them. Arcane connects on top, querying your trace backend through configured datasources and storing only its own metadata (projects, evaluations, users) in its database.
Your trace pipeline (separate from Arcane)
Your application emits traces to your own storage. Arcane is not part of this flow.
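Because this pipeline is plain OpenTelemetry, instrumenting it needs nothing Arcane-specific. A minimal Node.js configuration sketch might look like the following; the service name and the `tempo:4318` OTLP endpoint are placeholders for your own setup, and Arcane appears nowhere in this code:

```typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

// Export traces to your own backend's OTLP endpoint. Tempo, Jaeger, and the
// OpenTelemetry Collector all accept OTLP over HTTP.
const sdk = new NodeSDK({
  serviceName: 'my-app', // placeholder service name
  traceExporter: new OTLPTraceExporter({
    url: 'http://tempo:4318/v1/traces', // placeholder endpoint
  }),
});

sdk.start();
```

Arcane later reads from whatever backend this exporter targets; the emit path itself is entirely yours.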
Arcane platform
The worker receives the information it needs (scorer definitions, prompts) in the job payload. If it needs more data, it queries the backend over REST. The backend can likewise invoke the worker via REST for on-demand prompts.
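As a sketch, the payload described above might carry fields like these. Every name here is an illustrative assumption, not Arcane's actual schema:

```typescript
// Hypothetical evaluation job payload; all field names are assumptions.
interface EvaluationJob {
  jobId: string;
  projectId: string;
  scorers: string[];               // scorer identifiers to run
  prompts: Record<string, string>; // prompt templates keyed by name
  backendUrl: string;              // where the worker fetches missing data
}

const job: EvaluationJob = {
  jobId: 'job-001',
  projectId: 'proj-42',
  scorers: ['relevance', 'toxicity'],
  prompts: { relevance: 'Rate the relevance of: {{output}}' },
  backendUrl: 'http://arcane-backend:3000',
};

// The worker runs from the payload alone; only missing data triggers a
// REST call back to the backend (never to the trace storage).
console.log(job.scorers.length); // 2
```

Shipping the data in the payload keeps the worker stateless and keeps trace-backend credentials out of it entirely.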
Arcane connects on top
Only the backend queries your trace storage via configured datasources. The worker never touches trace backends — it receives data in jobs or fetches from the backend when needed.
- Frontend — React-based UI for exploring traces, conversations, evaluations, and managing configuration
- Backend — NestJS API that orchestrates data flow, authentication, and business logic
- Worker — Background processing for evaluations, dataset operations, and async tasks
- Trace backends — Arcane connects to your existing storage: ClickHouse, Tempo, Jaeger, or a custom API
Trace pipeline vs Arcane
| Aspect | Trace pipeline | Arcane |
|---|---|---|
| Managed by | You | You (when self-hosted) |
| Trace ingestion | Your app → OpenTelemetry → your trace backend | None — Arcane never ingests traces |
| Trace storage | Your Tempo / Jaeger / ClickHouse / custom API | None — Arcane queries yours |
| Stored in Arcane | — | Metadata only: projects, evaluations, datasets, users |
| Connection | — | Arcane connects on top via datasource config |
Components
| Component | Role |
|---|---|
| Frontend | User interface, trace viewing, conversation replay, evaluation UI |
| Backend | REST API, auth, project/org management, datasource integration (queries your trace backend), invokes worker via REST for on-demand prompts |
| Worker | Batch evaluations, dataset operations; receives job data (scorers, prompts), queries backend when needed; invoked by backend via REST for on-demand prompts |
| Message broker | RabbitMQ or Kafka for job queues |
| Database | PostgreSQL for Arcane metadata only (users, projects, evaluations) |
| Trace storage | Your existing backend — only the Arcane backend queries it, never the worker |
Data Flow
- Traces (your pipeline) — Your application emits traces via OpenTelemetry to your trace backend. This flow is entirely separate from Arcane.
- Query (Arcane on top) — The Arcane backend queries your trace backend through a configured datasource. No trace data is copied into Arcane. The worker never queries trace backends.
- Metadata — Projects, evaluations, datasets, and users are stored in Arcane's PostgreSQL.
- Workers — Evaluation runs and dataset operations are queued via RabbitMQ/Kafka. The worker receives necessary data in the job payload; if it needs more, it queries the backend. The backend can invoke the worker via REST for on-demand prompts.
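The query step above can be sketched as a small dispatch over datasource kinds. The type and helper names are illustrative assumptions, though the Tempo and Jaeger paths follow their public HTTP APIs (`GET /api/traces/<traceId>`); a ClickHouse datasource would go through a SQL client instead of a URL and is omitted here:

```typescript
// Hedged sketch of datasource dispatch in the backend; names are illustrative.
type DatasourceKind = 'tempo' | 'jaeger' | 'custom';

interface DatasourceConfig {
  kind: DatasourceKind;
  baseUrl: string;
}

// Build the read-only lookup URL for a single trace. Nothing is copied or
// written; the backend only reads from your storage on demand.
function traceLookupUrl(ds: DatasourceConfig, traceId: string): string {
  switch (ds.kind) {
    case 'tempo':
    case 'jaeger':
      // Both expose GET /api/traces/<traceId> in their HTTP APIs.
      return `${ds.baseUrl}/api/traces/${traceId}`;
    case 'custom':
      return `${ds.baseUrl}/traces/${traceId}`; // shape depends on your API
  }
}

console.log(traceLookupUrl({ kind: 'tempo', baseUrl: 'http://tempo:3200' }, 'abc123'));
// http://tempo:3200/api/traces/abc123
```

Because only the backend holds datasource configuration, trace-backend access stays in one place; the worker never needs these credentials.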
Why we don't ingest trace data
Arcane connects to your existing trace storage instead of ingesting and storing traces itself.
- You already have trace storage — Tempo, Jaeger, and ClickHouse are built for trace workloads. Duplicating that data into Arcane would add cost, latency, and operational burden without benefit.
- You control retention and compliance — Trace data stays in your infrastructure under your retention policies and data governance.
- Simpler deployment — No additional ingestion pipeline, no extra storage to size. Arcane plugs into what you already run.
- Works with your stack — Whether you ship OpenTelemetry traces to Tempo, Jaeger, or a custom backend, Arcane reads from it. Your trace architecture stays yours.
Why RabbitMQ and Kafka
Arcane supports both RabbitMQ and Kafka as message brokers for job queues.
- Widely adopted — Both are mature, production-ready systems with broad ecosystem and tooling support.
- Deployment flexibility — Many teams already run RabbitMQ or Kafka. Arcane can use what you have instead of introducing another broker.
- RabbitMQ — Well suited for moderate throughput, simpler operations, and smaller deployments. Message acknowledgments and routing fit evaluation and dataset job patterns.
- Kafka — Better for high-volume, high-throughput workloads and teams with existing Kafka infrastructure. Replay and partitioning support scale-out workers.
- Choice per environment — You can run RabbitMQ for development and Kafka in production, or the opposite, depending on your setup.
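One way to make that per-environment choice cheap is to hide the broker behind a small queue interface. The interface and class below are an illustrative sketch, not Arcane's API; a real deployment would back the same interface with a RabbitMQ or Kafka client library instead of the in-memory stand-in shown here:

```typescript
// Hedged sketch: a broker-agnostic queue interface lets RabbitMQ and Kafka
// be swapped per environment without touching job-producing code.
interface JobQueue {
  publish(queue: string, payload: unknown): void;
  consume(queue: string, handler: (payload: unknown) => void): void;
}

// In-memory stand-in, useful for local development and tests.
class InMemoryQueue implements JobQueue {
  private handlers = new Map<string, Array<(payload: unknown) => void>>();

  publish(queue: string, payload: unknown): void {
    for (const handler of this.handlers.get(queue) ?? []) {
      handler(payload);
    }
  }

  consume(queue: string, handler: (payload: unknown) => void): void {
    const list = this.handlers.get(queue) ?? [];
    list.push(handler);
    this.handlers.set(queue, list);
  }
}

// Usage: the backend enqueues an evaluation run; the worker consumes it.
const queue = new InMemoryQueue();
const received: unknown[] = [];
queue.consume('evaluations', (payload) => received.push(payload));
queue.publish('evaluations', { jobId: 'job-001', scorers: ['relevance'] });
console.log(received.length); // 1
```

The same pattern is why running RabbitMQ in development and Kafka in production (or the reverse) stays a configuration change rather than a code change.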
Deployment
For deployment options, see Deployment.