Relari (YC W24)

Testing, evaluation, and synthetic data generation platform for AI agents.

4.3 (6)

Reviewed by Daniel Nikulshyn · Updated May 2026

Overview

Relari is a developer platform focused on improving the reliability of AI agents through systematic testing and evaluation. It helps teams generate synthetic datasets, run automated evaluations, and benchmark agent performance across realistic scenarios before shipping to production. Backed by Y Combinator (W24), Relari targets engineering teams building complex LLM applications and multi-step agents where traditional QA falls short. Its tooling aims to bring software-engineering rigor—unit tests, regression checks, and measurable metrics—to non-deterministic AI systems. The platform supports custom evaluators, scenario simulation, and continuous monitoring, making it useful for both pre-launch validation and ongoing quality assurance of production agents.

Key features

Synthetic dataset generation
Automated agent evaluation pipelines
Scenario and conversation simulation
Customizable evaluation metrics
Regression testing for LLM apps
Performance benchmarking and reporting

Pros & cons

Pros

Purpose-built for evaluating multi-step AI agents
Generates synthetic test data at scale
Supports custom metrics and evaluators
Backed by Y Combinator with active development

Cons

Primarily aimed at technical teams, not non-developers
Newer platform with an evolving feature set
May require integration work to fit existing stacks

Reviews

4.3

Average across 6 reviews.

Fatima Zahra

Skeptical, then convinced

I went in skeptical; most tools in this space overpromise. It actually delivers on customizable evaluation metrics, and how purpose-built it is for evaluating multi-step AI agents caught me off guard. I'd recommend giving it a real trial.

Robert Ainsworth

Solid for our team

We rolled this out across the team last quarter, and it supports custom metrics and evaluators. The customizable evaluation metrics fit neatly into how we already work and removed a step we used to do by hand. It is primarily aimed at technical teams, not non-developers, which is the main caveat, but it has held up under daily use.

Devin Walker

Skeptical, then convinced

I went in skeptical; most tools in this space overpromise. It actually delivers on performance benchmarking and reporting, and its support for custom metrics and evaluators caught me off guard. That it is primarily aimed at technical teams, not non-developers, is why this isn't a perfect score. Still, I'd recommend giving it a real trial.

Carlos Mendoza

Compared a few options

Evaluated this against two competitors. Where it wins: scenario and conversation simulation, and a design purpose-built for evaluating multi-step AI agents. Where it lags: it may require integration work to fit existing stacks. On balance the feature set, especially scenario and conversation simulation, justifies the 5 stars for our use case.

Yuki Mori

Use it every day

Honestly didn't expect to like it this much. Performance benchmarking and reporting is exactly what I needed, and it is purpose-built for evaluating multi-step AI agents. I do wish it required less integration work to fit existing stacks, but I reach for it almost every day now and it just clicks.

Leila Hassan

Skeptical, then convinced

I went in skeptical; most tools in this space overpromise. It actually delivers on regression testing for LLM apps, and its support for custom metrics and evaluators caught me off guard. The integration work it may require to fit existing stacks is why this isn't a perfect score. Still, I'd recommend giving it a real trial.

Alternatives in Observability

AI2AI project

Observability

Watch two AI agents converse with each other in real time

4.5 (4)

Free

Weave

Observability

A no-code AI workflow builder that enables businesses to automate operations by integrating multiple large language models (LLMs) and connecting prompts seam...

4.8 (5)

Free

Temperstack

Observability

AI-driven reliability platform that automates monitoring, alerting, and incident management across observability stacks.

4.3 (4)

Free

Arize AI

Observability

An AI observability and LLM evaluation platform that assists AI developers and data scientists in monitoring, troubleshooting, and enhancing the performance...

4.3 (6)

Freemium