Relari (YC W24)
Testing, evaluation, and synthetic data generation platform for AI agents.
Overview
Key features
- Synthetic dataset generation
- Automated agent evaluation pipelines
- Scenario and conversation simulation
- Customizable evaluation metrics
- Regression testing for LLM apps
- Performance benchmarking and reporting
Pros and cons
Pros
- Purpose-built for evaluating multi-step AI agents
- Generates synthetic test data at scale
- Supports custom metrics and evaluators
- Backed by Y Combinator with active development
Cons
- Primarily aimed at technical teams, not non-developers
- Newer platform with an evolving feature set
- May require integration work to fit existing stacks
Reviews
Average from 6 ratings.
Fatima Zahra
Skeptical, then convinced
I went in skeptical, since most tools in this space overpromise. It actually delivers on customizable evaluation metrics, and how well it is purpose-built for evaluating multi-step AI agents caught me off guard. I'd recommend giving it a real trial.
Robert Ainsworth
Solid for our team
We rolled this out across the team last quarter, and it supports custom metrics and evaluators. The customizable evaluation metrics fit neatly into how we already work and removed a step we used to do by hand. It is primarily aimed at technical teams, not non-developers, which is the main caveat, but it has held up under daily use.
Devin Walker
Skeptical, then convinced
I went in skeptical, since most tools in this space overpromise. It actually delivers on performance benchmarking and reporting, and its support for custom metrics and evaluators caught me off guard. That it is primarily aimed at technical teams, not non-developers, is why this isn't a perfect score. Still, I'd recommend giving it a real trial.
Carlos Mendoza
Compared a few options
Evaluated this against two competitors. Where it wins: scenario and conversation simulation, and being purpose-built for evaluating multi-step AI agents. Where it lags: it may require integration work to fit existing stacks. On balance, the feature set, especially scenario and conversation simulation, justifies the 5 stars for our use case.
Yuki Mori
Use it every day
Honestly didn't expect to like it this much. Performance benchmarking and reporting is exactly what I needed, and it is purpose-built for evaluating multi-step AI agents. I do wish it required less integration work to fit existing stacks, but I reach for it almost every day now and it just clicks.
Leila Hassan
Skeptical, then convinced
I went in skeptical, since most tools in this space overpromise. It actually delivers on regression testing for LLM apps, and its support for custom metrics and evaluators caught me off guard. That it may require integration work to fit existing stacks is why this isn't a perfect score. Still, I'd recommend giving it a real trial.
Observability alternatives

AI2AI project
Observability
Watch two AI agents converse with each other in real time

Weave
Observability
A no-code AI workflow builder that enables businesses to automate operations by integrating multiple large language models (LLMs) and connecting prompts seam...

Temperstack
Observability
AI-driven reliability platform that automates monitoring, alerting, and incident management across observability stacks.

Arize AI
Observability
An AI observability and LLM evaluation platform that assists AI developers and data scientists in monitoring, troubleshooting, and enhancing the performance...

Inspeq AI
Observability
Enterprise platform for operationalizing Responsible AI in generative AI applications.

Future AGI
Observability
A platform enhancing AI accuracy through comprehensive evaluation and optimization tools.

FoundryAI
Observability
Build, evaluate, and improve AI agents for business automation

Helicone AI
Observability
All-in-one observability platform to monitor, debug, and improve production LLM apps.
