promptfoo
Open-source and commercial platform for testing prompts, agents, RAG systems, and AI security behavior.
Open official linkOverview
Category: LLM evaluation & red teaming
Open-source and commercial platform for testing prompts, agents, RAG systems, and AI security behavior.
Best for
AI teams that need repeatable evaluations, regression tests, and red-team probes before shipping LLM apps.
Use cases
- Test prompt changes
- Evaluate RAG quality
- Run AI red-team/security checks
Common example
Create an eval suite that compares model answers before and after changing a support chatbot prompt.
Pricing and free plan
Pricing model: Open-source Community version plus paid/enterprise plans; red-team probes and hosted features may have usage limits.
Free plan / trial assessment: Free/open-source usage exists, but larger-scale red teaming, hosted features, and enterprise controls are limited.
Limitations
Requires test-case design and engineering integration; eval results are only as good as the suite.
ChatGPT / Claude comparison
Better than ChatGPT/Claude for this task — it provides repeatable automated evals rather than one-off chat judgments.
Alternatives
LangSmith, Braintrust, OpenAI Evals, TruLens