🐢 Open-Source Evaluation & Testing for AI & LLM systems
Updated Aug 1, 2025 - Python
Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation to monitoring.
Evaluation and Tracking for LLM Experiments and AI Agents
A single interface to use and evaluate different agent frameworks
Ranking LLMs on agentic tasks
Tune your AI agent to best meet its KPIs through a cyclic process of analysis, improvement, and simulation
Mathematical benchmark exposing the massive performance gap between real agents and LLM wrappers. Rigorous multi-dimensional evaluation: stress testing, network resilience, ensemble coordination, failure analysis. Features statistical validation and reproducible methodology for separating architectural theater from real systems.
🤖 A curated list of resources for testing AI agents - frameworks, methodologies, benchmarks, tools, and best practices for ensuring reliable, safe, and effective autonomous AI systems
Intelligent Context Engineering Assistant for Multi-Agent Systems. Analyze, optimize, and enhance your AI agent configurations with AI-powered insights
Train a reinforcement learning agent using PPO to balance a pole on a cart in the CartPole-v0 environment using Gymnasium and Stable-Baselines3. Includes model training, evaluation, and rendering using Python and Jupyter Notebook.
Visual dashboard to evaluate multi-agent & RAG-based AI apps. Compare models on accuracy, latency, token usage, and trust metrics - powered by NVIDIA AgentIQ
Browser automation agent for Bunnings website using the browser-use library, orchestrated via the laminar framework, managed with uv for Python environments, and running in Brave Browser for stealth and CAPTCHA bypass.