OpenAI Evals

OpenAI's framework for benchmarking LLMs and an open-source registry of evals. Industry-standard test harness.

⭐ 16,000 stars🍴 0 forksPythonbenchmarkevaluationopenai

RecommendedTry in playground →Run in your browser without cloning.Source codeDownload ZIPLatest commit on default branch.SourceView on GitHubgithub.com/openai/evals

README

README couldn't be fetched right now. View the full project on GitHub →