OpenAI's framework for benchmarking LLMs, together with an open-source registry of evals. Widely used as a test harness for LLM evaluation.