ReviewBenchLite: A Benchmark for Evaluating Automated Code Review Capabilities of Language Models
We introduce ReviewBenchLite, a benchmark for systematically evaluating the code review capabilities of language models and autonomous agents. Unlike existing benchmarks that focus on code generation or bug fixing given explicit problem descriptions, ReviewBenchLite tests a model's ability to proactively identify issues in production codebases without prior knowledge of what problems exist.
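To make the task setup concrete, the following is a minimal sketch of how one benchmark instance might be represented: the model receives only a code snapshot, while the gold issue list stays hidden until scoring. The `ReviewTask` schema, field names, and toy recall metric below are simplified illustrations, not the benchmark's exact data format or scoring protocol.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewTask:
    """One hypothetical review-benchmark instance.

    Unlike bug-fixing benchmarks, there is no problem statement:
    the model sees only the code and must surface issues itself.
    """
    repo_snapshot: dict[str, str]  # file path -> file contents shown to the model
    gold_issues: list[str] = field(default_factory=list)  # hidden until scoring


def score(predicted_issues: list[str], task: ReviewTask) -> float:
    """Toy recall metric: fraction of gold issues the model surfaced.

    A real benchmark would need semantic matching of issue reports;
    exact string membership here is purely illustrative.
    """
    if not task.gold_issues:
        return 0.0
    hits = sum(1 for gold in task.gold_issues if gold in predicted_issues)
    return hits / len(task.gold_issues)
```

In practice, matching a free-form model report against a gold issue is itself a hard evaluation problem; the exact-match check above stands in for whatever matching procedure the benchmark actually uses.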