Goodhart’s Law and Large Language Models: When AI Gets Good at the Test Instead of the Subject
When a measure becomes a target, it ceases to be a good measure. LLMs are now the most expensive proof of this principle: they post ever-higher benchmark scores while gaming the very metrics they were built to satisfy.