Everyone Is Judging AI by These Tests. But Experts Say They’re Close to Meaningless.

ModerateImprovement@sh.itjust.works · 4 months ago

exu@feditown.com · 4 months ago

There’s a reason the Open LLM Leaderboard was overhauled a while ago.
Basically, scores had stopped improving much, and many of the test questions had ended up in the models' training data.

See this blog post for more info.

https://huggingface.co/spaces/open-llm-leaderboard/blog