Everyone Is Judging AI by These Tests. But Experts Say They’re Close to Meaningless. (themarkup.org)
Posted by ModerateImprovement@sh.itjust.works to Technology@lemmy.world · English · 4 months ago · 26 comments
exu@feditown.com · 4 months ago
There’s a reason the Open LLM Leaderboard was changed a while ago.
Basically, scores had stopped improving much, and many of the tests had ended up in the training data.
See this blog post for more info:
https://huggingface.co/spaces/open-llm-leaderboard/blog
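For anyone wondering what "tests contained in the training data" looks like in practice, here's a rough sketch of a word n-gram overlap check. To be clear, this is not the leaderboard maintainers' method (I don't know how they detect contamination). The function names and the default n-gram size are made up for illustration.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(benchmark_item, training_text, n=8):
    """Fraction of the benchmark item's n-grams that also occur in training_text.

    Hypothetical helper: a value near 1.0 suggests the item appears
    (nearly) verbatim in the training text. Returns 0.0 when the item
    is shorter than n words, since it has no n-grams to compare.
    """
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0
    train_grams = ngrams(training_text, n)
    return len(item_grams & train_grams) / len(item_grams)


if __name__ == "__main__":
    question = "the quick brown fox jumps over the lazy dog"
    scraped = "some web page text the quick brown fox jumps over the lazy dog more text"
    print(overlap_fraction(question, scraped, n=3))
```

A check like this only catches near-verbatim copies. Paraphrased or translated test questions would slip straight through.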