The Llama-3 evaluation is officially released, and the ranking is unexpected!

Released 2024 by Lyan23



Just today, the LMSYS Chatbot Arena Leaderboard on HuggingFace, currently one of the more authoritative rankings of large models, was officially updated with results for the recently open-sourced Llama-3!

Because Llama has been getting so much attention lately, this evaluation has drawn a lot of interest as well.

On this leaderboard, Llama-3 performed very well, with the 70B parameter version ranking 6th among large models!

Here are the specific scores. You can see that the Llama-3 70B model sits firmly in the first tier, and the 8B model comes close to the third tier.

In terms of human preference, it has performed consistently well against top models (see the win-rate matrix). It was optimized for dialogue scenarios with a large amount of instruction data in post-training. Further analysis, including topic distribution and agreement studies, is still underway, and we are looking forward to it. We are also awaiting the details in the Llama-3 technical report.
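For readers unfamiliar with the term, a win-rate matrix simply records, for each pair of models, how often one was preferred over the other in head-to-head votes. The sketch below is only an illustration of that idea, not the leaderboard's actual pipeline: the battle records, the model names, and the choice to count a tie as half a win for each side are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical battle records: (model_a, model_b, winner),
# where winner is "a", "b", or "tie". The data is made up.
battles = [
    ("model-x", "model-y", "a"),
    ("model-x", "model-y", "a"),
    ("model-x", "model-y", "b"),
    ("model-y", "model-x", "tie"),
]


def win_rate_matrix(battles):
    """Return {model: {opponent: win rate}} from pairwise votes.

    Assumption: a tie counts as half a win for each side.
    """
    wins = defaultdict(float)   # (model, opponent) -> wins, ties as 0.5
    games = defaultdict(int)    # (model, opponent) -> battles played

    for a, b, winner in battles:
        games[(a, b)] += 1
        games[(b, a)] += 1
        if winner == "a":
            wins[(a, b)] += 1.0
        elif winner == "b":
            wins[(b, a)] += 1.0
        else:
            wins[(a, b)] += 0.5
            wins[(b, a)] += 0.5

    models = sorted({m for a, b, _ in battles for m in (a, b)})
    matrix = {}
    for a in models:
        matrix[a] = {}
        for b in models:
            played = games.get((a, b), 0)
            if a != b and played > 0:
                matrix[a][b] = wins.get((a, b), 0.0) / played
    return matrix


if __name__ == "__main__":
    for model, row in win_rate_matrix(battles).items():
        print(model, row)
```

Each row reads as "this model's win rate against each opponent", which is how the cells of such a matrix are usually interpreted.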

Comparisons between the models:

What’s even more impressive is that in the English category, the Llama-3 70B model even ranks first!

But when it comes to Chinese, Llama-3 seems somewhat underpowered and didn’t even make the list:

Llama-3 made a slight effort and managed to rank fourth! This has many netizens looking forward to what the 400B super-large model will be able to deliver!

That’s the AI news roundup for today.