The Llama-3 evaluation is officially released, and the ranking is unexpected!

Published 7 months ago by Lyan23

Just today, the LMSYS Chatbot Arena Leaderboard on HuggingFace, currently one of the more authoritative rankings of large models, was officially updated with results for the recently open-sourced Llama-3!

Llama has drawn a great deal of attention recently, so this evaluation has been closely watched.

On this leaderboard, Llama-3 performed very well, with the 70B-parameter version ranking 6th among large models!

Here are the specific scores. The 70B version of Llama-3 sits firmly in the first tier, and the 8B version is close to the third tier.

According to human preference votes, it has performed consistently strongly against top models (see the win-rate matrix). It has been optimized for dialogue scenarios with a large amount of instruction data in post-training. Further analysis, including topic distribution and agreement studies, is still underway, and we look forward to it. We are also awaiting the details in the Llama-3 technical report.

Comparisons between the models:

What’s even more impressive is that in the English category, the Llama-3 70B model even ranks first!

But in Chinese, Llama-3 seems somewhat underpowered and didn’t even make the list:

Llama-3 made a slight effort and managed to rank fourth! This has many netizens looking forward to what the 400B super-large model can deliver!

That’s the AI news roundup for today.

 
