Leaderboard
The full leaderboard is hosted on the Hugging Face leaderboard.
- The validation score shown is an average over the validation benchmarks.
- Final performance will be assessed at the end of the competition on a hidden test set, which may or may not be correlated with Validation performance.
- Higher values are better.
Rank | Model / Submission Name | Validation Performance |
---|---|---|
1 | BVD_Mega | 54.5 |
2 | wcf_lar | 52.2 |
3 | readapt_median | 48.4 |
4 | kobeni | 46.7 |
5 | lore_route | 45.8 |
6 | shira_llama3_8b_it_algo0 | 44.9 |
7 | basic_merge_00 | 44.7 |
8 | llama_base_fc | 44.2 |
9 | llama_base_qa | 44.2 |
10 | shira_qw2_7b_it_algo0 | 44.1 |
11 | llama_merge2 | 42.7 |
12 | mistral_avg_exp_04 | 42.2 |
13 | shira_mtl7b_0_2_algo0 | 42.0 |
14 | mistral_avg_exp_05 | 41.4 |
15 | mistral_avg_exp_07 | 41.3 |
16 | mistral_avg_exp_06 | 41.2 |
17 | cdutr_AqQ3 | 41.1 |
18 | shira_ft5_algo0 | 40.8 |
19 | shira_ft5xl_algo0 | 40.8 |
20 | yi15_exp | 40.7 |
21 | yi15_exp | 39.9 |
22 | llama_avg | 38.5 |
23 | llama_avg (Baseline) | 38.4 |
24 | knovel_test | 38.4 |
25 | abc | 38.1 |
26 | flan_t5_avg | 38.0 |
27 | llama_optimized | 38.0 |
28 | Fbaseline | 38.0 |
29 | flan_t5_weights | 37.7 |
30 | flan_t5_avg_lora | 37.6 |
31 | cdutr_pi5c | 37.2 |
32 | my_t5_avg | 37.1 |
33 | deepseek_exp | 36.5 |
34 | shira_algo_k00 | 29.5 |
35 | SLM | 26.0 |
36 | llama_avg | 18.8 |
Updated on 08/19/2024. The full leaderboard is hosted on the Hugging Face leaderboard.
How to submit your merging method
- Start from our starter code template LLM-Merging and build your own merging method.
- Submit the whole repository. After modifying the files, pack the repository into a tarball with the command:
tar -cvf llm-merging.tar LLM-Merging
- Submit your tar file using this form
- Submit a report describing your merging method to our OpenReview LMC 2024 page, following the standard NeurIPS format template. There are no strict requirements for the report, but we suggest keeping it within 4 pages. All submitted reports will be publicly accessible on our website.
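The packaging step above can also be scripted, which makes it easy to check what ends up in the archive before uploading. The following is a minimal sketch, not part of the official starter code: it uses Python's standard `tarfile` module in place of the `tar` command, and the helper name `pack_submission` and its default paths are assumptions chosen to mirror the command shown above.

```python
import tarfile
from pathlib import Path


def pack_submission(repo_dir="LLM-Merging", out_path="llm-merging.tar"):
    """Pack repo_dir into an uncompressed tar and return the archived member names.

    Hypothetical helper: the name and defaults are illustrative, not part of
    the competition tooling.
    """
    repo = Path(repo_dir).resolve()
    if not repo.is_dir():
        raise FileNotFoundError(f"repository folder not found: {repo}")
    # Mode "w" writes an uncompressed archive, like `tar -cf`.
    with tarfile.open(out_path, "w") as tar:
        # arcname keeps the folder name as the top-level entry in the archive.
        tar.add(str(repo), arcname=repo.name)
    # Re-open the archive to confirm what was actually written.
    with tarfile.open(out_path, "r") as tar:
        return tar.getnames()
```

Calling `pack_submission()` from the directory that contains `LLM-Merging` should produce `llm-merging.tar` with `LLM-Merging` as its top-level folder. The returned list of names can be inspected to confirm that the modified files are included.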
Note:
- Each team's submission will be evaluated at most once per day. Evaluation frequency will increase as the deadline approaches.
- An automatic submission method is coming soon.