The competition will provide participants with a list of expert models that have already been trained on task-specific datasets. All of these models will be publicly available on the Hugging Face Model Hub under licenses that permit their use for research purposes. These models may be either fully fine-tuned models or models obtained via parameter-efficient fine-tuning methods such as LoRA. Models on this list will be required to satisfy the following criteria: (1) model size $\leq 8$B parameters, and (2) a license compatible with research use (e.g., MIT, Apache 2.0, etc.). The goal of this competition is to reuse the provided models to create a generalist model that performs well across a wide variety of skills such as reasoning, coding, math, chat, and tool use. The list will also include popular pre-trained base models such as LLaMA-7B, Mistral-7B, and Gemma-7B.
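For concreteness, the snippet below sketches how a provided expert, whether fully fine-tuned or a LoRA adapter, might be loaded from the Hub. This is a minimal sketch: the repo IDs are placeholders rather than the actual competition model list.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel  # only needed for LoRA-style experts

# Placeholder repo IDs, not the actual competition model list.
BASE_REPO = "mistralai/Mistral-7B-v0.1"   # an example <= 8B base model
ADAPTER_REPO = "org/expert-lora-adapter"  # hypothetical LoRA expert

tokenizer = AutoTokenizer.from_pretrained(BASE_REPO)
model = AutoModelForCausalLM.from_pretrained(BASE_REPO, torch_dtype="auto")

# For a parameter-efficient expert, the adapter is loaded on top of the
# base model; merge_and_unload() folds the LoRA weights into the base
# weights so the expert looks like an ordinary dense checkpoint.
model = PeftModel.from_pretrained(model, ADAPTER_REPO)
model = model.merge_and_unload()
```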

Along with these expert models, we also plan to provide two types of datasets: (1) a list of re-calibration datasets that can be used to tune the hyperparameters of merging methods, perform additional training steps, learn a routing mechanism, or calibrate the final model, and (2) a set of validation tasks that participants can use to evaluate their final method locally. The datasets will be released as part of the participants' starter kit and are already hosted on the Hugging Face Hub under a permissive license. In addition, two sets of hidden tasks will be used to evaluate participants' submissions: (1) a set of leaderboard ranking test tasks, and (2) a set of final ranking test tasks. The leaderboard ranking tasks will have some overlap with the final ranking test tasks to provide an additional signal to participants.
The validation datasets are chosen to measure the time and space efficiency of the merging method; they are not meant to benchmark its performance.
We will not collect or release any new datasets for training or evaluation as part of this competition.
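As an illustration of use (1), the sketch below tunes a single merging hyperparameter against a re-calibration set. The repo IDs, the toy re-calibration texts, and the simple linear merge are all assumptions for illustration, not the competition's prescribed method or data.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical expert repo IDs; substitute entries from the provided list.
REPO_A, REPO_B = "org/expert-a", "org/expert-b"

tok = AutoTokenizer.from_pretrained(REPO_A)
model_a = AutoModelForCausalLM.from_pretrained(REPO_A)
model_b = AutoModelForCausalLM.from_pretrained(REPO_B)
merged = AutoModelForCausalLM.from_pretrained(REPO_A)  # container for merged weights

# Toy stand-in for re-calibration examples; in practice, load the provided
# dataset from the Hub, e.g. with datasets.load_dataset(...).
texts = ["def add(a, b):\n    return a + b", "The capital of France is Paris."]

@torch.no_grad()
def recalibration_loss(model):
    """Average causal-LM loss over the re-calibration examples."""
    total = 0.0
    for t in texts:
        batch = tok(t, return_tensors="pt")
        total += model(**batch, labels=batch["input_ids"]).loss.item()
    return total / len(texts)

def linear_merge(sd_a, sd_b, alpha):
    """Interpolate two compatible state dicts: alpha * A + (1 - alpha) * B."""
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

best_alpha, best_loss = None, float("inf")
for alpha in (0.25, 0.5, 0.75):  # tiny grid over the merging hyperparameter
    merged.load_state_dict(linear_merge(model_a.state_dict(),
                                        model_b.state_dict(), alpha))
    loss = recalibration_loss(merged)
    if loss < best_loss:
        best_alpha, best_loss = alpha, loss
print(f"selected interpolation weight: {best_alpha}")
```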

Validation Dataset List:

The main purpose of the validation datasets is to measure time and space efficiency; performance is measured on the hidden list of test datasets described above.
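One plausible way to profile a merging method locally against these efficiency criteria is sketched below; run_merge is a toy stand-in for a participant's method, and the official harness may measure efficiency differently.

```python
import time
import torch

def run_merge(state_dicts):
    """Toy stand-in for a merging method: element-wise average of experts."""
    return {k: torch.stack([sd[k] for sd in state_dicts]).mean(dim=0)
            for k in state_dicts[0]}

# Toy "expert" weights; in the competition these would be the provided experts.
experts = [{"w": torch.randn(1024, 1024)} for _ in range(3)]

start = time.perf_counter()
merged = run_merge(experts)
elapsed = time.perf_counter() - start

merged_bytes = sum(t.numel() * t.element_size() for t in merged.values())
print(f"merge wall-clock time: {elapsed:.3f} s")
print(f"merged weights size: {merged_bytes / 1e6:.1f} MB")
```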