The LIMA dataset is a small, curated instruction-tuning dataset used in natural language processing (NLP) research on aligning language models. Let me provide you with some details:

Origin and Purpose:
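- The LIMA dataset comes from the paper "LIMA: Less Is More for Alignment." In that paper, LLaMa, a language model with 65 billion parameters, was fine-tuned to produce a model that is also named LIMA.
- The dataset itself consists of the approximately 1,000 curated prompts and responses used for that fine-tuning. The fine-tuning used only the standard supervised loss, without reinforcement learning or human preference modeling.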
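Each training example in this setup is a prompt paired with a response. The sketch below shows one common way such a pair can be turned into a supervised fine-tuning example. It is an illustration under stated assumptions, not the LIMA authors' code. The `Record` type, its `conversations` list of alternating user and assistant turns, the `<EOT>` end-of-turn marker, the toy whitespace tokenizer, and the choice to mask prompt tokens so that only response tokens count toward the loss are all hypothetical. The mask value `-100` is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
from dataclasses import dataclass, field

# Hypothetical record shape: alternating user/assistant turns plus a source tag.
@dataclass
class Record:
    conversations: list[str]
    source: str = "unknown"

EOT = "<EOT>"   # assumed end-of-turn marker appended after every turn
IGNORE = -100   # label value skipped by PyTorch's CrossEntropyLoss by default


def build_example(record: Record, vocab: dict[str, int]) -> tuple[list[int], list[int]]:
    """Turn one record into (input_ids, labels) for supervised fine-tuning.

    User turns (even indices) are masked with IGNORE so that only assistant
    turns (odd indices) contribute to the loss. Tokenization is a toy
    whitespace split; unseen tokens get the next free id in `vocab`.
    Labels are aligned with input_ids; any next-token shift is assumed to
    happen inside the model or training loop.
    """
    input_ids: list[int] = []
    labels: list[int] = []
    for i, turn in enumerate(record.conversations):
        tokens = turn.split() + [EOT]
        ids = [vocab.setdefault(tok, len(vocab)) for tok in tokens]
        input_ids.extend(ids)
        is_response = i % 2 == 1
        labels.extend(ids if is_response else [IGNORE] * len(ids))
    return input_ids, labels


if __name__ == "__main__":
    vocab: dict[str, int] = {}
    record = Record(["Plan a trip to Kyoto", "Day one: visit temples"], source="hypothetical")
    input_ids, labels = build_example(record, vocab)
    print(input_ids)
    print(labels)
```

With a real tokenizer the ids would differ, but the structure is the same: one flat token sequence per conversation, with the prompt portion masked out of the labels.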
Performance and Applications:
- The resulting LIMA model performs strongly, learning to follow specific response formats from only a handful of examples in the training data.
- These formats include responses to complex queries, ranging from planning trip itineraries to speculating about alternate history.
- The model also tends to generalize well to tasks that did not appear in the training data.
License:
- The license of the LIMA dataset depends on the source data it was derived from:
  - If the source data has a stricter license than CC BY-NC-SA, the LIMA dataset follows that stricter license.
  - Otherwise, it is released under CC BY-NC-SA (Creative Commons Attribution-NonCommercial-ShareAlike).
