SurgeGlobal/Evol-Instruct

Texts | Apache 2.0 | Introduced 2024-04-18

Dataset Generation

  • Base Model: h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2
  • Seed Instructions: Selected from the databricks/databricks-dolly-15k dataset
  • Generation Approach: Iterative evolution of instructions using a conversational syntax for in-depth and in-breadth evolving
  • Total Instructions: 2,304 instruction-tuning samples
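
The iterative evolution described above can be pictured roughly as follows. This is a minimal sketch, not the pipeline used to build the dataset: the prompt templates, the `evolve` function, and the `generate` callable (a stand-in for a call to the base model) are all hypothetical names introduced for illustration.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical prompt templates. The wording actually used to build the
# dataset is not given in this card.
TEMPLATES = {
    "in-depth": "Rewrite the instruction below so that it is more complex:\n{instruction}",
    "in-breadth": "Write a new instruction in the same domain as the one below:\n{instruction}",
}


def evolve(
    seeds: List[str],
    generate: Callable[[str], str],
    rounds: int = 2,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str]]:
    """Evolve each seed for `rounds` iterations.

    `generate` is an assumed stand-in for the language model: it takes a
    prompt and returns text. Returns (evolved_instruction, strategy) pairs.
    """
    rng = rng or random.Random(0)
    evolved: List[Tuple[str, str]] = []
    current = list(seeds)
    for _ in range(rounds):
        next_round = []
        for instruction in current:
            # Pick one of the two evolution strategies for this step.
            strategy = rng.choice(sorted(TEMPLATES))
            prompt = TEMPLATES[strategy].format(instruction=instruction)
            candidate = generate(prompt).strip()
            if candidate and candidate != instruction:
                evolved.append((candidate, strategy))
                next_round.append(candidate)
            else:
                # Evolution failed; carry the parent into the next round.
                next_round.append(instruction)
        current = next_round
    return evolved
```

Each round feeds the previous round's output back in as the next parent, which is what makes the process iterative. A real run would also need response generation and filtering of failed evolutions, which this sketch leaves out.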

Structure

The dataset entries consist of:

  • Instruction
  • Response
  • Evolution Strategy (in-depth or in-breadth)
  • Category (of the original instruction)
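
As an illustration of the entry structure listed above, one way to model a record in Python is shown below. The field names (`instruction`, `response`, `evolution_strategy`, `category`) and the helper functions are assumptions made for this sketch; the actual column names in the published files may differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EvolRecord:
    # Field names are assumed for illustration, not taken from the dataset files.
    instruction: str
    response: str
    evolution_strategy: str  # "in-depth" or "in-breadth"
    category: str            # category of the original seed instruction


def strategy_counts(records: Iterable[EvolRecord]) -> Counter:
    """Count how many records were produced by each evolution strategy."""
    return Counter(r.evolution_strategy for r in records)


def by_category(records: Iterable[EvolRecord], category: str) -> List[EvolRecord]:
    """Return the records whose seed instruction belonged to `category`."""
    return [r for r in records if r.category == category]
```

Grouping by strategy or category in this way is a quick check on how balanced a subset is before using it for training.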

Usage

The Evol-Instruct dataset is intended for instruction tuning. It was produced by automatically evolving seed instructions to increase their complexity and diversity, so that language models trained on it can handle a wide range of tasks.

Citation

If you find our work useful, please cite our paper as follows:

@misc{surge2024openbezoar,
      title={OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data}, 
      author={Chandeepa Dissanayake and Lahiru Lowe and Sachith Gunasekara and Yasiru Ratnayake},
      year={2024},
      eprint={2404.12195},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Dataset Authors

Chandeepa Dissanayake, Lahiru Lowe, Sachith Gunasekara, and Yasiru Ratnayake