SurgeGlobal/Evol-Instruct
TextsApache 2.0Introduced 2024-04-18
Dataset Generation
- Base Model: h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2
- Seed Instructions: Selected from the databricks/databricks-dolly-15k dataset
- Generation Approach: Iterative evolution of instructions using a conversational syntax for in-depth and in-breadth evolving
- Total Instructions: 2,304 instruction tuning data samples
Dataset Sources
- Repository: Bitbucket Project
- Paper: Pre-Print
Structure
The dataset entries consist of:
- Instruction
- Response
- Evolution Strategy (in-depth or in-breadth)
- Category (of the original instruction)
Usage
The Evol-Instruct Dataset is designed for the automatic evolution of instruction datasets, enhancing the complexity and diversity of instructions to train language models for a wide range of tasks.
Citation
If you find our work useful, please cite our paper as follows:
@misc{surge2024openbezoar,
title={OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data},
author={Chandeepa Dissanayake and Lahiru Lowe and Sachith Gunasekara and Yasiru Ratnayake},
year={2024},
eprint={2404.12195},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Dataset Authors
Chandeepa Dissanayake, Lahiru Lowe, Sachith Gunasekara, and Yasiru Ratnayake