SurgeGlobal/Orca
Texts · Apache 2.0 · Introduced 2024-04-18
Dataset Generation
- Base Model: h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2
- Seed Instructions: Derived from the FLAN-v2 Collection.
- Generation Approach: Explanation tuning with detailed responses generated from h2ogpt-gm-oasst1-en-2048-falcon-40b-v2.
- Total Instructions: 5,507 explanation tuning data samples.
Dataset Sources
- Repository: Bitbucket Project
- Paper: Pre-Print
Structure
Each dataset entry consists of:
- Query
- Response
- System Message (when applicable)
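A minimal sketch of this structure, assuming field names (`system`, `query`, `response`) that mirror the list above; the sample content is hypothetical:

```python
# Hypothetical sample entry; field names are assumptions based on the
# structure listed above (Query, Response, optional System Message).
sample = {
    "system": "You are a helpful assistant that explains its reasoning step by step.",
    "query": "Why does ice float on water?",
    "response": "Ice floats because it is less dense than liquid water ...",
}

def build_prompt(entry: dict) -> str:
    """Assemble a single prompt string; the system message is optional."""
    parts = []
    if entry.get("system"):
        parts.append(entry["system"])
    parts.append(entry["query"])
    return "\n\n".join(parts)

print(build_prompt(sample))
```

Entries without a system message simply yield the query on its own.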
Usage
The Orca Dataset is intended for fine-tuning language models to imitate not only the style but also the reasoning process of large foundation models (LFMs), thereby improving the safety and quality of model responses.
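For supervised fine-tuning, each record can be mapped to chat-style messages. The sketch below assumes the field names `system`, `query`, and `response`; adapt it to the actual column names in the released dataset:

```python
def to_chat_messages(entry: dict) -> list[dict]:
    """Convert one Orca-style record into chat messages for fine-tuning.

    Field names ('system', 'query', 'response') are assumptions; the
    system message is included only when present.
    """
    messages = []
    if entry.get("system"):
        messages.append({"role": "system", "content": entry["system"]})
    messages.append({"role": "user", "content": entry["query"]})
    messages.append({"role": "assistant", "content": entry["response"]})
    return messages

# Hypothetical example record:
example = {
    "system": "Explain your reasoning step by step.",
    "query": "What is 2 + 2?",
    "response": "Adding 2 and 2 gives 4.",
}
print(to_chat_messages(example))
```

The resulting message list can be fed to a chat template before tokenization.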
Citation
If you find our work useful, please cite our paper as follows:
@misc{surge2024openbezoar,
      title={OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data},
      author={Chandeepa Dissanayake and Lahiru Lowe and Sachith Gunasekara and Yasiru Ratnayake},
      year={2024},
      eprint={2404.12195},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Dataset Authors
Chandeepa Dissanayake, Lahiru Lowe, Sachith Gunasekara, and Yasiru Ratnayake