Burmese Handwritten Digit Dataset (BHDD)
The Burmese Handwritten Digit Dataset (BHDD) is a dataset project created specifically for recognizing handwritten Burmese digits. It is a Burmese counterpart of the MNIST dataset, with a training set of 60,000 examples and a test set of 27,561 examples.
Overview
Dataset Statistics:
- Training Set: 60,000 samples
- Testing Set: 27,561 samples
- Number of Classes: 10 (Burmese digits 0–9)
Data Format:
- Train Image Shape: (60000, 784)
- Train Label Shape: (60000, 10)
- Test Image Shape: (27561, 784)
- Test Label Shape: (27561, 10)
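The shapes above can be checked programmatically once the arrays are in memory. The following is a minimal sketch, not part of the BHDD repository: the helper `check_shapes` and the dictionary keys are hypothetical names chosen for illustration, and zero-filled placeholder arrays stand in for the real data because the loading step is not described here.

```python
import numpy as np

# Shapes as documented in this README.
EXPECTED_SHAPES = {
    "train_images": (60000, 784),
    "train_labels": (60000, 10),
    "test_images": (27561, 784),
    "test_labels": (27561, 10),
}


def check_shapes(arrays, expected=EXPECTED_SHAPES):
    """Return a list of human-readable mismatches (empty if every shape matches)."""
    problems = []
    for name, shape in expected.items():
        if name not in arrays:
            problems.append(f"{name}: missing")
        elif arrays[name].shape != shape:
            problems.append(f"{name}: expected {shape}, got {arrays[name].shape}")
    return problems


# Placeholder arrays (assumption): replace these with the arrays you actually loaded.
placeholder = {
    name: np.zeros(shape, dtype=np.uint8) for name, shape in EXPECTED_SHAPES.items()
}
print(check_shapes(placeholder))  # prints []
```

Running a check like this right after loading helps catch a truncated download or a swapped train/test split before any training starts.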
The dataset was collected from over 150 individuals of different ages (ranging from high school students to professionals in their 50s) and diverse occupations (including clerks, programmers, and others) to capture a wide variety of handwriting styles. The collected samples were then preprocessed to mirror the structure and functionality of MNIST.
Dataset Content
The dataset consists of:
- Train Images: 60,000 grayscale images of handwritten Burmese digits, each flattened into a 1D array of 784 values (28x28 pixels).
- Train Labels: One-hot encoded labels corresponding to the digit class.
- Test Images: 27,561 grayscale images for testing purposes.
- Test Labels: One-hot encoded labels for testing data.
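Because the images are flattened and the labels are one-hot encoded, a common first step is to convert them back into 2D images and integer class indices. The sketch below illustrates this with NumPy on a tiny synthetic stand-in. The function names are hypothetical, and it assumes the 784 values are stored in row-major order as in MNIST, which this README does not state explicitly.

```python
import numpy as np


def to_image_grid(flat_images):
    """Reshape (N, 784) flattened images into (N, 28, 28)."""
    return flat_images.reshape(-1, 28, 28)


def to_class_indices(one_hot_labels):
    """Convert (N, 10) one-hot labels into (N,) integer digit classes."""
    return np.argmax(one_hot_labels, axis=1)


# Synthetic stand-in for two samples (assumption): not real BHDD data.
flat = np.arange(2 * 784, dtype=np.float32).reshape(2, 784)
labels = np.eye(10, dtype=np.float32)[[3, 7]]  # one-hot rows for digits 3 and 7

images = to_image_grid(flat)
digits = to_class_indices(labels)
print(images.shape)  # (2, 28, 28)
print(digits)        # [3 7]
```

The reshaped arrays can be passed directly to plotting tools or to convolutional models that expect 2D input, while the integer labels suit loss functions that take class indices rather than one-hot vectors.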
Contribution
We encourage the ML/DL community to contribute by:
- Creating digit recognizers.
- Benchmarking with different models and algorithms.
- Writing tutorials and sharing findings.
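As a possible starting point for benchmarking, here is a minimal nearest-centroid baseline in NumPy. It is an illustrative sketch, not a method used or endorsed by the dataset authors. All function names are hypothetical, and the example runs on synthetic data shaped like BHDD samples rather than on the dataset itself.

```python
import numpy as np


def fit_centroids(images, one_hot_labels):
    """Compute the mean image per class.

    images: (N, 784) array; one_hot_labels: (N, 10) array.
    A class with no samples gets an all-zero centroid.
    """
    counts = one_hot_labels.sum(axis=0)           # samples per class, shape (10,)
    sums = one_hot_labels.T @ images              # per-class pixel sums, shape (10, 784)
    return sums / np.maximum(counts, 1)[:, None]  # guard against division by zero


def predict(centroids, images):
    """Assign each image to the class whose centroid is nearest (Euclidean)."""
    # Squared distance expanded as |x|^2 - 2 x.c + |c|^2 to avoid a large 3D array.
    d2 = (
        (images ** 2).sum(axis=1)[:, None]
        - 2.0 * images @ centroids.T
        + (centroids ** 2).sum(axis=1)[None, :]
    )
    return d2.argmin(axis=1)


def accuracy(predicted, one_hot_labels):
    """Fraction of predictions that match the one-hot labels."""
    return float((predicted == one_hot_labels.argmax(axis=1)).mean())


# Synthetic data (assumption): ten random "prototype" images plus small noise.
rng = np.random.default_rng(0)
prototypes = rng.random((10, 784))
class_ids = np.arange(200) % 10
train_x = prototypes[class_ids] + 0.05 * rng.standard_normal((200, 784))
train_y = np.eye(10)[class_ids]

centroids = fit_centroids(train_x, train_y)
print(accuracy(predict(centroids, train_x), train_y))
```

A baseline this simple gives a reference score that more capable models, such as convolutional networks, should comfortably exceed on the real data.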
Citation
If you use the BHDD dataset in your work, please cite this repository:
@dataset{bhdd,
author = {Expa.AI Research Team},
title = {Burmese Handwritten Digit Dataset (BHDD)},
year = {2019},
url = {https://github.com/baseresearch/BHDD}
}
Acknowledgments
This dataset would not have been possible without:
- The efforts of the Expa.AI Research Team.
- Volunteers and interns from Taungoo Computer University who contributed handwriting samples.
- High school students from St. Augustine / B.E.H.S (2) Kamayut.
- Friends and family members of the Expa.AI Research Team.
- The community’s ongoing support and interest in ML/DL for the Burmese language.
License
This dataset is released under the LGPL-3.0 license. Please see the LICENSE file for more details.