MNIST-MIX: A Multi-language Handwritten Digit Recognition Dataset

Weiwei Jiang

2020-04-08

Abstract

In this letter, we contribute MNIST-MIX, a multi-language handwritten digit recognition dataset that is the largest dataset of its kind in terms of both the number of languages and the number of data samples. Because it uses the same data format as MNIST, MNIST-MIX can be seamlessly applied in existing studies of handwritten digit recognition. By introducing digits from 10 different languages, MNIST-MIX becomes a more challenging dataset, and its imbalanced classification calls for better-designed models. As a baseline, we also present the results of applying a LeNet model pre-trained on MNIST.
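The abstract notes that MNIST-MIX's imbalanced classification calls for better-designed models. One common, generic mitigation (not the method used in this letter) is to reweight the training loss by inverse class frequency, so that rare classes contribute as much in total as frequent ones. The sketch below assumes integer class labels; the function name `inverse_frequency_weights` and the toy label set are hypothetical and are not taken from MNIST-MIX.

```python
from collections import Counter
from typing import Dict, Sequence


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Return a weight per class, proportional to 1 / class frequency.

    Hypothetical helper. Weights are scaled so that a perfectly balanced
    label set gets weight 1.0 for every class: w_c = N / (K * n_c), where
    N is the number of samples, K the number of distinct classes and n_c
    the number of samples of class c.
    """
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {c: total / (num_classes * n) for c, n in counts.items()}


# Hypothetical toy labels: class 0 is four times as common as class 1.
toy_labels = [0] * 80 + [1] * 20
weights = inverse_frequency_weights(toy_labels)
print(weights)  # {0: 0.625, 1: 2.5}
```

With this scaling, the summed weight of each class equals N / K (here 0.625 × 80 = 2.5 × 20 = 50), so every class carries the same total influence. Weights of this kind are typically passed to a weighted loss, for example the `weight` argument of PyTorch's `torch.nn.CrossEntropyLoss`.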
