MGTAcademic

License: MIT
Introduced: 2024-12-23

This repository provides a cleaned dataset intended for text classification, language modeling, and AI-generated content detection. The data covers STEM, Social Sciences, and Humanities, and is organized into several categories, each of which has been processed and cleaned for easy use. See our codebase on GitHub for more information.