CommitChronicle
Texts · Introduced 2023-08-15
CommitChronicle is a dataset for commit message generation and commit message completion.
Its key features:
- large-scale and multilingual: contains 10.7M commits from 11.9k GitHub repositories in 20 programming languages;
- diverse: avoids restrictive filtering on commit messages or on the structure of commit diffs;
- suitable for experiments with commit history: provides metadata about commit authors and dates, and is split by project.
Available on 🤗 Hugging Face: JetBrains-Research/commit-chronicle
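
As a rough illustration of how the author and date metadata could support commit-history experiments, here is a minimal Python sketch. It makes several assumptions that should be checked against the dataset card: that the `datasets` library is used for loading, that a split named `validation` exists, and that each record has `author`, `date`, and `message` fields with dates that sort chronologically as strings. The helper names `load_commits` and `history_by_author` and the toy records are hypothetical and are not part of the dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def load_commits(split: str = "validation"):
    """Stream one split from the Hugging Face Hub (needs network access).

    Assumption: the repository exposes a split with this name; check the
    dataset card before relying on it.
    """
    from datasets import load_dataset  # lazy import: pip install datasets

    return load_dataset(
        "JetBrains-Research/commit-chronicle", split=split, streaming=True
    )


def history_by_author(commits: Iterable[dict]) -> Dict[object, List[str]]:
    """Group commit messages per author, oldest first.

    Assumption: each record has "author", "date", and "message" keys, and
    the dates sort chronologically as strings (ISO-like format).
    """
    grouped = defaultdict(list)
    for commit in commits:
        grouped[commit["author"]].append(commit)
    return {
        author: [c["message"] for c in sorted(items, key=lambda c: c["date"])]
        for author, items in grouped.items()
    }


# Hypothetical records, for illustration only (not taken from the dataset).
toy_commits = [
    {"author": 1, "date": "2021-03-02", "message": "Fix typo in README"},
    {"author": 2, "date": "2021-01-15", "message": "Add CI config"},
    {"author": 1, "date": "2020-12-30", "message": "Initial commit"},
]
print(history_by_author(toy_commits))
```

With real data, the same helper could be fed a bounded slice of the stream, for example `itertools.islice(load_commits(), 1000)`, provided the field names match. Because the dataset is split by project, histories built from one split should not contain commits from repositories in another.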