CodeSCAN
ScreenCast ANalysis for Video Programming Tutorials
Images | Texts | Other (Non-Commercial)
CodeSCAN is the first large-scale and diverse dataset of coding screenshots with pixel-perfect annotations. It features:
- 24 popular programming languages (according to GitHub)
- 100 randomly selected repositories per language (MIT, BSD-3, or WTFPL licensed), i.e. 2,400 repositories in total
- 5 files per repository, i.e. 12,000 files in total
- ~100 different themes and 25 different fonts
- Diverse layout changes, such as menu bar visibility, sidebar position, and output window content
- Numerous realistic interactions, such as searching, typing, and selecting within a file
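As a quick sanity check on the scale figures in the feature list, the totals can be reproduced with a few lines of Python. This is only an illustrative sketch: the constant and function names (`NUM_LANGUAGES`, `dataset_scale`, and so on) are hypothetical and not part of any CodeSCAN tooling.

```python
# Figures taken from the feature list; all names here are illustrative only.
NUM_LANGUAGES = 24
REPOS_PER_LANGUAGE = 100
FILES_PER_REPO = 5


def dataset_scale(languages, repos_per_language, files_per_repo):
    """Return (total repositories, total source files)."""
    total_repos = languages * repos_per_language
    total_files = total_repos * files_per_repo
    return total_repos, total_files


total_repos, total_files = dataset_scale(
    NUM_LANGUAGES, REPOS_PER_LANGUAGE, FILES_PER_REPO
)
print(f"{total_repos:,} repositories, {total_files:,} files")
# prints "2,400 repositories, 12,000 files"
```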
Check our project page (https://a-nau.github.io/codescan/) for details.