CodeSCAN

ScreenCast ANalysis for Video Programming Tutorials

Images · Texts · Other (Non-Commercial)

CodeSCAN is the first large-scale and diverse dataset of coding screenshots with pixel-perfect annotations. It features:

  • 24 popular programming languages (according to GitHub)
  • 100 random repositories per language (with an MIT, BSD-3, or WTFPL license), i.e. 2,400 repositories in total
  • 5 files per repository, i.e. 12,000 files in total
  • ~100 different themes and 25 different fonts
  • Diverse layout changes, such as menu bar visibility, sidebar position, and output window content
  • Numerous realistic interactions, such as searching, typing, and selecting within a file
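
As a quick sanity check, the totals in the list follow directly from the per-language and per-repository counts. The short sketch below only reproduces that arithmetic; the constant names are illustrative and are not part of the dataset or its tooling.

```python
# Scale figures taken from the feature list; names are illustrative only.
NUM_LANGUAGES = 24
REPOS_PER_LANGUAGE = 100
FILES_PER_REPO = 5

# Total repositories across all languages.
total_repos = NUM_LANGUAGES * REPOS_PER_LANGUAGE
# Total source files, given a fixed number of files per repository.
total_files = total_repos * FILES_PER_REPO

print(f"{total_repos:,} repositories, {total_files:,} files")
```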

Check our project page (https://a-nau.github.io/codescan/) for details.