
TextOCR is a dataset for benchmarking text recognition on arbitrarily shaped scene text in natural images. It provides ~1M high-quality word annotations on TextVQA images, which allows end-to-end reasoning to be applied to downstream tasks such as visual question answering or image captioning.

Dataset statistics:

28,134 natural images from TextVQA
903,069 annotated scene-text words
32 words per image on average
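
The per-image average can be checked directly from the two counts listed above. This is a minimal sketch in Python; the variable names are illustrative and are not part of any TextOCR tooling.

```python
# Counts taken from the dataset statistics above.
num_images = 28_134
num_words = 903_069

# Mean number of annotated words per image.
words_per_image = num_words / num_images

print(f"{words_per_image:.1f} words per image")
```

The result rounds to the 32 words per image reported in the statistics.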