TextOCR
Images | Texts | CC BY 4.0 | Introduced 2021-05-12
TextOCR is a dataset for benchmarking text recognition on arbitrarily shaped scene text in natural images. It provides ~1M high-quality word annotations on TextVQA images, which allows end-to-end reasoning on downstream tasks such as visual question answering and image captioning.
Dataset statistics:
- 28,134 natural images from TextVQA
- 903,069 annotated scene-text words
- 32 words per image on average
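
The per-image average follows directly from the two counts above. A minimal sketch of that arithmetic is shown here; the constant and function names are illustrative only and are not part of any TextOCR tooling.

```python
# Figures copied from the dataset statistics listed above.
NUM_IMAGES = 28_134
NUM_WORDS = 903_069


def mean_words_per_image(num_words: int, num_images: int) -> float:
    """Return the average number of annotated words per image."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return num_words / num_images


if __name__ == "__main__":
    avg = mean_words_per_image(NUM_WORDS, NUM_IMAGES)
    print(f"{avg:.1f} words per image")  # prints "32.1 words per image"
```

Rounded to the nearest whole number, this gives the 32 words per image quoted in the statistics.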