BalitaNLP

Images, Texts. Introduced 2023-03-09

BalitaNLP is a Filipino multimodal dataset for text+visual tasks. It consists of 351,755 Filipino news articles gathered from Filipino news outlets.

Each entry contains:

  • body - Article text
  • title - Article title
  • website - Name of the news outlet
  • category - News category given by the news outlet
  • date - Date published
  • author - Article author
  • url - URL of the article
  • img_url - URL of the article image
  • img_path - Filename of the image in the dataset
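
The field list above can be sketched as a typed record, which makes it easy to check that a loaded entry has every expected field. This is only an illustrative sketch: the class name `BalitaEntry` and the helper `from_mapping` are hypothetical, all fields are assumed to be plain strings (the date format is not stated here, so `date` is left unparsed), and the sample values are made-up placeholders rather than real dataset content.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class BalitaEntry:
    """One dataset entry; field names follow the list above.

    Assumption: every field is stored as a string.
    """

    body: str       # article text
    title: str      # article title
    website: str    # name of the news outlet
    category: str   # news category given by the outlet
    date: str       # date published (format not specified, kept as-is)
    author: str     # article author
    url: str        # URL of the article
    img_url: str    # URL of the article image
    img_path: str   # filename of the image in the dataset

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "BalitaEntry":
        """Build an entry from a dict-like record, rejecting incomplete ones."""
        names = [f.name for f in fields(cls)]
        missing = [name for name in names if name not in raw]
        if missing:
            raise KeyError(f"entry is missing fields: {missing}")
        # Extra keys in `raw` are ignored; only the nine known fields are kept.
        return cls(**{name: raw[name] for name in names})


# Placeholder record for illustration only (not taken from the dataset).
example = BalitaEntry.from_mapping(
    {
        "body": "placeholder body",
        "title": "placeholder title",
        "website": "example-outlet",
        "category": "news",
        "date": "2023-01-01",
        "author": "placeholder author",
        "url": "https://example.com/article",
        "img_url": "https://example.com/image.jpg",
        "img_path": "image.jpg",
    }
)
```

For text+visual tasks, `body` or `title` would typically be paired with the image referenced by `img_path`; how the image files are laid out on disk is not described here, so resolving that path is left to the caller.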