CC-News

CommonCrawl News dataset

Texts

Introduced 2016-10-04

CommonCrawl News is a dataset of news articles from news sites around the world. The dataset is available in the form of Web ARChive (WARC) files that are released daily.

Source: https://commoncrawl.org/2016/10/news-dataset-available/
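
Since the dataset ships as WARC files, a reader may want to see roughly what iterating over one involves. The following is a minimal sketch using only the Python standard library, not an official loader: the helper names `iter_warc_records` and `make_sample_record` are hypothetical, the sample records are synthetic, and the parser assumes well-formed records that each carry a `Content-Length` header. It relies on the fact that `.warc.gz` files are typically a series of gzip members, which Python's `gzip` module reads as one continuous stream.

```python
import gzip
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in a decompressed WARC byte stream.

    Sketch only: assumes well-formed records with a Content-Length header.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line: {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # an empty line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


def make_sample_record(uri, body):
    """Build one synthetic WARC record (hypothetical test data, not real CC-News content)."""
    head = (
        "WARC/1.0\r\n"
        "WARC-Type: response\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body + b"\r\n\r\n"


# Compress each record as its own gzip member and concatenate them,
# mimicking the usual layout of a .warc.gz file.
sample = gzip.compress(
    make_sample_record("https://example.com/a", b"first article")
) + gzip.compress(
    make_sample_record("https://example.com/b", b"second article")
)

with gzip.GzipFile(fileobj=io.BytesIO(sample)) as stream:
    records = list(iter_warc_records(stream))

for headers, payload in records:
    print(headers["WARC-Target-URI"], len(payload))
```

For a downloaded file, the same generator could be fed from `gzip.open(path, "rb")`. For real workloads a dedicated WARC library is likely a better choice, since it would also handle the HTTP headers embedded in response records and malformed input.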