ShopTC-100K
Texts | MIT | Introduced 2025-02-03
ShopTC-100K Dataset
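The ShopTC-100K dataset was collected using TermMiner, an open-source data collection and topic modeling pipeline introduced in the paper "Harmful Terms and Where to Find Them: Measuring and Modeling Unfavorable Financial Terms and Conditions in Shopping Websites at Scale" (WWW ’25).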
If you find this dataset or the related paper useful for your research, please cite our paper:
@inproceedings{tsai2025harmful,
  author = {Elisa Tsai and Neal Mangaokar and Boyuan Zheng and Haizhong Zheng and Atul Prakash},
  title = {Harmful Terms and Where to Find Them: Measuring and Modeling Unfavorable Financial Terms and Conditions in Shopping Websites at Scale},
  booktitle = {Proceedings of the ACM Web Conference 2025 (WWW ’25)},
  year = {2025},
  location = {Sydney, NSW, Australia},
  publisher = {ACM},
  address = {New York, NY, USA},
  pages = {14},
  month = {April 28-May 2},
  doi = {10.1145/3696410.3714573}
}
Dataset Description
The dataset consists of sanitized terms extracted from e-commerce websites with English terms and conditions. The websites were sourced from the Tranco list (as of April 2024).