Google Ranked URLs Dataset

TextsIntroduced 2024-10-22

This dataset was curated for Search Engine Optimization (SEO) analysis tasks, including categorization and spam detection. It covers 12 diverse topics: basketball, books, cats, gardening, history, movies, music, recipes, sports, technology, travel, and weather. Some topics have hierarchical relationships, such as sports and basketball, while others are closely related (e.g., movies and music) or unrelated (e.g., basketball and gardening), with varying degrees of overlap among them. For each topic, approximately 300 search queries were generated using large language models (LLMs) like GPT, Llama, and Claude. The top 10 URLs from the Google Search Console’s search engine results page (SERP) were retrieved for each query.