271 machine learning datasets
Can you detect fraud from customer transactions? Imagine standing at the check-out counter at the grocery store with a long line behind you and the cashier not-so-quietly announces that your card has been declined. In this moment, you probably aren’t thinking about the data science that determined your fate.
X-Wines is a consistent wine dataset containing 100,646 instances and 21 million real evaluations carried out by users. Data were collected on the open Web in 2022 and pre-processed for wider free use. The evaluations are ratings on a 1–5 scale made over a period of 10 years (2012–2021) for wines produced in 62 different countries.
Standardized Multi-Channel Dataset for Glaucoma (SMDG-19) is a collection and standardization of 19 public datasets, comprising full-fundus glaucoma images, associated image metadata such as optic disc segmentation, optic cup segmentation, and blood vessel segmentation, and any provided per-instance text metadata such as sex and age. This dataset is the largest public repository of fundus images with glaucoma.
Objective: This study introduces the BlendedICU dataset, a massive dataset of international intensive care data. This dataset aims to facilitate generalizability studies of machine learning models, as well as statistical studies of clinical practices in intensive care units.
We release a large-scale Multi-Scenario Multi-Modal CTR dataset named AntM2C, built from real industrial data from Alipay. This dataset offers an impressive breadth and depth of information, covering CTR data from four diverse business scenarios, including advertisements, consumer coupons, mini-programs, and videos. Unlike existing datasets, AntM2C provides not only ID-based features but also five textual features and one image feature for both users and items, supporting more delicate multi-modal CTR prediction.
We introduce an annotated dataset of five thousand human-labeled pareidolic face images, called "Faces in Things". Faces in Things is derived from the LAION-5B dataset and annotated for key face attributes and bounding boxes.
Description: The F1 Regulations, Safety, and Racing Performance dataset provides an overview of key factors influenced by the evolving Fédération Internationale de l'Automobile (FIA) regulations from 1990 to 2023. This dataset includes metrics such as the number of teams, drivers, races, fatalities, car weight, DRS implementation, and overtakes. It tracks the introduction of new regulations each season, especially those impacting aerodynamics, making it an essential resource for analyzing the long-term effects of regulatory changes on safety, racing dynamics, and overall spectacle in Formula 1.
Dota 2 is a popular computer game with two teams of 5 players. At the start of the game, each player chooses a unique hero with different strengths and weaknesses. Predict the winning team.
Of the 12,330 sessions in the dataset, 84.5% (10,422) were negative class samples that did not end with shopping, and the rest (1,908) were positive class samples ending with shopping.
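The class split quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the counts stated in the description; the variable and function names are illustrative and are not part of the dataset.

```python
# Session counts as stated in the dataset description.
total_sessions = 12_330
negative_sessions = 10_422  # sessions that did not end with shopping

# The positive class is whatever remains after removing the negatives.
positive_sessions = total_sessions - negative_sessions


def class_share(count, total):
    """Return a class's share of the total as a percentage, rounded to one decimal."""
    return round(100 * count / total, 1)


negative_share = class_share(negative_sessions, total_sessions)
positive_share = class_share(positive_sessions, total_sessions)

# Ratio of negative to positive samples, a rough measure of class imbalance.
imbalance_ratio = negative_sessions / positive_sessions

print(f"positive sessions: {positive_sessions}")
print(f"negative share: {negative_share}%  positive share: {positive_share}%")
print(f"negatives per positive: {imbalance_ratio:.2f}")
```

With roughly five negative sessions for every positive one, plain accuracy can be misleading on this dataset, so a model trained on it would probably need class weighting, resampling, or an imbalance-aware metric.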
The English-Pashto Language Dataset (EPLD) is a comprehensive resource that aims to provide linguistic insights into the Pashto language. It covers the basics of communication in Pashto, such as counting, the alphabet, pronouns, and basic sentences used in everyday life. Every entry is translated from English to Pashto for better human understanding and clarity. The data has been carefully proofread and verified by native speakers and language experts. Pashto has multiple variations and accents that depend on geography. This dataset explains the key differences in Pashto words and sounds, which may be similar to or different from English depending on gender, the tense of the statement, the relationship of the speaker, and so on. The dataset is designed to support language learning, natural language processing (NLP) research, and computational linguistics studies focusing on Pashto.
Contains the clinical trial dataset.