Datasets

271 machine learning datasets

IEEE-CIS Fraud Detection

Can you detect fraud from customer transactions? Imagine standing at the check-out counter at the grocery store with a long line behind you and the cashier not-so-quietly announces that your card has been declined. In this moment, you probably aren’t thinking about the data science that determined your fate.

0 papers | 0 benchmarks | Tabular

X-Wines (A Wine Dataset for Recommender Systems and Machine Learning)

X-Wines is a consistent wine dataset containing 100,646 instances and 21 million real evaluations carried out by users. Data were collected on the open Web in 2022 and pre-processed for wider free use. The ratings use a 1–5 scale and were made over a period of 10 years (2012–2021) for wines produced in 62 different countries.

0 papers | 0 benchmarks | Images, Ranking, Tabular, Texts, Time series

SMDG (Standardized Multi-Channel Dataset for Glaucoma)

Standardized Multi-Channel Dataset for Glaucoma (SMDG-19) is a collection and standardization of 19 public datasets, comprising full-fundus glaucoma images, associated image metadata such as optic disc segmentation, optic cup segmentation, and blood vessel segmentation, and any provided per-instance text metadata such as sex and age. This dataset is the largest public repository of fundus images with glaucoma.

0 papers | 0 benchmarks | Images, Medical, Tabular

BlendedICU, the first harmonized, international intensive care dataset

Objective: This study introduces the BlendedICU dataset, a massive dataset of international intensive care data. This dataset aims to facilitate generalizability studies of machine learning models, as well as statistical studies of clinical practices in intensive care units.

0 papers | 0 benchmarks | Biomedical, Medical, Tabular, Time series

AntM2C (Ant-Group Multi-Scenario Multi-Modal CTR dataset)

We release a large-scale Multi-Scenario Multi-Modal CTR dataset named AntM2C, built from real industrial data from Alipay. This dataset offers an impressive breadth and depth of information, covering CTR data from four diverse business scenarios: advertisements, consumer coupons, mini-programs, and videos. Unlike existing datasets, AntM2C provides not only ID-based features but also five textual features and one image feature for both users and items, supporting finer-grained multi-modal CTR prediction.

0 papers | 0 benchmarks | Images, Tabular, Texts

FacesInThings

We introduce an annotated dataset of five thousand human-labeled pareidolic face images, called "Faces in Things". Faces in Things is derived from the LAION-5B dataset and annotated for key face attributes and bounding boxes.

0 papers | 0 benchmarks | Tabular, Time series

F1 Regulations, Safety, and Racing Performance

Description: The F1 Regulations, Safety, and Racing Performance dataset provides an overview of key factors influenced by the evolving Fédération Internationale de l'Automobile (FIA) regulations from 1990 to 2023. This dataset includes metrics such as the number of teams, drivers, races, fatalities, car weight, DRS implementation, and overtakes. It tracks the introduction of new regulations each season, especially those impacting aerodynamics, making it an essential resource for analyzing the long-term effects of regulatory changes on safety, racing dynamics, and overall spectacle in Formula 1.

0 papers | 0 benchmarks | Tabular

DOTA2 Games (Dota2 Games Results)

Dota 2 is a popular computer game with two teams of 5 players. At the start of the game, each player chooses a unique hero with different strengths and weaknesses. Predict the winning team.

0 papers | 0 benchmarks | Tabular

Online Shoppers (Online Shoppers Purchasing Intention Dataset)

Of the 12,330 sessions in the dataset, 84.5% (10,422) were negative class samples that did not end with shopping, and the remaining 1,908 were positive class samples that ended with shopping.

0 papers | 0 benchmarks | Tabular
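
The class balance quoted for the Online Shoppers entry can be checked with a few lines of arithmetic. The sketch below uses only the two counts given in the description (12,330 sessions, 10,422 negative); the variable names are illustrative and are not part of the dataset.

```python
# Counts taken from the Online Shoppers listing entry.
# Variable names are illustrative, not dataset column names.
total_sessions = 12_330
negative_sessions = 10_422

# Sessions that ended with shopping (positive class).
positive_sessions = total_sessions - negative_sessions

negative_share = negative_sessions / total_sessions
positive_share = positive_sessions / total_sessions

# How many negative sessions there are for each positive one.
imbalance_ratio = negative_sessions / positive_sessions

print(f"positive sessions: {positive_sessions}")
print(f"negative share: {negative_share:.1%}")
print(f"positive share: {positive_share:.1%}")
print(f"negatives per positive: {imbalance_ratio:.2f}")
```

With roughly five negative sessions for every positive one, plain accuracy is likely to be a misleading metric for a classifier trained on this data.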

English-Pashto Language Dataset (EPLD)

The English-Pashto Language Dataset (EPLD) is a comprehensive resource intended to provide linguistic insights into the Pashto language. It covers the basics of communication in Pashto, such as counting, the alphabet, pronouns, and basic sentences used in everyday life. All entries are translated from English to Pashto for better human understanding and clarity. The data has been carefully proofread and verified by native speakers and language experts. Pashto has multiple variations and accents that depend on geography. The dataset explains and addresses key differences in Pashto words and sounds, which may be similar to or different from English depending on gender, the tense of the statement, the relationship of the speaker, and so on. It is designed to support language learning, natural language processing (NLP) research, and computational linguistics studies focusing on Pashto.

0 papers | 0 benchmarks | Tabular, Texts

depression interview dataset (depression interview dataset with 1.6 million clinical trial data)

Contains the clinical trial dataset.

0 papers | 0 benchmarks | Tabular
