19,997 machine learning datasets
CiteSum is a large-scale scientific extreme summarization benchmark.
FiNER-139 comprises 1.1M sentences annotated with eXtensible Business Reporting Language (XBRL) tags extracted from annual and quarterly reports of publicly-traded companies in the US. Unlike other entity extraction tasks, like named entity recognition (NER) or contract element extraction, which typically require identifying entities of a small set of common types (e.g., persons, organizations), FiNER-139 uses a much larger label set of 139 entity types. Another important difference from typical entity extraction is that FiNER focuses on numeric tokens, with the correct tag depending mostly on context, not the token itself.
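The central observation behind FiNER-139 is that the correct tag for a numeric token depends mostly on its surrounding context, not the token itself. A minimal sketch of such context-dependent tagging (the tag names and cue words below are hypothetical stand-ins, not the actual 139-label XBRL set):

```python
def tag_numeric_tokens(tokens):
    """Assign a tag to each numeric token based on its left context."""
    # Hypothetical mapping from context cue words to entity tags;
    # the real FiNER-139 label set has 139 XBRL tags.
    cues = {
        "revenue": "B-Revenues",
        "shares": "B-SharesOutstanding",
    }
    tags = []
    for i, tok in enumerate(tokens):
        if tok.replace(",", "").replace(".", "").isdigit():
            # Look at the preceding words for a cue; the number itself
            # carries almost no signal, matching FiNER's key observation.
            context = {t.lower() for t in tokens[max(0, i - 3):i]}
            tag = next((cues[c] for c in cues if c in context), "O")
            tags.append(tag)
        else:
            tags.append("O")
    return tags

print(tag_numeric_tokens("Total revenue was 1,200 million".split()))
# → ['O', 'O', 'O', 'B-Revenues', 'O']
```

The same number would receive a different tag after a different cue word, which is exactly why context, not the token, drives the label.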
You are provided with a large number of Wikipedia comments which have been labeled by human raters for several types of toxic behavior.
11 days of continuous operation: 7 days under normal operation and 4 days with attack scenarios. The collected data include the network traffic and the values obtained from all 51 sensors and actuators, labelled according to normal and abnormal behaviours. Attack scenarios were derived through the attack models developed by our research team, which consider the intent space of a CPS. 41 attacks were launched during the 4 days and are described in the PDF.
The new dataset contains around 1,500 training videos and 290 test videos, with 50 frames per video on average. The dataset was obtained by processing manually captured video sequences of static real-life urban scenes. Its main property is the abundance of close objects and, consequently, the higher prevalence of occlusions. According to the introduced heuristic, the mean area of occluded image parts for SWORD is approximately five times larger than for RealEstate10k (14% vs. 3%, respectively). This justifies the collection and usage of SWORD and explains why SWORD allows training more powerful models despite its smaller size.
The $O_2$Perm dataset is created from the Membrane Society of Australasia portal. It represents monomers as polymer graphs to predict the oxygen permeability property. Its limited size (595 polymers) poses a great challenge for property prediction.
The Chilean Waiting List corpus comprises de-identified referrals from the waiting list in Chilean public hospitals. A subset of 10,000 referrals (including medical and dental notes) was manually annotated with ten clinically relevant entity types, keeping 1,000 annotations for a future shared task. A trained medical doctor or dentist annotated these referrals and then, together with three other researchers, consolidated each of the annotations. More than 48% of the entities in the annotated corpus are embedded in other entities or contain another entity, which makes the corpus a useful resource for building new models for nested Named Entity Recognition (NER). This work constitutes the first annotated corpus using clinical narratives from Chile and one of the few in Spanish.
SV-Ident comprises 4,248 sentences from social science publications in English and German. The data is the official data for the Shared Task: “Survey Variable Identification in Social Science Publications” (SV-Ident) 2022. Sentences are labeled with variables that are mentioned either explicitly or implicitly.
Unsupervised Domain Adaptation demonstrates great potential to mitigate domain shifts by transferring models from labeled source domains to unlabeled target domains. While Unsupervised Domain Adaptation has been applied to a wide variety of complex vision tasks, only a few works focus on lane detection for autonomous driving. This can be attributed to the lack of publicly available datasets. To facilitate research in these directions, we propose CARLANE, a 3-way sim-to-real domain adaptation benchmark for 2D lane detection. CARLANE encompasses the single-target datasets MoLane and TuLane and the multi-target dataset MuLane. These datasets are built from three different domains, which cover diverse scenes and contain a total of 163K unique images, 118K of which are annotated. In addition, we evaluate and report systematic baselines, including our own method, which builds upon Prototypical Cross-domain Self-supervised Learning. We find that false positive and false negative rates of the evaluated methods are high, leaving room for future improvement.
Face images of 1,995 people (Asian). For each subject, more than 20 frontal-face images were collected. This data can be used for face recognition and other tasks.
Comparative evaluation of virtual screening methods requires a rigorous benchmarking procedure on diverse, realistic, and unbiased data sets. Recent investigations from numerous research groups unambiguously demonstrate that artificially constructed ligand sets classically used by the community (e.g., DUD, DUD-E, MUV) are unfortunately biased by both obvious and hidden chemical biases, therefore overestimating the true accuracy of virtual screening methods. We herewith present a novel data set (LIT-PCBA) specifically designed for virtual screening and machine learning. LIT-PCBA relies on 149 dose–response PubChem bioassays that were additionally processed to remove false positives and assay artifacts and keep active and inactive compounds within similar molecular property ranges. To ascertain that the data set is suited to both ligand-based and structure-based virtual screening, target sets were restricted to single protein targets for which at least one X-ray structure is available in the Protein Data Bank.
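One of the processing steps described above, keeping active and inactive compounds within similar molecular property ranges, can be sketched as a simple range filter. The property values below are illustrative, not actual LIT-PCBA data:

```python
def property_matched(actives, inactives, prop="mw"):
    """Keep only inactives whose property value falls inside the actives'
    range, so the classes cannot be separated by that property alone."""
    lo = min(a[prop] for a in actives)
    hi = max(a[prop] for a in actives)
    return [d for d in inactives if lo <= d[prop] <= hi]

# Illustrative molecular weights (g/mol), not real assay records.
actives = [{"mw": 250.0}, {"mw": 420.0}]
inactives = [{"mw": 180.0}, {"mw": 300.0}, {"mw": 510.0}]
print(property_matched(actives, inactives))  # → [{'mw': 300.0}]
```

The real pipeline matches several properties at once; this sketch shows the principle for a single one.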
The lipophilicity database refers to a collection of information related to the lipophilic properties of various molecules. Lipophilicity, also known as hydrophobicity, is a measure of how readily a substance dissolves in nonpolar solvents (such as oil) compared to polar solvents (such as water). In the context of drug discovery and pharmacology, understanding the lipophilicity of compounds is crucial because it affects their absorption, distribution, metabolism, and excretion (ADME) within the body.
VizNet-Sato is a dataset from the authors of Sato and is based on the VizNet dataset. The authors choose from VizNet only relational web tables with headers matching their selected 78 DBpedia semantic types. The selected tables are divided into two categories: Full tables and Multi-column only tables. The first category corresponds to 78,733 selected tables from VizNet, while the second category includes 32,265 tables which have more than one column. The tables of both categories are divided into 5 subsets to be able to conduct 5-fold cross validation: 4 subsets are used for training and the last for evaluation.
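The 5-fold protocol described above (5 subsets, 4 for training and 1 for evaluation) can be sketched as follows; the table identifiers are placeholders, not actual VizNet table IDs:

```python
import random

def five_fold_splits(table_ids, seed=0):
    """Yield (train, eval) splits: 5 subsets, 4 for training, 1 held out."""
    ids = list(table_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::5] for i in range(5)]  # 5 roughly equal subsets
    for k in range(5):
        eval_set = folds[k]
        train_set = [t for i, f in enumerate(folds) if i != k for t in f]
        yield train_set, eval_set

tables = [f"table_{i}" for i in range(25)]
for train, evaluation in five_fold_splits(tables):
    print(len(train), len(evaluation))  # → 20 5 on each fold
```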
The WikipediaGS dataset was created by extracting Wikipedia tables from Wikipedia pages. It consists of 485,096 tables which were annotated with DBpedia entities for the Cell Entity Annotation (CEA) task.
Choosing optimal maskers for existing soundscapes to effect a desired perceptual change via soundscape augmentation is non-trivial due to extensive varieties of maskers and a dearth of benchmark datasets with which to compare and develop soundscape augmentation models. To address this problem, we make publicly available the ARAUS (Affective Responses to Augmented Urban Soundscapes) dataset, which comprises a five-fold cross-validation set and independent test set totaling 25,440 unique subjective perceptual responses to augmented soundscapes presented as audio-visual stimuli. Each augmented soundscape is made by digitally adding "maskers" (bird, water, wind, traffic, construction, or silence) to urban soundscape recordings at fixed soundscape-to-masker ratios. Responses were then collected by asking participants to rate how pleasant, annoying, eventful, uneventful, vibrant, monotonous, chaotic, calm, and appropriate each augmented soundscape was, in accordance with ISO 12913-2:2018.
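Mixing a masker into a soundscape at a fixed soundscape-to-masker ratio (SMR) can be sketched as RMS-level scaling. This is an illustrative scheme assuming the SMR is defined in dB of RMS level; it is not the exact ARAUS augmentation pipeline:

```python
import math

def rms(signal):
    """Root-mean-square level of a sampled signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def mix_at_smr(soundscape, masker, smr_db):
    """Add `masker` to `soundscape`, scaled so the RMS ratio
    rms(soundscape) / rms(scaled masker) equals 10**(smr_db / 20)."""
    gain = rms(soundscape) / (rms(masker) * 10 ** (smr_db / 20))
    return [s + gain * m for s, m in zip(soundscape, masker)]

# At an SMR of 0 dB the masker is scaled to the soundscape's RMS level.
mixed = mix_at_smr([1.0] * 4, [2.0] * 4, 0.0)
print(mixed)  # → [2.0, 2.0, 2.0, 2.0]
```

Higher SMR values attenuate the masker relative to the soundscape; at 20 dB the masker sits 20 dB below the soundscape's RMS level.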
The Few-Shot Object Learning (FewSOL) dataset can be used for object recognition with a few images per object. It contains 336 real-world objects with 9 RGB-D images per object from different views. Object segmentation masks, object poses and object attributes are provided. In addition, synthetic images generated using 330 3D object models are used to augment the dataset. The FewSOL dataset can be used to study few-shot object recognition problems such as classification, detection and segmentation, shape reconstruction, pose estimation, keypoint correspondence and attribute recognition.
LineCap is a dataset of line charts scraped from scientific papers, each accompanied by crowd-sourced captions describing the trends of individual lines in the figure and the figure as a whole.
We introduce the Oracle-MNIST dataset, comprising 28×28 grayscale images of 30,222 ancient characters from 10 categories, for benchmarking pattern classification, with particular challenges on image noise and distortion. The training set consists of 27,222 images, and the test set contains 300 images per class. Oracle-MNIST shares the same data format with the original MNIST dataset, allowing for direct compatibility with all existing classifiers and systems, but it constitutes a more challenging classification task than MNIST. The images of ancient characters suffer from 1) extremely serious and unique noise caused by three thousand years of burial and aging and 2) dramatically variant writing styles of ancient Chinese, which all make them realistic for machine learning research. The dataset is freely available at https://github.com/wm-bupt/oracle-mnist.
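Because Oracle-MNIST shares the MNIST data format, a standard IDX reader works unchanged. A minimal sketch, assuming the gzipped MNIST image-file layout (magic number 2051, then count, rows and cols as big-endian 32-bit integers, then one unsigned byte per pixel):

```python
import gzip
import struct

def read_idx_images(path):
    """Read a gzipped IDX image file (MNIST / Oracle-MNIST format)."""
    with gzip.open(path, "rb") as f:
        magic, count, rows, cols = struct.unpack(">IIII", f.read(16))
        assert magic == 2051, "not an IDX image file"
        data = f.read(count * rows * cols)
    # Return a list of `count` images, each a rows*cols byte string.
    return [data[i * rows * cols:(i + 1) * rows * cols]
            for i in range(count)]
```

The same layout with magic number 2049 (and no rows/cols fields) holds the label files, so a label reader differs only in its header parsing.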