Datasets

71 machine learning datasets

BIRDeep (BIRDeep_AudioAnnotations)

The BIRDeep Audio Annotations dataset is a collection of bird vocalizations from Doñana National Park, Spain. It was created as part of the BIRDeep project, which aims to optimize the detection and classification of bird species in audio recordings using deep learning techniques. The dataset is intended for use in training and evaluating models for bird vocalization detection and identification.

1 paper | 0 benchmarks | Audio, Biology, Environment, Images

MERGE SPCS

This dataset contains pre-processed versions of datasets introduced in prior works. It also contains new data pertinent to the paper.

1 paper | 0 benchmarks | Biology, Biomedical, Images, Medical, Tables, Tabular

HeartSeg

The medaka (Oryzias latipes) and the zebrafish (Danio rerio) are used as model organisms for a variety of subjects in biomedical research. The presented work aims to study the potential of automated ventricular dimension estimation through heart segmentation in medaka. For details, see the paper and the supplementary materials.

0 papers | 0 benchmarks | Biology, Biomedical, Images, Medical, Time series, Videos

ZooScanNet (ZooScanNet: plankton images captured with the ZooScan)

Plankton was sampled with various nets, from the bottom or 500 m depth to the surface, in many oceans of the world. Samples were imaged with a ZooScan. The full images were processed with ZooProcess, which generated regions of interest (ROIs) around each individual object and a set of associated features measured on the object (see Gorsky et al. 2010 for more information). The same objects were re-processed to compute features with the scikit-image toolbox (http://scikit-image.org). The 1,433,278 resulting objects were sorted by a limited number of operators, following a common taxonomic guide, into 93 taxa, using the web application EcoTaxa (http://ecotaxa.obs-vlfr.fr).

0 papers | 0 benchmarks | Biology, Images

Genome-wide miRNA detection (Genome-wide hairpins datasets of animals and plants for novel miRNA prediction)

We've made available several genome-wide datasets, which can be used for training microRNA (miRNA) classifiers. The hairpin sequences available are from the genomes of Homo sapiens, Arabidopsis thaliana, Anopheles gambiae, Caenorhabditis elegans and Drosophila melanogaster. Hairpins are small RNA sequences that naturally fold into a hairpin structure. However, not all hairpins have a clear function (that is, not every hairpin is a miRNA).

0 papers | 0 benchmarks | Biology, Biomedical
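
As a rough illustration of what "folds into a hairpin structure" means for the entry above, the sketch below counts how many bases at the two ends of a sequence can pair with each other around a central loop. It is a toy heuristic, not the method used to build the dataset: the function names `stem_pairs` and `looks_like_hairpin`, the minimum loop length of 4, and the minimum stem length of 4 are all assumptions chosen for illustration. Real pipelines typically rely on thermodynamic folding tools instead.

```python
# Toy illustration only: a naive end-to-end pairing count, not a folding algorithm.

# Watson-Crick pairs plus the G-U wobble pair that RNA stems commonly allow.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_pairs(seq: str, loop_len: int = 4) -> int:
    """Count consecutive base pairs formed by walking inward from both ends.

    Assumes (hypothetically) a single stem with the loop in the middle, and
    stops once fewer than `loop_len` unpaired bases would remain in the loop.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    i, j = 0, len(seq) - 1
    count = 0
    while j - i - 1 >= loop_len and (seq[i], seq[j]) in PAIRS:
        count += 1
        i += 1
        j -= 1
    return count


def looks_like_hairpin(seq: str, min_stem: int = 4, loop_len: int = 4) -> bool:
    """Return True if the sequence has at least `min_stem` paired end bases."""
    return stem_pairs(seq, loop_len) >= min_stem


if __name__ == "__main__":
    for candidate in ("GCGCAAAAGCGC", "AAAAAAAAAAAA"):
        print(candidate, stem_pairs(candidate), looks_like_hairpin(candidate))
```

A check like this only says whether a stem-loop is geometrically possible. Whether a given hairpin is a miRNA is the classification problem the datasets are meant to support.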

SourceData-NLP (The SourceData-NLP dataset: integrating curation into scientific publishing for training large language models)

Introduction: The scientific publishing landscape is expanding rapidly, creating challenges for researchers to stay up-to-date with the evolution of the literature. Natural Language Processing (NLP) has emerged as a potent approach to automating knowledge extraction from this vast amount of publications and preprints. Tasks such as Named-Entity Recognition (NER) and Named-Entity Linking (NEL), in conjunction with context-dependent semantic interpretation, offer promising and complementary approaches to extracting structured information and revealing key concepts. Results: We present the SourceData-NLP dataset produced through the routine curation of papers during the publication process. A unique feature of this dataset is its emphasis on the annotation of bioentities in figure legends. We annotate eight classes of biomedical entities (small molecules, gene products, subcellular components, cell lines, cell types, tissues, organisms, and diseases), their role in the experimental de

0 papers | 0 benchmarks | Biology, Biomedical, Texts

ALFI (Annotations for Label-Free Images)

ALFI (Annotations for Label-Free Images) is a dataset of images and annotations for label-free microscopy imaging. It consists of 29 time-lapse image sequences with various annotations (pixel-wise segmentation masks, object-wise bounding boxes, and tracking information), made publicly available to the scientific community through figshare.

0 papers | 0 benchmarks | Biology, Images, Texts, Tracking

WildlifeReID-10k

0 papers | 0 benchmarks | Biology, Images

CAMEO (Continuous automated model evaluation)

Xavier Robin, Juergen Haas, Rafal Gumienny, Anna Smolinski, Gerardo Tauriello, and Torsten Schwede. Continuous automated model evaluation (CAMEO): perspectives on the future of fully automated evaluation of structure prediction methods. Proteins: Structure, Function, and Bioinformatics, 89:1977–1986, 12 2021. ISSN 0887-3585. doi: 10.1002/prot.26213.

0 papers | 0 benchmarks | Biology

GPRD-Sella benchmark

This record contains the saddle search output logs for Sella and EON (dimer, with and without GPR acceleration). The data also include full trajectories of the GPR-accelerated dimer method for visual analysis. These logs are used to generate the figures in the manuscript. For details, refer to the code in the associated GitHub repository.

0 papers | 0 benchmarks | 3D, Biology, Physics
