71 machine learning datasets
The BIRDeep Audio Annotations dataset is a collection of bird vocalizations from Doñana National Park, Spain. It was created as part of the BIRDeep project, which aims to optimize the detection and classification of bird species in audio recordings using deep learning techniques. The dataset is intended for use in training and evaluating models for bird vocalization detection and identification.
This dataset contains pre-processed versions of datasets introduced in prior works. It also contains new data pertinent to the paper.
The medaka (Oryzias latipes) and the zebrafish (Danio rerio) are used as model organisms in many areas of biomedical research. This work studies the potential of estimating ventricular dimensions automatically through heart segmentation in medaka. For details, see our paper and the supplementary materials.
Plankton was sampled with various nets, from the bottom (or from 500 m depth) to the surface, in many oceans of the world. Samples were imaged with a ZooScan. The full images were processed with ZooProcess, which generated regions of interest (ROIs) around each individual object and a set of associated features measured on the object (see Gorsky et al., 2010 for more information). The same objects were re-processed to compute features with the scikit-image toolbox (http://scikit-image.org). The 1,433,278 resulting objects were sorted by a limited number of operators, following a common taxonomic guide, into 93 taxa, using the web application EcoTaxa (http://ecotaxa.obs-vlfr.fr).
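The per-object features above were computed by ZooProcess and scikit-image. As an illustration only, the sketch below computes three simple features (area, centroid, and mean grey level) for the object in one grayscale ROI using plain Python. The threshold of 128, the function name `roi_features`, and the assumption that the object is darker than the scanner background are hypothetical choices, not the ZooProcess or EcoTaxa feature definitions.

```python
from typing import Dict, List


def roi_features(roi: List[List[int]], threshold: int = 128) -> Dict[str, float]:
    """Compute simple features for the object in a grayscale ROI.

    Hypothetical assumption: object pixels are darker than `threshold`,
    as on a light scanner background.
    """
    # Collect (row, col, grey value) for every pixel classified as object.
    pixels = [
        (r, c, value)
        for r, row in enumerate(roi)
        for c, value in enumerate(row)
        if value < threshold
    ]
    if not pixels:
        raise ValueError("no object pixels below threshold")
    area = len(pixels)
    return {
        "area": float(area),
        "centroid_row": sum(r for r, _, _ in pixels) / area,
        "centroid_col": sum(c for _, c, _ in pixels) / area,
        "mean_grey": sum(v for _, _, v in pixels) / area,
    }


# A tiny 4x4 ROI with a dark 2x2 object in the middle.
roi = [
    [255, 255, 255, 255],
    [255, 40, 60, 255],
    [255, 50, 70, 255],
    [255, 255, 255, 255],
]
features = roi_features(roi)
print(features)  # {'area': 4.0, 'centroid_row': 1.5, 'centroid_col': 1.5, 'mean_grey': 55.0}
```

In practice, scikit-image's `skimage.measure.regionprops` computes these and many more properties from a labelled mask plus an intensity image.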
We've made available several genome-wide datasets that can be used to train microRNA (miRNA) classifiers. The available hairpin sequences come from the genomes of Homo sapiens, Arabidopsis thaliana, Anopheles gambiae, Caenorhabditis elegans and Drosophila melanogaster. Hairpins are small RNA sequences that naturally fold into a hairpin structure. However, not all hairpins have a clear function; those that do not are not miRNAs.
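To make the idea of a hairpin concrete, the sketch below checks whether an RNA sequence can form a perfect stem-loop: its first `stem` bases pair (Watson-Crick or G-U wobble) with its last `stem` bases read in reverse, leaving a loop of at least `min_loop` unpaired bases. The function `is_perfect_hairpin` and its parameters are hypothetical and deliberately simplified; real hairpin prediction uses free-energy folding tools such as RNAfold from the ViennaRNA package, which also tolerate mismatches and bulges.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def is_perfect_hairpin(seq: str, stem: int, min_loop: int = 3) -> bool:
    """Return True if the outer `stem` bases form a perfectly paired stem."""
    seq = seq.upper()
    # Need room for two stem arms plus a loop of at least `min_loop` bases.
    if stem <= 0 or len(seq) < 2 * stem + min_loop:
        return False
    # Antiparallel pairing: base i from the 5' end pairs with base i from the 3' end.
    return all((seq[i], seq[-1 - i]) in PAIRS for i in range(stem))


print(is_perfect_hairpin("GGGAAAACCC", stem=3))  # True: G-C pairs around an AAAA loop
print(is_perfect_hairpin("GGGAAAAGGG", stem=3))  # False: G-G cannot pair
print(is_perfect_hairpin("GGCC", stem=2))        # False: no room for a 3-base loop
```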
Introduction: The scientific publishing landscape is expanding rapidly, creating challenges for researchers to stay up-to-date with the evolution of the literature. Natural Language Processing (NLP) has emerged as a potent approach to automating knowledge extraction from this vast amount of publications and preprints. Tasks such as Named-Entity Recognition (NER) and Named-Entity Linking (NEL), in conjunction with context-dependent semantic interpretation, offer promising and complementary approaches to extracting structured information and revealing key concepts. Results: We present the SourceData-NLP dataset produced through the routine curation of papers during the publication process. A unique feature of this dataset is its emphasis on the annotation of bioentities in figure legends. We annotate eight classes of biomedical entities (small molecules, gene products, subcellular components, cell lines, cell types, tissues, organisms, and diseases), their role in the experimental de
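Entity annotations like those described for SourceData-NLP are commonly converted to token-level BIO tags (Begin, Inside, Outside) for training NER models. The sketch below shows that conversion for whitespace-tokenised text with character-offset spans. The example sentence, the offsets, the function `bio_tags`, and the label names `GENE_PRODUCT` and `CELL_LINE` are hypothetical and do not reflect the dataset's actual file format or label names.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start char, end char exclusive, label)


def bio_tags(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Tag whitespace tokens with B-/I-<label>, or O outside any span.

    Tokens only partially covered by a span are tagged O in this sketch.
    """
    tagged = []
    pos = 0
    for token in text.split():
        # Locate the token's character offsets, scanning left to right.
        start = text.index(token, pos)
        end = start + len(token)
        pos = end
        tag = "O"
        for s, e, label in spans:
            if start >= s and end <= e:
                tag = ("B-" if start == s else "I-") + label
                break
        tagged.append((token, tag))
    return tagged


text = "Staining of p53 in HeLa cells"
spans = [(12, 15, "GENE_PRODUCT"), (19, 29, "CELL_LINE")]
tags = bio_tags(text, spans)
print(tags)
# [('Staining', 'O'), ('of', 'O'), ('p53', 'B-GENE_PRODUCT'), ('in', 'O'),
#  ('HeLa', 'B-CELL_LINE'), ('cells', 'I-CELL_LINE')]
```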
ALFI (Annotations for Label-Free Images) is a dataset of images and annotations for label-free microscopy imaging. It consists of 29 time-lapse image sequences with various annotations (pixel-wise segmentation masks, object-wise bounding boxes, and tracking information), made publicly available to the scientific community through figshare.
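Object-wise bounding boxes can be derived directly from instance segmentation masks. The sketch below assumes a hypothetical mask encoding in which 0 is background and each positive integer labels one object, and returns `(min_row, min_col, max_row, max_col)` boxes with inclusive bounds. The function name `boxes_from_mask` is illustrative, and ALFI's actual annotation formats may differ from this encoding.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (min_row, min_col, max_row, max_col), inclusive


def boxes_from_mask(mask: List[List[int]]) -> Dict[int, Box]:
    """Compute one bounding box per object label in an instance mask."""
    boxes: Dict[int, Box] = {}
    for r, row in enumerate(mask):
        for c, label in enumerate(row):
            if label == 0:
                continue  # background pixel
            if label not in boxes:
                boxes[label] = (r, c, r, c)
            else:
                # Grow the existing box to include this pixel.
                r0, c0, r1, c1 = boxes[label]
                boxes[label] = (min(r0, r), min(c0, c), max(r1, r), max(c1, c))
    return boxes


mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 2],
    [0, 0, 0, 2, 2],
]
boxes = boxes_from_mask(mask)
print(boxes)  # {1: (0, 1, 1, 2), 2: (1, 3, 2, 4)}
```

A single pass over the pixels keeps this linear in image size; with NumPy available, the same boxes could be computed per label with vectorised `min`/`max` over the label's coordinates.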
Xavier Robin, Juergen Haas, Rafal Gumienny, Anna Smolinski, Gerardo Tauriello, and Torsten Schwede. Continuous automated model evaluation (CAMEO): perspectives on the future of fully automated evaluation of structure prediction methods. Proteins: Structure, Function, and Bioinformatics, 89:1977–1986, December 2021. ISSN 0887-3585. doi: 10.1002/prot.26213.
This record contains the saddle search output logs for Sella and EON (dimer, with and without GPR acceleration). The data also includes full trajectories of the GPR accelerated dimer method for visual analysis. These logs are used to generate the figures in the manuscript. For details, refer to the code in the associated GitHub repository.