Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Domain-independent Extraction of Scientific Concepts from Research Articles

Arthur Brack, Jennifer D'Souza, Anett Hoppe, Sören Auer, Ralph Ewerth

Published 2020-01-09. Accepted for publication at the 42nd European Conference on IR Research (ECIR 2020).
Tasks: Scientific Concept Extraction, Active Learning, Named Entity Recognition (NER)

Abstract

We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task and (b) examine the transferability of concepts between domains. Second, we present two deep learning systems as baselines. In particular, we propose active learning to deal with different domains in our task. The experimental results show that (1) a substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, and (3) active learning enables us to nearly halve the amount of required training data.
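The abstract reports that active learning nearly halves the training data needed. This page does not state the paper's query strategy, so the sketch below assumes pool-based least-confidence uncertainty sampling, a common choice for sequence labeling. It is not the authors' implementation. `predict`, `annotate`, and `train` are hypothetical callables standing in for a tagger such as SciBERT, a human annotator, and a training routine.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical representation: one probability distribution over tags per token.
TokenProbs = Sequence[Sequence[float]]
Predictor = Callable[[str], TokenProbs]


def least_confidence(token_probs: TokenProbs) -> float:
    """Return 1 minus the probability of the per-token argmax tag sequence,
    treating token distributions as independent. Higher means less certain."""
    confidence = 1.0
    for dist in token_probs:
        confidence *= max(dist)
    return 1.0 - confidence


def select_batch(pool: Sequence[str], predict: Predictor, k: int) -> List[int]:
    """Indices of the k pool sentences with the highest uncertainty score."""
    scores = [least_confidence(predict(x)) for x in pool]
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def active_learning_loop(
    pool: Sequence[str],
    annotate: Callable[[str], object],      # hypothetical: a human labels one sentence
    train: Callable[[list], Predictor],     # hypothetical: fits a tagger, returns a predictor
    k: int,
    rounds: int,
) -> Tuple[Optional[Predictor], List[Tuple[str, object]]]:
    unlabeled = list(pool)  # copy so the caller's pool is not mutated
    labeled: List[Tuple[str, object]] = []
    model: Optional[Predictor] = None
    for _ in range(rounds):
        if not unlabeled:
            break
        if model is None:
            # Cold start: no model yet, so take the first k sentences.
            picked = list(range(min(k, len(unlabeled))))
        else:
            picked = select_batch(unlabeled, model, k)
        # Pop in descending index order so the remaining indices stay valid.
        for i in sorted(picked, reverse=True):
            sentence = unlabeled.pop(i)
            labeled.append((sentence, annotate(sentence)))
        model = train(labeled)  # retrain on everything labeled so far
    return model, labeled
```

Because the product of per-token confidences shrinks with sentence length, this score tends to favor long sentences; dividing by length is a common variant if that bias is unwanted.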

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Named Entity Recognition (NER) | STM-corpus | Exact Span F1 | 66.4 | SciBERT (active learning) |
| Named Entity Recognition (NER) | STM-corpus | Exact Span F1 | 65.5 | SciBERT (full data) |
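The results above use Exact Span F1, which this page does not define. A common reading, assumed in this sketch, is micro-averaged span-level F1 in which a predicted concept counts as correct only if its start, end, and concept type all match a gold annotation. The BIO tagging scheme and the label names used here are illustrative assumptions, and the code is not claimed to reproduce the reported scores.

```python
from typing import Iterable, Optional, Sequence, Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, concept type)


def bio_to_spans(tags: Sequence[str]) -> Set[Span]:
    """Decode BIO tags into spans; a stray I- tag (no open span, or a
    different label) is treated leniently as the start of a new span."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        closes = tag == "O" or tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if closes and start is not None:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


def exact_span_f1(
    gold_sents: Iterable[Set[Span]], pred_sents: Iterable[Set[Span]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over all sentences.
    A predicted span is a true positive only on an exact boundary and type match."""
    gold_list, pred_list = list(gold_sents), list(pred_sents)
    if len(gold_list) != len(pred_list):
        raise ValueError("gold and predicted sentence counts differ")
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_list, pred_list):
        tp += len(gold & pred)
        n_gold += len(gold)
        n_pred += len(pred)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = bio_to_spans(["B-Method", "I-Method", "O", "B-Data"])
pred = bio_to_spans(["B-Method", "I-Method", "O", "B-Data", "I-Data"])
print(exact_span_f1([gold], [pred]))
```

Under this definition the predicted `Data` span, whose end boundary is off by one token, earns no partial credit; that is what separates exact-span scoring from lenient overlap matching.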

Related Papers

A Risk-Aware Adaptive Robust MPC with Learned Uncertainty Quantification (2025-07-15)
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization (2025-07-08)
MP-ALOE: An r2SCAN dataset for universal machine learning interatomic potentials (2025-07-08)
Flippi: End To End GenAI Assistant for E-Commerce (2025-07-08)
Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models (2025-06-28)
Active Learning for Manifold Gaussian Process Regression (2025-06-26)
Machine-Learning-Assisted Photonic Device Development: A Multiscale Approach from Theory to Characterization (2025-06-24)
Active Learning-Guided Seq2Seq Variational Autoencoder for Multi-target Inhibitor Generation (2025-06-18)