Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Using LLMs for the Extraction and Normalization of Product Attribute Values

Alexander Brinkmann, Nick Baumann, Christian Bizer

2024-03-04 · Attribute · Attribute Value Extraction · Product Recommendation
Paper · PDF · Code (official)

Abstract

Product offers on e-commerce websites often consist of a product title and a textual product description. In order to enable features such as faceted product search or to generate product comparison tables, it is necessary to extract structured attribute-value pairs from the unstructured product titles and descriptions and to normalize the extracted values to a single, unified scale for each attribute. This paper explores the potential of using large language models (LLMs), such as GPT-3.5 and GPT-4, to extract and normalize attribute values from product titles and descriptions. We experiment with different zero-shot and few-shot prompt templates for instructing LLMs to extract and normalize attribute-value pairs. We introduce the Web Data Commons - Product Attribute Value Extraction (WDC-PAVE) benchmark dataset for our experiments. WDC-PAVE consists of product offers from 59 different websites which provide schema.org annotations. The offers belong to five different product categories, each with a specific set of attributes. The dataset provides manually verified attribute-value pairs in two forms: (i) directly extracted values and (ii) normalized attribute values. The normalization of the attribute values requires systems to perform the following types of operations: name expansion, generalization, unit of measurement conversion, and string wrangling. Our experiments demonstrate that GPT-4 outperforms the PLM-based extraction methods SU-OpenTag, AVEQA, and MAVEQA by 10%, achieving an F1-score of 91%. For the extraction and normalization of product attribute values, GPT-4 achieves a similar performance to the extraction scenario, while being particularly strong at string wrangling and name expansion.
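The paper's approach can be sketched as a prompt-and-parse loop: a template instructs the LLM to return attribute-value pairs as JSON, and a post-processing step normalizes values (e.g. unit-of-measurement conversion). The sketch below is illustrative only, assuming a hypothetical prompt wording and a hypothetical `convert_to_watts` helper; the authors' actual templates and normalization pipeline may differ.

```python
import json


def build_extraction_prompt(title, description, attributes):
    """Build a zero-shot extraction prompt in the spirit of the paper's
    templates (the exact wording used by the authors is an assumption here)."""
    attr_list = ", ".join(attributes)
    return (
        f"Extract the values of the following attributes from the product "
        f"offer below: {attr_list}. Return a JSON object mapping each "
        f"attribute to its value, or null if the attribute is not mentioned.\n\n"
        f"Title: {title}\nDescription: {description}"
    )


def parse_llm_response(response_text):
    """Parse the model's JSON answer into a dict of attribute-value pairs."""
    return json.loads(response_text)


def convert_to_watts(value):
    """Example of one normalization operation (unit-of-measurement
    conversion): map power values onto a single unified scale in watts."""
    number, unit = value.split()
    factor = {"W": 1, "kW": 1000}[unit]
    return f"{float(number) * factor:g} W"


prompt = build_extraction_prompt(
    "Dremel 3000 Rotary Tool 2 kW",
    "Versatile rotary tool with 28 accessories.",
    ["Brand", "Power", "Accessory Count"],
)

# An answer the LLM might plausibly return (illustrative, not real output):
extracted = parse_llm_response(
    '{"Brand": "Dremel", "Power": "2 kW", "Accessory Count": "28"}'
)
normalized_power = convert_to_watts(extracted["Power"])  # "2000 W"
```

In the paper's terms, the extraction step corresponds to the zero-shot scenario; the few-shot variants additionally embed example values and demonstrations into the prompt.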

Results

Task                       | Dataset  | Metric   | Value | Model
---------------------------|----------|----------|-------|----------------------------------------------
Information Extraction     | WDC-PAVE | F1-Score | 90.54 | GPT-4 (10 example values & 10 demonstrations)
Information Extraction     | WDC-PAVE | F1-Score | 88.02 | GPT-3.5 (10 example values & 10 demonstrations)
Information Extraction     | WDC-PAVE | F1-Score | 80.83 | AVEQA
Information Extraction     | WDC-PAVE | F1-Score | 65.10 | MAVEQA
Information Extraction     | WDC-PAVE | F1-Score | 60.44 | SU-OpenTag
Attribute Value Extraction | WDC-PAVE | F1-Score | 90.54 | GPT-4 (10 example values & 10 demonstrations)
Attribute Value Extraction | WDC-PAVE | F1-Score | 88.02 | GPT-3.5 (10 example values & 10 demonstrations)
Attribute Value Extraction | WDC-PAVE | F1-Score | 80.83 | AVEQA
Attribute Value Extraction | WDC-PAVE | F1-Score | 65.10 | MAVEQA
Attribute Value Extraction | WDC-PAVE | F1-Score | 60.44 | SU-OpenTag

Related Papers

MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM (2025-07-16)
Non-Adaptive Adversarial Face Generation (2025-07-16)
Attributes Shape the Embedding Space of Face Recognition Models (2025-07-15)
COLIBRI Fuzzy Model: Color Linguistic-Based Representation and Interpretation (2025-07-15)
Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models (2025-07-13)
Model Parallelism With Subnetwork Data Parallelism (2025-07-11)
Bradley-Terry and Multi-Objective Reward Modeling Are Complementary (2025-07-10)
Evaluating Attribute Confusion in Fashion Text-to-Image Generation (2025-07-09)