
Sentiment detection remains a pivotal task in natural language processing, yet its development in Arabic lags due to a scarcity of training materials compared to English. Addressing this gap, we present ArSen-20, a benchmark dataset tailored to propel Arabic sentiment detection forward. ArSen-20 comprises 20,000 professionally labeled tweets sourced from Twitter, focusing on the theme of COVID-19 and spanning the period from 2020 to 2023. Beyond tweet content, the dataset incorporates metadata associated with the user, enriching the contextual understanding. ArSen-20 offers a comprehensive resource to foster advancements in Arabic sentiment analysis and facilitate research in this critical domain.

The ArSen-20 dataset statistics:

| Statistic | Count |
|:-------------:|:-----:|
| Training set size | 16000 |
| Validation set size | 2000 |
| Testing set size | 2000 |
| Neutral | 17262 |
| Positive | 878 |
| Negative | 1860 |
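The label counts in the statistics are heavily skewed toward the Neutral class. As a quick sanity check, the sketch below recomputes the totals and the share of each label from the published numbers. The variable and function names are illustrative only and are not part of the dataset release.

```python
# Counts copied from the dataset statistics; the names here are illustrative.
split_sizes = {"train": 16000, "validation": 2000, "test": 2000}
label_counts = {"neutral": 17262, "positive": 878, "negative": 1860}


def label_shares(counts):
    """Return each label's share of the dataset as a fraction of the total."""
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# The split sizes and the label counts should describe the same 20,000 tweets.
total_tweets = sum(label_counts.values())
assert total_tweets == sum(split_sizes.values())

shares = label_shares(label_counts)
for label, share in shares.items():
    print(f"{label:>8}: {share:.2%}")
# prints:
#  neutral: 86.31%
# positive: 4.39%
# negative: 9.30%
```

Because roughly 86% of all tweets are Neutral, a model that always predicts Neutral would already score about that accuracy over the full dataset, so per-class metrics such as macro-averaged F1 are likely to be more informative than accuracy alone.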

Features

The dataset has the following features:

Download

You can download the dataset from here.

ArSen-20_publish.csv - Contains all features.
ArSen-20_id_only.csv - Contains only the tweets and their authors' IDs.
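Once a file is downloaded, it can be inspected with the Python standard library alone. The following is a minimal sketch, not an official loader: it assumes the CSV has a header row and is UTF-8 encoded, and the helper names `list_columns` and `count_column_values` are hypothetical.

```python
import csv
from collections import Counter


def list_columns(csv_path):
    """Return the header row of a CSV file as a list of column names."""
    # Assumption: the file is UTF-8; "utf-8-sig" also tolerates a leading BOM.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        return next(csv.reader(handle), [])


def count_column_values(csv_path, column):
    """Count how often each distinct value appears in one column."""
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(
                f"column {column!r} not found; available: {reader.fieldnames}"
            )
        return Counter(row[column] for row in reader)
```

Calling `list_columns("ArSen-20_publish.csv")` first shows the actual column names. The sentiment column can then be passed to `count_column_values` and compared against the published label counts; its real name is not stated here, so check the header before relying on any particular name.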

Citation

If you use this dataset in your research, please cite the following papers:

@inproceedings{fang2024arsen,
title={ArSen-20: A New Benchmark for Arabic Sentiment Detection},
author={Yang Fang and Cheng Xu},
booktitle={5th Workshop on African Natural Language Processing},
year={2024},
url={https://openreview.net/forum?id=GgsRUF5kJt}
}

@inproceedings{fang2024advancing,
    title = "Advancing {A}rabic Sentiment Analysis: {A}r{S}en Benchmark and the Improved Fuzzy Deep Hybrid Network",
    author = "Fang, Yang  and
      Xu, Cheng  and
      Guan, Shuhao  and
      Yan, Nan  and
      Mei, Yuke",
    editor = "Barak, Libby  and
      Alikhani, Malihe",
    booktitle = "Proceedings of the 28th Conference on Computational Natural Language Learning",
    month = nov,
    year = "2024",
    address = "Miami, FL, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.conll-1.39",
    pages = "507--516",
}

Contact

If you have any questions or comments about the dataset, please contact Yang Fang (20211209024@chnu.edu.cn).

Potential cooperation in related fields is also welcome. :)