Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MPNet

Natural Language Processing · Introduced 2020 · 17 papers
Source Paper

Description

MPNet is a pre-training method for language models that combines masked language modeling (MLM) and permuted language modeling (PLM) in a single view. Through permuted language modeling it takes the dependency among the predicted tokens into account, avoiding the output dependency issue of BERT's MLM. At the same time, it takes the position information of all tokens as input, so the model sees the positions of the full sentence, alleviating the position discrepancy of XLNet.

The training objective of MPNet is:

$$\mathbb{E}_{z\in\mathcal{Z}_{n}} \sum_{t=c+1}^{n}\log P\left(x_{z_{t}}\mid x_{z_{<t}}, M_{z_{>c}}; \theta\right)$$

As can be seen, MPNet conditions on $x_{z_{<t}}$ (the tokens preceding the current predicted token $x_{z_{t}}$) rather than only on the non-predicted tokens $x_{z_{\le c}}$ as in MLM; compared with PLM, MPNet takes more information (i.e., the mask symbols $[M]$ at positions $z_{>c}$) as input. Although the objective looks simple, implementing the model efficiently is challenging. For details, see the paper.
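The input construction behind this objective can be sketched in a few lines. The following is a simplified, hypothetical illustration (not the authors' implementation): given a token sequence, a permutation $z$, and a split point $c$, the first $c$ permuted tokens are kept as the non-predicted part, while the remaining positions are filled with mask symbols $[M]$ so that every prediction step still sees the position information of all $n$ tokens.

```python
# Simplified sketch of MPNet-style input construction (illustrative only;
# the real model uses two-stream attention and operates on token ids).

MASK = "[M]"

def mpnet_inputs(tokens, z, c):
    """Build (content stream, position ids, targets) for one sequence.

    tokens: the sequence x_1..x_n (0-indexed here)
    z:      a permutation of range(len(tokens))
    c:      number of non-predicted tokens
    """
    n = len(tokens)
    assert sorted(z) == list(range(n)) and 0 <= c <= n
    # Non-predicted part: the first c permuted tokens, kept as-is.
    content = [tokens[z[t]] for t in range(c)]
    positions = [z[t] for t in range(c)]
    # Predicted part: mask symbols that still carry the positions z_{>c},
    # so the model sees the position information of all n tokens.
    content += [MASK] * (n - c)
    positions += [z[t] for t in range(c, n)]
    # Targets: the tokens at the predicted positions, in permuted order.
    targets = [tokens[z[t]] for t in range(c, n)]
    return content, positions, targets

tokens = ["the", "task", "is", "sentence", "classification"]
z = [4, 0, 2, 1, 3]   # permuted order
c = 3                 # first 3 permuted tokens are non-predicted
content, positions, targets = mpnet_inputs(tokens, z, c)
# content   -> ["classification", "the", "is", "[M]", "[M]"]
# positions -> [4, 0, 2, 1, 3]
# targets   -> ["task", "sentence"]
```

Note how, unlike MLM, each target is predicted conditioned on the permuted prefix rather than only the non-predicted tokens, and, unlike PLM, the mask symbols expose the positions of all remaining tokens.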

Papers Using This Method

- Computational Detection of Intertextual Parallels in Biblical Hebrew: A Benchmark Study Using Transformer-Based Language Models (2025-06-30)
- Large Language Model Guided Progressive Feature Alignment for Multimodal UAV Object Detection (2025-03-10)
- "Actionable Help" in Crises: A Novel Dataset and Resource-Efficient Models for Identifying Request and Offer Social Media Posts (2025-02-24)
- Explicit Depth-Aware Blurry Video Frame Interpolation Guided by Differential Curves (2025-01-01)
- CReMa: Crisis Response through Computational Identification and Matching of Cross-Lingual Requests and Offers Shared on Social Media (2024-05-20)
- Harnessing PubMed User Query Logs for Post Hoc Explanations of Recommended Similar Articles (2024-02-05)
- RECipe: Does a Multi-Modal Recipe Knowledge Graph Fit a Multi-Purpose Recommendation System? (2023-08-08)
- Specious Sites: Tracking the Spread and Sway of Spurious News Stories at Scale (2023-08-03)
- Identifying Misinformation on YouTube through Transcript Contextual Analysis with Transformer Models (2023-07-22)
- Utilizing ChatGPT Generated Data to Retrieve Depression Symptoms from Social Media (2023-07-05)
- Vec2Vec: A Compact Neural Network Approach for Transforming Text Embeddings with High Fidelity (2023-06-22)
- Partial Mobilization: Tracking Multilingual Information Flows Amongst Russian Media Outlets and Telegram (2023-01-25)
- Using Large Pre-Trained Language Model to Assist FDA in Premarket Medical Device (2022-11-03)
- Happenstance: Utilizing Semantic Search to Track Russian State Media Narratives about the Russo-Ukrainian War On Reddit (2022-05-28)
- YoungSheldon at SemEval-2021 Task 5: Fine-tuning Pre-trained Language Models for Toxic Spans Detection using Token classification Objective (2021-08-01)
- mpNet: variable depth unfolded neural network for massive MIMO channel estimation (2020-08-07)
- MPNet: Masked and Permuted Pre-training for Language Understanding (2020-04-20)