Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


LayoutReader

Natural Language Processing · Introduced 2021 · 1 paper
Source Paper

Description

**LayoutReader** is a sequence-to-sequence model for reading order detection that exploits both textual and layout information, using the layout-aware language model LayoutLM as the encoder. The generation step in the encoder-decoder structure is modified to generate the reading order sequence.

In the encoding stage, LayoutReader packs the pair of source and target segments into a contiguous input sequence of LayoutLM and carefully designs the self-attention mask to control the visibility between tokens. As shown in the Figure, LayoutReader allows the tokens in the source segment to attend to each other while preventing the tokens in the target segment from attending to the rightward context. If 1 means allowing and 0 means preventing, the mask $M$ is defined as follows:

$$M_{i, j}= \begin{cases}1, & \text{if } i<j \text{ or } i, j \in \operatorname{src} \\ 0, & \text{otherwise}\end{cases}$$

where $i, j$ are indices in the packed input sequence, so they may come from either the source or the target segment; $i, j \in \operatorname{src}$ means both tokens are from the source segment.
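The mask above can be sketched as a small NumPy function. This is a minimal illustration, not the official implementation: the function name and the assumption that source tokens come first in the packed sequence are ours, and the formula is implemented literally, reading $M_{i,j}=1$ as "token $j$ may attend to token $i$".

```python
import numpy as np

def build_seq2seq_mask(src_len, tgt_len):
    """Sketch of the LayoutReader-style self-attention mask (1 = visible).

    Assumes positions 0..src_len-1 are the source segment and the rest
    are the target segment, following the M_{i,j} formula above:
    visible iff i < j or both i and j lie in the source segment.
    """
    n = src_len + tgt_len
    mask = np.zeros((n, n), dtype=np.int8)
    for i in range(n):
        for j in range(n):
            if i < j or (i < src_len and j < src_len):
                mask[i, j] = 1
    return mask
```

With `src_len=2, tgt_len=2`, the source block is fully visible to itself, while each target token only sees positions to its left, matching the prose description.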

In the decoding stage, since the target is a reordering of the source sequence, the prediction candidates can be constrained to the source segment. The model is therefore asked to predict indices in the source sequence, with the probability calculated as follows:

$$\mathcal{P}\left(x_{k}=i \mid x_{<k}\right)=\frac{\exp \left(e_{i}^{T} h_{k}+b_{k}\right)}{\sum_{j} \exp \left(e_{j}^{T} h_{k}+b_{k}\right)}$$

where $i$ is an index in the source segment; $e_{i}$ and $e_{j}$ are the $i$-th and $j$-th input embeddings of the source segment; $h_{k}$ is the hidden state at the $k$-th time step; $b_{k}$ is the bias at the $k$-th time step.
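A single decoding step of this pointer-style prediction can be sketched as a softmax over dot products between the hidden state and the source embeddings. The function name and shapes are our assumptions for illustration; only the formula itself comes from the description above.

```python
import numpy as np

def predict_step(h_k, src_embeddings, b_k=0.0):
    """Sketch of one decoding step: P(x_k = i | x_<k) over source indices.

    h_k:            decoder hidden state at step k, shape (d,)
    src_embeddings: input embeddings e_i of the source segment, shape (n_src, d)
    b_k:            bias at step k
    Implements softmax_i(e_i^T h_k + b_k) as in the equation above.
    """
    logits = src_embeddings @ h_k + b_k
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()
```

Note that because the candidates are restricted to source indices, the output distribution has exactly `n_src` entries, regardless of vocabulary size.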

Papers Using This Method

LayoutReader: Pre-training of Text and Layout for Reading Order Detection (2021-08-26)