
DDParser

Baidu Dependency Parser

Natural Language Processing · Introduced 2020 · 1 paper
Source Paper

Description

DDParser, or Baidu Dependency Parser, is a Chinese dependency parser trained on a large-scale manually labeled dataset called Baidu Chinese Treebank (DuCTB).

For the inputs, the input vector $e_i$ of the $i$-th word is the concatenation of its word embedding and its character-level representation:

$$e_i = e_i^{word} \oplus \mathrm{CharLSTM}(w_i)$$

where $\mathrm{CharLSTM}(w_i)$ is the output vector obtained by feeding the character sequence of $w_i$ into a BiLSTM layer. Experimental results on the DuCTB dataset show that replacing POS tag embeddings with $\mathrm{CharLSTM}(w_i)$ improves performance.
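As an illustration, the input construction can be sketched in NumPy with a minimal vanilla BiLSTM over character embeddings standing in for the trained CharLSTM. All dimensions, parameters, and the two-character word are illustrative assumptions, not DDParser's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_hidden(xs, W, U, b, hidden):
    """Run a vanilla LSTM over a sequence of vectors; return the final hidden state."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in xs:
        z = W @ x + U @ h + b                 # gate pre-activations, shape (4*hidden,)
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def char_bilstm(char_embs, params_fwd, params_bwd, hidden):
    """CharLSTM(w_i): concatenate final hidden states of forward and backward passes."""
    h_fwd = lstm_last_hidden(char_embs, *params_fwd, hidden)
    h_bwd = lstm_last_hidden(char_embs[::-1], *params_bwd, hidden)
    return np.concatenate([h_fwd, h_bwd])

char_dim, hidden, word_dim = 8, 16, 32       # toy sizes, not DDParser's

def make_params():
    return (rng.normal(size=(4 * hidden, char_dim)),
            rng.normal(size=(4 * hidden, hidden)),
            np.zeros(4 * hidden))

params_fwd, params_bwd = make_params(), make_params()

# Character embeddings for one word, e.g. a two-character Chinese word.
char_embs = rng.normal(size=(2, char_dim))
e_word = rng.normal(size=word_dim)           # word embedding e_i^word

# e_i = e_i^word ⊕ CharLSTM(w_i)
e_i = np.concatenate([e_word, char_bilstm(char_embs, params_fwd, params_bwd, hidden)])
print(e_i.shape)                              # (64,) = word_dim + 2 * hidden
```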

For the encoder, three BiLSTM layers are applied over the input vectors for context encoding. Denote by $r_i$ the output vector of the top-layer BiLSTM for $w_i$.

The biaffine dependency parser of Dozat and Manning is used. Dimension-reducing MLPs are applied to each recurrent output vector $r_i$ before the biaffine transformation; applying smaller MLPs to the recurrent output states before the biaffine classifier strips away information that is not relevant to the current decision. Biaffine attention is then used in both the dependency arc classifier and the relation classifier. The computations are:

$$
\begin{aligned}
h_i^{d\text{-}arc} &= \mathrm{MLP}^{d\text{-}arc}(r_i) \\
h_i^{h\text{-}arc} &= \mathrm{MLP}^{h\text{-}arc}(r_i) \\
h_i^{d\text{-}rel} &= \mathrm{MLP}^{d\text{-}rel}(r_i) \\
h_i^{h\text{-}rel} &= \mathrm{MLP}^{h\text{-}rel}(r_i) \\
S^{arc} &= \left(H^{d\text{-}arc} \oplus I\right) U^{arc}\, H^{h\text{-}arc} \\
S^{rel} &= \left(H^{d\text{-}rel} \oplus I\right) U^{rel} \left(\left(H^{h\text{-}rel}\right)^{T} \oplus I\right)^{T}
\end{aligned}
$$
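The biaffine arc scorer reduces to plain matrix products. Below is a hedged NumPy sketch: the sizes and random matrices are illustrative assumptions, and appending a column of ones plays the role of the $\oplus I$ bias augmentation; the greedy argmax at the end is only an unconstrained stand-in for tree decoding:

```python
import numpy as np

rng = np.random.default_rng(1)
n, mlp_dim = 5, 10                        # 5 words, reduced MLP output dimension (toy sizes)

# Dimension-reduced representations from MLP^{d-arc} and MLP^{h-arc}
H_d = rng.normal(size=(n, mlp_dim))       # each word viewed as a dependent
H_h = rng.normal(size=(n, mlp_dim))       # each word viewed as a head

# Biaffine weight; the extra row pairs with the appended bias column
U_arc = rng.normal(size=(mlp_dim + 1, mlp_dim))

# S^{arc}[i, j]: score of word j being the head of word i
H_d_aug = np.concatenate([H_d, np.ones((n, 1))], axis=1)
S_arc = H_d_aug @ U_arc @ H_h.T           # shape (n, n)

# Greedy, unconstrained head choice per word (real decoding uses Eisner's algorithm)
pred_heads = S_arc.argmax(axis=1)
print(S_arc.shape, pred_heads.shape)
```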

For the decoder, the first-order Eisner algorithm is used to ensure that the output is a projective tree. From the dependency tree produced by the biaffine parser, a word sequence is obtained through an in-order traversal of the tree; the tree is projective only if this sequence matches the original word order.
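The in-order projectivity test can be implemented directly. A minimal sketch, assuming words are 1-indexed with `heads[i]` giving the head of word `i+1` and `0` denoting the artificial root:

```python
def is_projective(heads):
    """heads[i] is the head of word i+1 (words are 1-indexed; 0 is the root).

    A tree is projective iff an in-order traversal (left dependents, node,
    right dependents, each in surface order) reproduces the sequence 1..n.
    """
    n = len(heads)
    children = {i: [] for i in range(n + 1)}
    for dep, head in enumerate(heads, start=1):
        children[head].append(dep)        # dependents collected in surface order

    order = []

    def visit(node):
        for c in children[node]:
            if c < node:
                visit(c)                  # dependents to the left of the node
        if node != 0:
            order.append(node)
        for c in children[node]:
            if c > node:
                visit(c)                  # dependents to the right of the node

    visit(0)
    return order == list(range(1, n + 1))

print(is_projective([0, 1, 1]))  # True: word 1 heads words 2 and 3
print(is_projective([3, 0, 2]))  # False: arc 3 -> 1 crosses over word 2
```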

Papers Using This Method

A Practical Chinese Dependency Parser Based on A Large-scale Dataset (2020-09-02)