Igor Rozhkov, Natalia Loukachevitch
In this paper, we describe our participation in the RuTermEval competition on nested term extraction. We apply the Binder model, previously applied successfully to nested named entity recognition, to the extraction of nested terms, and obtain the best term recognition results in all three tracks of the RuTermEval competition. In addition, we study a new task: recognizing nested terms from flat training data, i.e. data annotated with terms without nestedness. We conclude that several of the approaches proposed in this work can effectively retrieve nested terms without nested annotation.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Term Extraction | RuTermEval (Track 1) | Scoreboard F1 | 0.794 | full nested |
| Term Extraction | RuTermEval (Track 2) | Scoreboard Class-agnostic F1 | 0.78 | full nested |
| Term Extraction | RuTermEval (Track 2) | Scoreboard Weighted F1 | 0.6997 | full nested |
| Term Extraction | RuTermEval (Track 3) | Scoreboard Class-agnostic F1 | 0.6 | full nested |
| Term Extraction | RuTermEval (Track 3) | Scoreboard Weighted F1 | 0.4823 | full nested |
| Term Extraction | RuTermEval (Track 1) | Scoreboard F1 | 0.7281 | lemm. inc. + early dmg |
| Term Extraction | RuTermEval (Track 2) | Scoreboard Class-agnostic F1 | 0.7337 | lemm. inc. + early dmg |
| Term Extraction | RuTermEval (Track 2) | Scoreboard Weighted F1 | 0.631 | lemm. inc. + early dmg |
| Term Extraction | RuTermEval (Track 3) | Scoreboard Class-agnostic F1 | 0.5875 | lemm. inc. + early dmg |
| Term Extraction | RuTermEval (Track 3) | Scoreboard Weighted F1 | 0.4547 | lemm. inc. + early dmg |