Segmentation in super-chunks with a finite-state approach

  • Since Harris’ parser in the late 1950s, multiword units have been progressively integrated into parsers. Nevertheless, for the most part, they are still restricted to compound words, which are more stable and less numerous. In fact, language is full of semi-fixed expressions that also form basic semantic units: semi-fixed adverbial expressions (e.g. of time) and collocations. As with compounds, identifying these structures limits the combinatorial complexity induced by lexical ambiguity. In this paper, we detail an experiment that integrates these notions extensively into a finite-state procedure for segmentation into super-chunks, as a preliminary step to parsing. We show that the chunker, developed for French, reaches 92.9% precision and 98.7% recall. Moreover, multiword units account for 36.6% of the attachments within nominal and prepositional phrases.

Metadata
Author details: Olivier Blanc, Matthieu Constant, Patrick Watrin
URN: urn:nbn:de:kobv:517-opus-27133
Publication type: Conference Proceeding
Language: English
Publication year: 2008
Publishing institution: Universität Potsdam
Release date: 2008/12/11
Organizational units: Extern / Extern
DDC classification: 4 Language / 40 Language / 400 Language
Collection(s): Universität Potsdam / Conference proceedings (non-serial) / Finite-state methods and natural language processing : 6th International Workshop, FSMNLP 2007 / II Regular Papers
License (German): No public license: protected by copyright
External remark:
The complete edition of the proceedings "Finite-state methods and natural language processing : 6th International Workshop, FSMNLP 2007 ; Revised Papers" is available:
URN urn:nbn:de:kobv:517-opus-23812