A Dependency Parser for Tweets

Kong, Lingpeng; Schneider, Nathan; Swayamdipta, Swabha; Bhatia, Archna; Dyer, Chris; Smith, Noah A.

doi:10.1184/R1/6472958.v1

journal contribution

posted on 2014-10-01, 00:00 authored by Lingpeng Kong, Nathan Schneider, Swabha Swayamdipta, Archna Bhatia, Chris Dyer, Noah A. Smith

We describe a new dependency parser for English tweets, TWEEBOPARSER. The parser builds on several contributions: new syntactic annotations for a corpus of tweets (TWEEBANK), with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data. Our experiments show that the parser achieves over 80% unlabeled attachment accuracy on our new, high-quality test set, and they measure the benefit of our contributions.

Our dataset and parser can be found at http://www.ark.cs.cmu.edu/TweetNLP.
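For readers unfamiliar with the metric named in the abstract, unlabeled attachment accuracy is commonly computed as the fraction of tokens whose predicted head matches the gold head, ignoring dependency labels. The following is a minimal sketch under that common definition; it is not the authors' evaluation script, and the function name, the head encoding (0 for the root), and the example values are hypothetical. The paper's own protocol may treat some tokens differently (for example, by excluding them from scoring), which this sketch does not attempt to model.

```python
from typing import Sequence


def unlabeled_attachment_score(gold_heads: Sequence[int],
                               pred_heads: Sequence[int]) -> float:
    """Fraction of tokens whose predicted head index equals the gold head.

    Assumption: both sequences give one head index per token, in token
    order, with 0 standing for the artificial root.
    """
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted head lists differ in length")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)


# Hypothetical 5-token sentence: the parser gets 4 of 5 heads right.
gold = [2, 0, 2, 5, 3]
pred = [2, 0, 2, 3, 3]
score = unlabeled_attachment_score(gold, pred)
print(f"UAS = {score:.2f}")
```

In practice the score is usually aggregated over all tokens in a test set rather than averaged per sentence; the sketch scores a single sentence only to keep the arithmetic easy to follow.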

Date

2014-10-01

Keywords

Language Technologies; Information and Computing Sciences not elsewhere classified

Licence

In Copyright
