
A Bayesian LDA-based model for semi-supervised part-of-speech tagging
Kristina N Toutanova · Mark Johnson

Mon Dec 03 08:10 PM -- 08:25 PM (PST)

We present a novel Bayesian statistical model for semi-supervised part-of-speech tagging. Our model extends the Latent Dirichlet Allocation (LDA) model and incorporates the intuition that words' distributions over tags, p(t|w), are sparse. In addition, we introduce a model for determining the set of possible tags of a word, which captures important dependencies in the ambiguity classes of words. Our model outperforms the best previously proposed model for this task on a standard dataset.
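The abstract does not say exactly how the sparsity of p(t|w) is encoded. In LDA-style models, one common mechanism is a symmetric Dirichlet prior whose concentration parameter is below 1. The sketch below assumes that mechanism and a hypothetical 45-tag set, and shows why small concentration values favour distributions that put most of their mass on only a few tags. It illustrates the general property of the Dirichlet prior, not the authors' model.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha, ..., alpha) over k outcomes.

    Uses the standard construction: independent Gamma(alpha, 1) draws,
    normalized so that they sum to 1.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def mean_top_mass(alpha, k, n_samples, seed=0):
    """Average probability that a sampled p(t|w) assigns to its single most likely tag."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += max(sample_dirichlet(alpha, k, rng))
    return total / n_samples

if __name__ == "__main__":
    NUM_TAGS = 45  # hypothetical tag-set size, chosen only for illustration
    for alpha in (0.1, 1.0, 10.0):
        print(f"alpha={alpha:>4}: mean mass on top tag = "
              f"{mean_top_mass(alpha, NUM_TAGS, 2000):.3f}")
```

With alpha well below 1, a typical sampled distribution concentrates on a handful of tags, which matches the intuition that most words take few tags. With alpha well above 1, the mass spreads almost uniformly over all 45 tags.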

Author Information

Kristina N Toutanova (Microsoft Research)
Mark Johnson (Macquarie University)
