Thursday, October 7, 2010

Le Zhao and Jonathan Elsas -- Wednesday October 13th

Please join us for two upcoming IR series talks on Wednesday, Oct 13, 2010.

Lunch will be provided by Yahoo!.

Date/Time: Wednesday, Oct 13, 2010, noon
Place: GHC 4405

First Speaker: Le Zhao
Title: Term Necessity Prediction

The probability that a term appears in relevant documents (P(t|R)) is a fundamental quantity in several probabilistic retrieval models, however it is difficult to estimate without relevance judgments or a relevance model. We call this value term necessity because it measures the percentage of relevant documents retrieved by the term – how necessary a term’s occurrence is to document relevance. Prior research typically either set this probability to a constant, or estimated it based on the term's inverse document frequency, neither of which was very effective.

This paper identifies several factors that affect term necessity, for example, a term’s topic centrality, synonymy and abstractness. It develops term- and query-dependent features for each factor that enable supervised learning of a predictive model of term necessity from training data. Experiments with two popular retrieval models and 6 standard datasets demonstrate that using predicted term necessity estimates as user term weights for the original query terms leads to
significant improvements in retrieval accuracy.

The paper will appear in CIKM 2010.


Second Speaker: Jon Elsas
Title: Rank Learning for Factoid Question Answering with Linguistic and Semantic Constraints

This work presents a general rank-learning framework for leveraging deep linguistic and semantic features for passage ranking within Question Answering (QA) systems. The passage ranking framework enables query-time checking of these complex and long-distance constraints among question features such as keywords and named entities. These constraints can include keyword ordering, annotation type-checking, verb-argument attachment and arbitrary long-distance paths through an annotation graph. We show that a trained ranking model using this rich feature set achieves greater than a 20% improvement in Mean Average Precision over baseline keyword retrieval models. We also show that for questions expressing the most complex linguistic semantic constraints, further gains in MAP are realized, yielding a 40% improvement over the baseline.

The paper will appear in CIKM 2010.