Speaker: Jaime Arguello (Language Technologies Institute, School of Computer Science, Carnegie Mellon University)
Time/Date: Thursday, July 16th, 2009, 11:00 AM.
Place: Wean Hall 7220
Title: Sources of Evidence for Vertical Selection
Web search providers often include search services for domain-specific subcollections, called verticals, such as news, images, videos, job postings, company summaries, and artist profiles. We address the problem of vertical selection: predicting which verticals (if any) are relevant to a query issued to a search engine's main web search page. In contrast to prior collection selection tasks, vertical selection is associated with unique resources that can inform the classification decision. We focus on three sources of evidence: (1) the query string, from which features are derived independently of external resources, (2) logs of queries previously issued directly to the vertical by users, and (3) corpora representative of vertical content. These sources of evidence are integrated as features in a classification-based approach. We make use of, and compare against, prior work in federated search and retrieval effectiveness prediction. Our evaluation focuses on 18 different verticals, which differ in semantics, media type, size, and level of query traffic. An in-depth error analysis reveals challenges unique to different verticals and provides insight into vertical selection for future work.
Based on work conducted at Yahoo! Labs Montreal, to be presented at SIGIR 2009.