Jason Chuang, Christopher D. Manning, Jeffrey Heer
Keyphrases aid the exploration of text collections by communicating salient aspects of documents and are often used to create effective visualizations of text. While prior work in HCI and visualization has proposed a variety of ways of presenting keyphrases, less attention has been paid to selecting the best descriptive terms. In this article, we investigate the statistical and linguistic properties of keyphrases chosen by human judges and determine which features are most predictive of high-quality descriptive phrases. Based on 5,611 responses from 69 graduate students describing a corpus of dissertation abstracts, we analyze characteristics of human-generated keyphrases, including phrase length, commonness, position, and part of speech.