Posted on 2011-07-01, 00:00. Authored by Dani Yogatama, Michael Heilman, Brendan O'Connor, Chris Dyer, Bryan R Routledge, Noah A. Smith
We consider the problem of predicting measurable responses to scientific articles based primarily on their text content. Specifically, we consider papers in two fields (economics and computational linguistics) and make predictions about downloads and within-community citations. Our approach is based on generalized linear models, allowing interpretability; a novel extension that captures first-order temporal effects is also presented. We demonstrate that text features significantly improve accuracy of predictions over metadata features like authors, topical categories, and publication venues.