Posted on 2002-10-01, 00:00. Authored by Shimon Kogan, Dimitry Levin, Bryan R Routledge, Jacob S. Sagi, Noah A. Smith
We address a text regression problem: given a
piece of text, predict a real-world continuous
quantity associated with the text’s meaning. In
this work, the text is an SEC-mandated financial
report published annually by a publicly traded
company, and the quantity to be predicted
is volatility of stock returns, an empirical
measure of financial risk. We apply well-known
regression techniques to a large corpus
of freely available financial reports, constructing
regression models of volatility for
the period following a report. Our models rival
past volatility (a strong baseline) in predicting
the target variable, and a single model
that uses both can significantly outperform
past volatility. Interestingly, our approach is
more accurate for reports after the passage of
the Sarbanes-Oxley Act of 2002, giving some
evidence for the success of that legislation in
making financial reports more informative.
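
The setup described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes volatility is measured as the sample standard deviation of periodic returns, that the regression target is log volatility, that each report is represented by log(1 + count) features over a small fixed vocabulary, and that the regressor is ridge regression fitted by gradient descent. The function names (`volatility`, `term_frequencies`, `fit_ridge`, `predict`) and all toy values are hypothetical and chosen only for illustration.

```python
import math
import statistics
from collections import Counter


def volatility(returns):
    """Sample standard deviation of periodic returns (assumed definition)."""
    return statistics.stdev(returns)


def term_frequencies(text, vocabulary):
    """Represent a document as log(1 + count) over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [math.log1p(counts[word]) for word in vocabulary]


def fit_ridge(X, y, l2=0.01, lr=0.05, epochs=2000):
    """Fit linear weights and a bias by gradient descent on a ridge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        # The L2 penalty applies to the weights only, not to the bias.
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Linear prediction for one feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Toy data, invented for illustration only: (report text, past log
# volatility, log volatility in the period following the report).
vocab = ["risk", "loss", "growth"]
toy = [
    ("risk risk loss", -3.6, -3.5),
    ("growth growth", -4.4, -4.5),
    ("loss risk growth", -4.0, -3.9),
    ("growth", -4.3, -4.4),
]
# The combined model uses both text features and past volatility.
X = [term_frequencies(text, vocab) + [past] for text, past, _ in toy]
y = [target for _, _, target in toy]
w, b = fit_ridge(X, y, lr=0.02)
print([round(predict(w, b, x), 3) for x in X])
```

Appending past log volatility to the text features mirrors the single model that uses both sources; dropping that last column gives a text-only model, and predicting the past value unchanged gives the baseline.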