Mixed Membership Models of Scientific Publications
journal contribution
posted on 1987-01-01, 00:00 authored by Elena Erosheva, Stephen E. Fienberg, John D. Lafferty
The Proceedings of the National Academy of Sciences (PNAS) is one of the world’s most cited multidisciplinary
scientific journals. The PNAS official classification structure of subjects is reflected in topic labels submitted
by the authors of manuscripts, largely related to traditionally established disciplines. These include broad
field classifications into Physical Sciences, Biological Sciences, Social Sciences, and further subtopic classifications
within the fields. Focusing on Biological Sciences, we explore an internal soft classification structure
of articles based only on semantic decompositions of abstracts and bibliographies, and compare it with the
formal discipline classifications.
Our model assumes that there is a fixed number of internal categories, each characterized by multinomial
distributions over words (in abstracts) and references (in bibliographies). Soft classification for each
article is based on proportions of the article’s content coming from each category. We discuss the appropriateness
of the model for the PNAS database as well as other features of the data relevant to soft classification.
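
The generative idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two categories, the toy vocabularies, the probability tables, and the names `WORD_PROBS`, `REF_PROBS`, `mix`, and `generate_article` are all hypothetical. It assumes an article's membership proportions are already known and shows how they would blend the per-category multinomials over words and references. Drawing each item from the blended distribution is used here as a shortcut that is marginally equivalent to first drawing a category for each item and then drawing the item from that category.

```python
import random

# Hypothetical toy setup: 2 latent categories with tiny vocabularies.
WORDS = ["protein", "gene", "neuron", "cortex"]
REFS = ["ref_A", "ref_B", "ref_C"]

# One multinomial over words and one over references per category.
# Each row sums to 1. All values are made up for illustration.
WORD_PROBS = [
    [0.45, 0.45, 0.05, 0.05],  # category 0
    [0.05, 0.05, 0.45, 0.45],  # category 1
]
REF_PROBS = [
    [0.8, 0.1, 0.1],  # category 0
    [0.1, 0.1, 0.8],  # category 1
]


def mix(membership, category_probs):
    """Membership-weighted average of the category multinomials."""
    n_items = len(category_probs[0])
    return [
        sum(membership[k] * category_probs[k][j] for k in range(len(membership)))
        for j in range(n_items)
    ]


def generate_article(membership, n_words, n_refs, rng):
    """Draw a bag of words and a bag of references for one article."""
    words = rng.choices(WORDS, weights=mix(membership, WORD_PROBS), k=n_words)
    refs = rng.choices(REFS, weights=mix(membership, REF_PROBS), k=n_refs)
    return words, refs


# An article that is 70% category 0 and 30% category 1 (its soft classification).
membership = [0.7, 0.3]
word_dist = mix(membership, WORD_PROBS)
words, refs = generate_article(membership, n_words=8, n_refs=3, rng=random.Random(0))
```

In this sketch the vector `membership` plays the role of the article's soft classification. Fitting the model would mean estimating such vectors, together with the category multinomials, from observed abstracts and bibliographies, which the sketch does not attempt.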