Estimating Trajectories of Statisticians from Co-citation Networks

Zheng Tracy Ke (Harvard University)



We are interested in characterizing how the research interests of individual authors evolve over time (i.e., their research trajectories). We approach this with a data set that took us more than two years to collect and clean. The data set consists of the citation and BibTeX (author, title, abstract, reference) information of over 83K papers published in 36 statistical journals from 1975 to 2015 (data link). Using the data set, we constructed 21 co-citation networks, one for each of 21 time windows between 1990 and 2015. We propose a dynamic Degree-Corrected Mixed-Membership (dynamic-DCMM) model, in which the research interests of an author are modeled by a low-dimensional weight vector (called the network memberships) that evolves slowly over time. We propose dynamic-SCORE, a new spectral approach to estimating the memberships.
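To give a feel for the spectral step, here is a minimal sketch of the entrywise eigenvector-ratio idea behind the static SCORE family of methods, applied to a single network. It is not the dynamic-SCORE procedure itself, whose details are not given in this abstract; the function name `score_embedding` and the toy network are assumptions made for illustration only.

```python
import numpy as np

def score_embedding(adjacency, k):
    """Entrywise eigenvector ratios for one symmetric network (illustrative).

    Returns an n x (k - 1) array whose column j holds the (j + 2)-th leading
    eigenvector divided entrywise by the first leading eigenvector.
    Assumes the first leading eigenvector has no zero entries.
    """
    a = np.asarray(adjacency, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(a)          # symmetric eigendecomposition
    order = np.argsort(-np.abs(eigvals))[:k]      # k largest eigenvalues in magnitude
    lead = eigvecs[:, order]
    first = lead[:, 0]
    # Dividing by the first eigenvector cancels node-specific degree effects.
    return lead[:, 1:] / first[:, None]

# Hypothetical noiseless example: two communities with unequal node degrees.
theta = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])   # degree parameters
labels = np.array([0, 0, 0, 1, 1, 1])              # community of each node
p = np.array([[1.0, 0.2], [0.2, 1.0]])             # community connectivity
omega = np.outer(theta, theta) * p[labels][:, labels]

ratios = score_embedding(omega, k=2)
print(ratios.ravel())
```

In this toy case, nodes in the same community share the same ratio even though their degrees differ, which is the reason for taking ratios rather than working with the raw eigenvectors.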

We discover a triangle in the spectral domain, which we call the Statistical Triangle, and use it to visualize the research trajectories of individual authors. We interpret the three vertices of the triangle as the three primary research areas in statistics: "Bayes", "Biostatistics" and "Nonparametrics". The Statistical Triangle further splits into 15 sub-regions, which we interpret as the 15 representative sub-areas in statistics. These results provide useful insights into the research trends and behavior of statisticians.
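One way to read a point inside such a triangle is through its barycentric coordinates, that is, the three non-negative weights (summing to one) that express the point as a weighted average of the vertices. The sketch below shows only this geometric step; the function name `barycentric_weights` and the vertex coordinates are hypothetical, and the abstract does not say whether the actual membership estimates apply a further normalization.

```python
import numpy as np

def barycentric_weights(point, vertices):
    """Weights w with sum(w) == 1 and w @ vertices == point, for a 2-D triangle."""
    v = np.asarray(vertices, dtype=float)      # shape (3, 2): one row per vertex
    p = np.asarray(point, dtype=float)         # shape (2,)
    # Two rows enforce the coordinate match, the third enforces sum(w) == 1.
    system = np.vstack([v.T, np.ones(3)])
    rhs = np.append(p, 1.0)
    return np.linalg.solve(system, rhs)

# Hypothetical vertex positions standing in for the three primary areas.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(barycentric_weights((0.25, 0.25), triangle))
```

A point near one vertex gets a weight close to one on that vertex, so a sequence of points across time windows traces how the weights shift between the three areas.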


