Multi-attribute dataset on statisticians
  • Contains the BibTeX records and citation information for over 83K papers in statistics-related journals.
  • Visit the MADStat project website
  • Citation: Ji, Jin, Ke and Li (2022) Co-citation and co-authorship networks of statisticians (with discussions). Journal of Business & Economic Statistics, 40(2), 469-485.
Network community detection
  • SCORE and SCORE+ (a refinement of SCORE) are spectral algorithms for network community detection.
  • Download code in Matlab and R.
  • An R package ScorePlus is available.
  • Citation: Jin, Ke and Luo (2022) Improvements on SCORE, especially for weak signals. Sankhya A, 84(1), 127-162.
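To give a feel for what a SCORE-type spectral method does, here is a minimal Python sketch of the basic SCORE idea: take the leading K eigenvectors of the adjacency matrix, divide the 2nd through K-th eigenvectors entry-wise by the first to remove degree heterogeneity, and cluster the resulting ratios with k-means. This is an illustration only, not the released Matlab/R code and not the ScorePlus package API. The function names (score_communities, _kmeans), the log(n) truncation threshold, and the farthest-point k-means initialization are assumptions made for this sketch, and the SCORE+ refinements are not included.

```python
import numpy as np


def _kmeans(X, K, n_iter=100):
    """Plain Lloyd's k-means with farthest-point initialization (sketch helper)."""
    centers = [X[0]]
    for _ in range(1, K):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)

    labels = None
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for k in range(K):
            if np.any(labels == k):  # leave the center of an empty cluster unchanged
                centers[k] = X[labels == k].mean(axis=0)
    return labels


def score_communities(A, K):
    """Basic SCORE sketch: eigenvector ratios followed by k-means.

    A is a symmetric adjacency matrix of a connected network and K is the
    assumed number of communities (K >= 2).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]

    # Leading K eigenvectors, ranked by absolute eigenvalue.
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(-np.abs(vals))[:K]
    xi = vecs[:, order]

    # Entry-wise ratios against the first eigenvector (guard against zeros).
    lead = xi[:, 0]
    lead = np.where(np.abs(lead) < 1e-12, 1e-12, lead)
    R = xi[:, 1:] / lead[:, None]

    # Truncate extreme ratios; the log(n) threshold is an assumption of this sketch.
    T = np.log(n)
    R = np.clip(R, -T, T)

    return _kmeans(R, K)


if __name__ == "__main__":
    # Toy network: two 5-node cliques joined by a single edge (nodes 4 and 5).
    A = np.zeros((10, 10))
    A[:5, :5] = 1.0
    A[5:, 5:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[4, 5] = A[5, 4] = 1.0
    print(score_communities(A, K=2))
```

On the toy network the two cliques receive different labels; which clique gets label 0 is arbitrary, since community labels are only identified up to permutation. For real data, the released code or the ScorePlus package is the appropriate tool.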
Network mixed membership estimation
  • Mixed-SCORE is a spectral algorithm for estimating mixed memberships in a network.
  • Download code and data sets (Citee and Trade networks) at GitHub.
  • The R package ScorePlus also contains a function to implement Mixed-SCORE.
  • Citation: Jin, Ke and Luo (2017) Estimating network memberships by simplex vertex hunting. arXiv:1708.07852.
Topic modeling
  • Topic-SCORE is a spectral algorithm for estimating the topic vectors in a topic model.
  • Download code and data sets (Associated Press and Statistical Abstracts corpora) at GitHub.
  • An R package TopicScore is available.
  • Citation: Ke and Wang (2022) Using SVD for topic modeling. Journal of the American Statistical Association.
Estimating the number of spikes in a covariance model
  • BEMA is a method for estimating the number of spikes, K, in a spiked covariance model. It fits a "null scree plot" and compares it with the actual scree plot to determine K.
  • Download code at GitHub.
  • Citation: Ke, Ma and Lin (2021) Estimation of the number of spiked eigenvalues in a covariance matrix by bulk eigenvalue matching analysis. Journal of the American Statistical Association.
Fitting a measurement error model
  • NNME is a neural network approach to fitting a measurement error model.
  • Download code at GitHub.
  • Citation: Hu, Ke and Liu (2022) Measurement error models: From nonparametric methods to deep neural networks. Statistical Science, 37(4), 473-493.
Data harmonization for GWAS
  • The code implements a data harmonization and quality control pipeline for combining multiple public controls in GWAS.
  • Download code at GitHub.
  • Citation: Chen et al. (2022) A data harmonization pipeline to leverage external controls and boost power in GWAS. Human Molecular Genetics, 31(3), 481-489.
Allocation of COVID testing budget
  • The algorithm dynamically allocates COVID testing budget, assuming a network SIR model of disease transmission.
  • Download code at GitHub.
  • Citation: Huang, Ke and Jin (2021) Allocation of COVID testing budget on a commute network of counties. Stat, 11(1), e441.