Novel Methods for Multi-ancestry Polygenic Prediction and their Evaluations in 3.7 Million Individuals of Diverse Ancestry
Polygenic risk scores (PRS) are becoming increasingly predictive of complex traits, but their subpar performance in non-European populations raises concerns about their potential clinical applications. We develop a powerful and scalable method to calculate PRS using GWAS summary statistics from multi-ancestry training samples by integrating multiple techniques, including clumping and thresholding, empirical Bayes, and super learning. We evaluate the proposed method and a variety of alternatives using large-scale simulated GWAS on ~19 million common variants and large 23andMe Inc. datasets, including up to 800K individuals from four non-European populations, across seven complex traits. Results show that the proposed method can substantially improve the performance of PRS in non-European populations relative to simple alternatives, and that it performs comparably to or better than a recent method that requires substantially more computation time. Further, our simulation studies provide novel insights into sample size requirements and the effect of SNP density on multi-ancestry risk prediction.
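For readers unfamiliar with clumping and thresholding, the simplest of the techniques named above, the following is a minimal single-ancestry sketch in Python. It is not the authors' method: the empirical Bayes and super learning steps are omitted, the function names `clump_and_threshold` and `polygenic_score` are hypothetical, the toy numbers are invented for illustration, and the default cutoffs are arbitrary. It also assumes a precomputed pairwise LD matrix, whereas practical clumping is usually restricted to a genomic window around each index SNP.

```python
import numpy as np

def clump_and_threshold(pvals, ld_r2, p_threshold=0.05, r2_cutoff=0.1):
    """Greedy clumping and thresholding (illustrative sketch).

    Visits SNPs from most to least significant, stops at the p-value
    threshold, and keeps a SNP only if its r^2 with every SNP already
    kept is below r2_cutoff.
    """
    kept = []
    for j in np.argsort(pvals):
        if pvals[j] > p_threshold:
            break  # all remaining SNPs are less significant
        if all(ld_r2[j, k] < r2_cutoff for k in kept):
            kept.append(int(j))
    return kept

def polygenic_score(genotypes, betas, kept):
    """PRS as the weighted sum of allele counts over the retained SNPs."""
    return genotypes[:, kept] @ betas[kept]

# Toy data (invented): 4 SNPs, 2 individuals.
pvals = np.array([1e-8, 1e-6, 0.2, 1e-3])
betas = np.array([0.5, 0.4, 0.1, -0.2])   # GWAS effect-size estimates
ld_r2 = np.eye(4)
ld_r2[0, 1] = ld_r2[1, 0] = 0.8           # SNP 0 and SNP 1 are in strong LD
genotypes = np.array([[2, 1, 0, 1],
                      [0, 0, 2, 2]])       # allele counts per individual

kept = clump_and_threshold(pvals, ld_r2)
scores = polygenic_score(genotypes, betas, kept)
```

On this toy input, SNP 1 is dropped because it is in LD with the more significant SNP 0, and SNP 2 is dropped because it fails the p-value threshold, so the score uses SNPs 0 and 3 only.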