Genotype Imputation

Genotype Imputation: A Pop-Sci Dive 🧬

Dive into the captivating realm of genotype imputation! Your pop-sci journey starts here.

Introduction

Genotype imputation is a statistical technique for inferring genotypes that were not directly measured in an individual, using patterns of shared genetic variation learned from a reference panel of more densely genotyped or sequenced individuals. It is often used in genetic association studies to fine-map association signals, to increase the number of genetic variants that can be tested, and to enable meta-analyses that combine datasets genotyped on different panels.
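To make the core idea concrete, here is a deliberately simplified sketch: a target individual typed at only a few variants has its missing alleles copied from whichever reference haplotype agrees with it best at the typed sites. The 0/1 allele coding, the tiny panel, and the function names `mismatches` and `impute` are all hypothetical. Real imputation tools use probabilistic haplotype models (such as hidden Markov models) rather than a single nearest-neighbour lookup.

```python
# Toy imputation: fill untyped sites from the closest reference haplotype.
# Simplification (assumption): real imputation tools use probabilistic
# haplotype models such as hidden Markov models, not one nearest neighbour.
from typing import List, Optional

Haplotype = List[int]         # alleles coded 0 (reference) or 1 (alternate)
Target = List[Optional[int]]  # None marks a variant that was not typed


def mismatches(ref: Haplotype, target: Target) -> int:
    """Count disagreements at the sites the target actually observed."""
    return sum(1 for r, t in zip(ref, target) if t is not None and r != t)


def impute(target: Target, panel: List[Haplotype]) -> Haplotype:
    """Copy missing alleles from the reference haplotype with fewest mismatches."""
    best = min(panel, key=lambda ref: mismatches(ref, target))
    return [b if t is None else t for b, t in zip(best, target)]


if __name__ == "__main__":
    # Hypothetical reference panel: four haplotypes over six variants.
    panel = [
        [0, 0, 1, 1, 0, 1],
        [1, 1, 0, 0, 1, 0],
        [0, 1, 1, 0, 0, 1],
        [1, 0, 1, 1, 1, 0],
    ]
    # Hypothetical target typed only at variants 0, 2 and 4.
    target = [1, None, 0, None, 1, None]
    print(impute(target, panel))  # prints [1, 1, 0, 0, 1, 0]
```

Here the second reference haplotype matches the target at all three typed sites, so its alleles fill the three gaps.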

Machine learning, and deep learning in particular, has recently been explored as a way to improve imputation accuracy, which in turn can sharpen the detection of genetic associations with traits and diseases.


An illustration of genotype imputation

Reference

Das S, Abecasis GR, Browning BL (2018) Genotype Imputation from Large Reference Panels. Annu. Rev. Genomics Hum. Genet. 19:73–96.

Chip Imputation

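Chip imputation starts from genotypes measured with genotyping arrays, or "chips", which assay a fixed set of genetic variants across the genome. Variants that are not on the chip are then imputed with the help of a reference panel. Chips differ in density, capturing anywhere from thousands to millions of variants.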

The choice of chip can strongly affect imputation quality. High-density chips type more variants directly, leaving less to infer, but cost more. Low-density chips are cheaper but leave more of the genome to be imputed, which can reduce accuracy.
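A common way to measure how much chip density matters is to hide (mask) genotypes that are actually known, impute them, and compare. The sketch below computes two metrics often reported for this: the concordance of best-guess genotypes with the truth, and the squared correlation (r²) between true genotypes and imputed dosages. The 0/1/2 genotype coding, the example values, and the use of a rounded dosage as the best-guess call are assumptions made for illustration.

```python
# Scoring imputation quality at one masked variant.
# Assumptions: the true genotypes are known (for example from a denser chip)
# and were hidden before imputation; genotypes are alternate-allele counts 0/1/2.
from typing import Sequence


def concordance(true_gt: Sequence[int], best_guess: Sequence[int]) -> float:
    """Fraction of individuals whose best-guess genotype equals the truth."""
    hits = sum(1 for t, g in zip(true_gt, best_guess) if t == g)
    return hits / len(true_gt)


def dosage_r2(true_gt: Sequence[int], dosage: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    n = len(true_gt)
    mean_t = sum(true_gt) / n
    mean_d = sum(dosage) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(true_gt, dosage))
    var_t = sum((t - mean_t) ** 2 for t in true_gt)
    var_d = sum((d - mean_d) ** 2 for d in dosage)
    if var_t == 0 or var_d == 0:
        return float("nan")  # r² is undefined when either side is constant
    return cov * cov / (var_t * var_d)


if __name__ == "__main__":
    # Hypothetical masked variant in five animals.
    true_gt = [0, 1, 2, 1, 0]
    dosage = [0.1, 0.9, 1.8, 0.4, 0.0]        # imputed expected allele counts
    best_guess = [round(d) for d in dosage]    # simplification: rounded dosage
    print(concordance(true_gt, best_guess))    # prints 0.8
    print(round(dosage_r2(true_gt, dosage), 3))
```

Concordance rewards getting the single most likely genotype right, while r² also credits imputed dosages that track the truth without being exact, which is why the two are often reported together.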

Allele Frequency Density Graphs by Breed and Chip Density

The graphs below show the distribution of allele frequencies by breed and chip density, based on data from 103,988 animals across nine cattle breeds. Each row represents a breed, and each column corresponds to a chip density (1K, 5K, 10K, 14K). Together they show how variable or consistent the genetic information captured by each chip is across breeds.

Comparing allele frequency distributions across chip densities shows how the variants captured by each array are distributed, which helps when selecting a genotyping array for a particular study.

Lower-density chips are cheaper and may suffice for large studies on a limited budget, although they capture fewer variants. Higher-density chips cost more but give a more complete view of genetic variation, which some studies require. Comparing allele frequency distributions between chip densities helps weigh the budget against the level of genetic detail a study needs.
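Plots like these start from a per-variant allele frequency, computed from each animal's genotypes. The sketch below computes alternate-allele frequencies and a simple normalised histogram, a rough stand-in for a smoothed density curve. The data layout, the treatment of a lower-density chip as a subset of variants, and the function names are hypothetical, and the original plots may have used minor-allele frequencies or a different smoothing method.

```python
# Per-variant allele frequencies and a binned density.
# Assumptions: genotypes are alternate-allele counts 0/1/2 with None for a
# missing call, and a lower-density chip is modelled as a subset of variants.
from typing import List, Optional

Genotypes = List[List[Optional[int]]]  # rows = animals, columns = variants


def allele_frequencies(geno: Genotypes) -> List[float]:
    """Alternate-allele frequency per variant; variants with no calls are skipped."""
    freqs = []
    for j in range(len(geno[0])):
        calls = [row[j] for row in geno if row[j] is not None]
        if calls:
            freqs.append(sum(calls) / (2 * len(calls)))
    return freqs


def density(freqs: List[float], n_bins: int = 10) -> List[float]:
    """Normalised histogram on [0, 1]; bar heights times bin width sum to 1."""
    counts = [0] * n_bins
    for f in freqs:
        counts[min(int(f * n_bins), n_bins - 1)] += 1  # f == 1.0 goes in the last bin
    width = 1.0 / n_bins
    return [c / (len(freqs) * width) for c in counts]


if __name__ == "__main__":
    # Hypothetical genotypes: four animals, six variants.
    geno = [
        [0, 1, 2, 0, 1, None],
        [1, 1, 2, 0, 0, 1],
        [0, 2, 1, 0, 1, 1],
        [0, 1, 2, 1, None, 2],
    ]
    # Hypothetical lower-density chip that types every other variant.
    low_density = [row[::2] for row in geno]
    print(density(allele_frequencies(geno), n_bins=5))
    print(density(allele_frequencies(low_density), n_bins=5))
```

With real data, running the same two functions on each breed and each chip's variant list would give one curve per panel of the grid.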

[Figure: 36 density plots titled "Density Plot for <breed> in <chip density>", one row per breed (nine breeds) and one column per chip density (1K, 5K, 10K, 14K)]

Data source

Wang Y, Wu X-L, Li Z, Bao Z, Tait RG Jr, Bauck S and Rosa GJM (2020) Estimation of Genomic Breed Composition for Purebred and Crossbred Animals Using Sparsely Regularized Admixture Models. Front. Genet. 11:576. doi: 10.3389/fgene.2020.00576

Popular Chip Imputation Software

Low-Coverage Imputation

Low-coverage sequencing replaces the genotyping chip with shallow sequencing: reads are spread across the genome, but each site is covered by only a few reads, or none at all, so the genotype at any one site is uncertain. Imputation in this context combines these sparse, noisy data with a reference panel to infer the missing genetic information.

Low-coverage sequencing is cost-effective but does not give the same per-site certainty as high-coverage sequencing. However, imputation against a reference panel can recover much of the missing information, so useful insights can still be gained from low-coverage data.
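The starting point for low-coverage imputation is usually a set of genotype likelihoods: how probable the observed reads are under each possible genotype. The sketch below computes them at a single site and combines them with a prior built from a reference panel's allele frequency. The fixed per-read error rate, the Hardy-Weinberg prior, the parameter names `n_ref`, `n_alt`, `alt_freq` and `error`, and the example numbers are assumptions for illustration; real low-coverage imputation methods go further and pool evidence across neighbouring sites using reference haplotypes.

```python
# Genotype posteriors at one site from a few low-coverage reads.
# Assumptions: reads are independent, each base call is wrong with a fixed
# probability `error` (0 < error < 0.5), and the prior comes from a reference
# panel's alternate-allele frequency `alt_freq` under Hardy-Weinberg
# equilibrium. Real methods also use neighbouring sites and haplotypes.
from typing import List


def genotype_likelihoods(n_ref: int, n_alt: int, error: float = 0.01) -> List[float]:
    """P(reads | G) for G = 0, 1, 2 copies of the alternate allele.

    The binomial coefficient is omitted because it is the same for every G
    and cancels when the posteriors are normalised.
    """
    likelihoods = []
    for g in (0, 1, 2):
        # Chance that one read shows the alternate allele given genotype g.
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        likelihoods.append(p_alt ** n_alt * (1 - p_alt) ** n_ref)
    return likelihoods


def genotype_posteriors(n_ref: int, n_alt: int, alt_freq: float,
                        error: float = 0.01) -> List[float]:
    """Combine read likelihoods with a Hardy-Weinberg prior and normalise."""
    q = alt_freq
    prior = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]
    lik = genotype_likelihoods(n_ref, n_alt, error)
    joint = [lk * p for lk, p in zip(lik, prior)]
    total = sum(joint)
    return [j / total for j in joint]


def expected_dosage(post: List[float]) -> float:
    """Expected number of alternate alleles (the imputed 'dosage')."""
    return post[1] + 2 * post[2]


if __name__ == "__main__":
    # Hypothetical site: one read carrying the alternate allele, with a
    # reference-panel alternate-allele frequency of 0.1.
    post = genotype_posteriors(n_ref=0, n_alt=1, alt_freq=0.1)
    print([round(p, 3) for p in post], round(expected_dosage(post), 3))
    # With no reads at all, the posterior simply equals the prior.
    print(genotype_posteriors(n_ref=0, n_alt=0, alt_freq=0.1))
```

A single read cannot distinguish a heterozygote from a homozygote, so the reference-panel prior does much of the work here; this is why the quality of the reference panel matters so much for low-coverage data.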

Popular Low-Coverage Imputation Software

Future Directions and Prospects

Genotype imputation is poised for further advances as machine learning algorithms, especially deep learning, continue to evolve. As sequencing technologies improve and become more affordable, higher-resolution imputation at lower computational cost is expected. Integrating functional genomic data into the imputation process is also anticipated to improve accuracy further. The ultimate aim remains a better understanding of genetic architecture, supporting new discoveries about genetic associations and disease.
