Population Genomic Analysis 2023 (PGBI11126)

GitHub repository of the Population Genomic Analysis 2023 (PGBI11126) postgraduate course at the University of Edinburgh

Table of contents

Contacts

Course description

The course covers the core concepts of modern population genomic analysis, which focuses on analysing the sequence variation contained in samples of genomes. The aim is both to introduce students to the mathematical models and computational algorithms that describe the ancestry of genomes in evolving populations and to show how these are applied in practice to make inferences about the interplay of evolutionary forces (genetic drift, recombination, selection and demographic history) using hands-on data examples.

The course includes a detailed exposition of the coalescent, the canonical model of sample ancestry, and of the relevant data structures (genealogies, tree sequences and graphs) for describing genetic ancestry. A major focus of the course is to understand how this basic stochastic model:

  • extends to include all fundamental evolutionary forces (recombination, population structure, admixture and natural selection) and
  • is used to make inferences from both modern and ancient samples using mathematical analysis and simulation.
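As a rough illustration of the basic model described above, the sketch below simulates the time to the most recent common ancestor (T_MRCA) of a sample under the standard coalescent, using only the Python standard library. It is not taken from the course notebooks, which use msprime and tskit. The function names (`simulate_tmrca`, `mean_tmrca`, `expected_tmrca`) are hypothetical, and time is assumed to be measured in units of 2N generations, so that each pair of lineages coalesces at rate 1.

```python
import random


def simulate_tmrca(n, rng):
    """Draw one T_MRCA for a sample of n lineages (time in units of 2N generations).

    Assumption: standard neutral coalescent, no recombination, constant
    population size. While k lineages remain, the waiting time to the next
    coalescence is exponential with rate k * (k - 1) / 2 (the number of pairs).
    """
    t = 0.0
    k = n
    while k > 1:
        rate = k * (k - 1) / 2
        t += rng.expovariate(rate)
        k -= 1
    return t


def mean_tmrca(n, replicates=20000, seed=1):
    """Average T_MRCA over independent replicates (think: independent loci)."""
    rng = random.Random(seed)
    return sum(simulate_tmrca(n, rng) for _ in range(replicates)) / replicates


def expected_tmrca(n):
    """Analytic expectation: sum over k of 2 / (k * (k - 1)) = 2 * (1 - 1 / n)."""
    return 2 * (1 - 1 / n)


if __name__ == "__main__":
    for n in (2, 10, 50):
        print(n, round(mean_tmrca(n), 3), round(expected_tmrca(n), 3))
```

The expected T_MRCA approaches 2 as the sample grows, so adding more individuals changes it only slightly, whereas averaging over more replicates (loci) reduces the sampling noise.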

The course is run as a set of computer practicals that analyse genomic data (both real and simulated) through interactive JupyterLab notebooks. Each practical is paired with a short pre-recorded mini lecture covering the theoretical and conceptual background, which should be watched ahead of the corresponding practical session.

Course admin

  • PGA consists of 10 computer practical sessions that analyse population genomic data. The last session is a class exercise that accounts for 75% of the mark.
  • The first session will be on Tuesday 17/01/2023 @ 1400-1700 hrs in the JCMB computer suite, Room 1208 (follow the signs to Room 1206C, which is opposite 1208).
  • While PGA focuses on analysing population genomic data, this will also involve using the Python programming language, which will be introduced gradually throughout the course.

Using this repository

  1. Log into LEARN.
  2. Go to the course.
  3. Click on "Course materials", then on the "Notable" icon.
  4. Select the "Chemistry Notebook" from the dropdown menu and click "Start".
  5. Click on +GitRepo to bring up the menu to clone this repository.
  6. For this you must enter the following information:
     • Git Repository URL: https://github.com/LohseLab/PGA_course_2022
     • Branch: main
  7. You can now use the Jupyter file browser to navigate to the notebooks you want to execute.

Syllabus

  • Practical_1
    • coalescent simulation and relevant data structures.
    • run and analyse coalescent simulations with msprime and tskit.
    • understand how the variance of the coalescent depends on the two major axes of sampling: number of loci and number of individuals (Felsenstein 2004).
    • understand why it is natural (and helpful) to treat mutations separately from ancestry.
  • Practical_2
    • understand why coalescent simulations are useful to gain intuition about population level processes.
    • appreciate that the site frequency spectrum (SFS) is a fundamental summary of sequence variation and understand how it relates to genealogical branch lengths.
    • understand that summary statistics are the currency for comparing real data to idealized models of population processes/history and that such comparisons can be done either via analytic results or simulations.
    • know how coalescent simulations are used in approximate likelihood inference.
  • Practical_3
    • ARGs and tree sequences: how are they constructed and how do they differ?
    • appreciate that not all recombination events are detectable
    • understand the difference between map and physical length of a sequence
    • know that the span of trees along the genome is a random variable and that nodes are shared between many trees.
    • understand that the duality between branch lengths and popgen measures extends to correlated trees.
  • Practical_4
    • gain familiarity with common bioinformatic file formats (FASTA, BED, VCF)
    • understand how (population) genomic data can be represented through these file formats.
    • know that the analysis of variation data often requires additional simplifications and/or re-classification of the data
    • use common Python libraries to parse, intersect, interrogate, and visualize population genomic data
    • understand that due to background selection, genetic diversity in the genome is strongly correlated with functional constraint
  • Practical_5 (Dr Derek Setter)
    • how does positive selection act to favour a beneficial mutation?
    • understand the role of drift/randomness on allele frequency trajectories and fixation probability
    • understand the effect of positive selection on linked neutral variation
    • understand how SweepFinder works using simulated data
    • be able to perform a selective sweep scan on real data
  • Practical_6 (Dr Simon Martin)
    • understand genealogical discordance and how it depends on incomplete lineage sorting and gene flow
    • understand how the divergence history of populations affects the level of incomplete lineage sorting
    • be able to run multi-population coalescent simulations and extract genealogical information
    • learn how to detect introgression from archaic hominins into modern humans using the D statistic (aka the ABBA/BABA test)
  • Practical_7
    • how to estimate differentiation between populations/species using $d_{xy}$, $d_{net}$ and $F_{st}$ and understand how these summary statistics are defined and related to each other.
    • be able to use coalescent theory to relate estimates of divergence and differentiation obtained from whole genome data to models of equilibrium population structure and non-equilibrium population history.
    • be able to define outliers of differentiation in a genome scan.
    • be able to simulate sequence data under models of population structure and compare these to real data.
  • Practical_8 and 9
    • Applying the knowledge you gained from this course to novel, real-world datasets.
    • TBA
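The differentiation measures named under Practical_7 can be sketched on a toy alignment as follows. This is a minimal illustration and not the course's implementation: the haplotypes are invented, the helper names are hypothetical, and `F_st` is assumed here to be defined as (π_T − π_W) / π_T with π_T = (d_xy + π_W) / 2, which is only one of several definitions in use. Under that assumption, F_st = d_net / (d_xy + π_W), which makes the relationship between the three statistics explicit.

```python
from itertools import combinations


def per_site_differences(a, b):
    """Proportion of sites at which two aligned haplotypes differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(pop):
    """Mean pairwise diversity (pi) within one population."""
    pairs = list(combinations(pop, 2))
    return sum(per_site_differences(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop_x, pop_y):
    """Mean pairwise divergence (d_xy) between two populations."""
    total = sum(per_site_differences(a, b) for a in pop_x for b in pop_y)
    return total / (len(pop_x) * len(pop_y))


def differentiation(pop_x, pop_y):
    """Return (d_xy, d_net, F_st) for two populations of aligned haplotypes.

    Assumption: F_st = (d_xy - pi_w) / (d_xy + pi_w), where pi_w is the
    average of the two within-population diversities.
    """
    pi_w = (mean_within(pop_x) + mean_within(pop_y)) / 2
    d_xy = mean_between(pop_x, pop_y)
    d_net = d_xy - pi_w
    denominator = d_xy + pi_w
    f_st = d_net / denominator if denominator > 0 else 0.0
    return d_xy, d_net, f_st


if __name__ == "__main__":
    # Invented toy data: two haplotypes of four sites per population.
    pop_x = ["AAAA", "AAAT"]
    pop_y = ["TTAA", "TTAT"]
    print(differentiation(pop_x, pop_y))
```

Because `d_net` subtracts the within-population diversity from `d_xy`, it isolates the divergence that has accumulated since the populations split, while `F_st` expresses the same excess as a proportion of total diversity.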