update readme
Zilong-Li committed Sep 27, 2024
1 parent 77c7f9d commit 815c82c
Showing 6 changed files with 46 additions and 20 deletions.
10 changes: 4 additions & 6 deletions .github/workflows/linux.yml
@@ -28,9 +28,7 @@ jobs:

- name: test LD module
run: |
-./PCAone -b example/plink -k 3 --ld -o adj
-./PCAone -B adj.residuals --ld-bim adj.kept.bim --clump example/plink.pheno0.assoc --clump-p1 0.01 --clump-p2 0.05 --clump-r2 0.1 --clump-bp 10000000 -m 1 -o adj_clump_m1
-./PCAone -B adj.residuals --ld-bim adj.kept.bim --clump example/plink.pheno0.assoc --clump-p1 0.01 --clump-p2 0.05 --clump-r2 0.1 --clump-bp 10000000 -m 0 -o adj_clump_m0
-# make ld_matrix
-# make ld_clump
-# make ld_tests
+make ld_matrix
+make ld_r2
+make ld_clump
+make ld_tests
4 changes: 4 additions & 0 deletions CHANGELOG.org
@@ -1,3 +1,7 @@
+- v0.4.5
+- *can do LD pruning and clumping out-of-core*
+- add =--ld-bims, --print-r2= options
+
- v0.4.4
- add =--clump-names= option

6 changes: 5 additions & 1 deletion Makefile
@@ -143,7 +143,7 @@ SLIBS += ./external/bgen/bgenlib.a ./external/zstd/lib/libzstd.a

LIBS += ${SLIBS} ${DLIBS} -lm -ldl

-.PHONY: all clean ld_matrix ld_prune ld_clump ld_tests
+.PHONY: all clean ld_matrix ld_r2 ld_prune ld_clump ld_tests

all: ${program}

@@ -183,6 +183,10 @@ ld_matrix:
awk '$$1==3' adj.kept.bim | cut -f4 | sort -cn
rm -f pcaone.*

+ld_r2:
+./PCAone -B adj.residuals --ld-bim adj.kept.bim --ld-bp 1000 --print-r2 -o adj_r2
+
+
ld_prune:
./PCAone -B adj.residuals --ld-bim adj.kept.bim --ld-r2 0.8 --ld-bp 1000000 -o adj_prune_m0 -m 0
./PCAone -B adj.residuals --ld-bim adj.kept.bim --ld-r2 0.8 --ld-bp 1000000 -o adj_prune_m1 -m 1
Binary file modified PCAone.pdf
Binary file not shown.
44 changes: 32 additions & 12 deletions README.org
@@ -1,5 +1,5 @@
-#+TITLE: PCAone: Principal Component Analysis All in One
-#+options: toc:2 num:t email:t
+#+TITLE: Principal Component Analysis All in One
+#+options: toc:2 num:nil email:t
#+author: Zilong Li
#+email: zilong.dk@gmail.com
#+latex_compiler: xelatex
@@ -14,7 +14,6 @@
#+latex_header: \hypersetup{colorlinks=true, linkcolor=blue}
#+latex: \clearpage

-* Badges :noexport:

[[https://github.com/Zilong-Li/PCAone/actions/workflows/linux.yml/badge.svg]]
[[https://github.com/Zilong-Li/PCAone/actions/workflows/mac.yml/badge.svg]]
@@ -62,6 +61,7 @@ missingness and uncertainty. The PDF manual can be downloaded [[https://github.c
- [[#memory-efficient-modes][Memory-efficient modes]]
- [[#data-normalization][Data Normalization]]
- [[#ancestry-adjusted-ld-matrix][Ancestry-Adjusted LD matrix]]
+- [[#report-ld-statistics][Report LD statistics]]
- [[#prunning-based-on-ancestry-adjusted-ld][Prunning based on Ancestry-Adjusted LD]]
- [[#clumping-based-on-ancestry-adjusted-ld][Clumping based on Ancestry-Adjusted LD]]
- [[#more-examples][More Examples]]
@@ -267,6 +267,7 @@ Main options:
--ld-r2 arg (=0) r2 cutoff for LD-based pruning.
--ld-bp arg (=1000000) physical distance threshold in bases for ld pruning
--ld-stats arg (=0) statistics to get r2 for LD. (0: the ancestry adjusted, 1: the standard)
+--print-r2 print LD r2 for pairse-wise SNPs in a ld-window (saved in <prefix>.ld.gz)
--clump arg assoc-like file with target variants and pvalues for clumping
--clump-names arg (=CHR,BP,P) column names in assoc-like file for locating chr, pos and pvalue
--clump-p1 arg (=0.0001) significance threshold for index SNPs
@@ -283,14 +284,14 @@ This depends on your datasets, particularly the relationship between the number
of samples (=N=) and the number of variants / features (=M=) and the top PCs
(=k=). Here is an overview and the recommendation.

-|-------------------+-----------+---------+----------------------|
-| Method            | Accuracy  | Option  | Scenario             |
-|-------------------+-----------+---------+----------------------|
-| IRAM              | Very high | --svd 0 | =N < 5000=           |
-| RSVD              | High      | --svd 1 | accuracy insensitive |
-| Window-Based RSVD | Very high | --svd 2 | =M > 1,000,000=      |
-| Full SVD          | Exact     | --svd 3 | =N,M < 1000=         |
-|-------------------+-----------+---------+----------------------|
+|--------------------------+-----------+----------------------|
+| Method                   | Accuracy  | Scenario             |
+|--------------------------+-----------+----------------------|
+| IRAM (-d 0)              | Very high | =N < 1000=           |
+| Window-Based RSVD (-d 2) | Very high | =M > 1,000,000=      |
+| RSVD (-d 1)              | High      | accuracy insensitive |
+| Full SVD (-d 3)          | Exact     | cost insensitive     |
+|--------------------------+-----------+----------------------|

** Input formats

@@ -328,6 +329,12 @@ sequence of *M* blocks of *N x 4* bytes each, where *M* is the number of
variants and *N* is the number of samples. The first block corresponds to
the first marker in the =.kept.bim= file, etc.
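
As a rough illustration of the block layout described here, the following Python sketch reads a single variant's block from a =.residuals= file. Only the "M blocks of N x 4 bytes" layout comes from the README text; everything else is an assumption to verify against the full README: that each 4-byte value is a little-endian float, and that some number of header bytes (=header_bytes=, which the caller must supply) may precede the first block. =read_block= is a hypothetical helper, not part of PCAone.

```python
import struct


def read_block(path, n_samples, index, header_bytes):
    """Return block `index` (0-based) as a tuple of `n_samples` values.

    Assumptions (not confirmed by the README excerpt): values are
    little-endian 4-byte floats, and `header_bytes` bytes precede the
    first block in the file.
    """
    block_size = n_samples * 4  # N x 4 bytes per variant
    with open(path, "rb") as fh:
        # Blocks are stored back to back, one per variant.
        fh.seek(header_bytes + index * block_size)
        raw = fh.read(block_size)
    if len(raw) != block_size:
        raise ValueError(f"block {index} is truncated or out of range")
    return struct.unpack(f"<{n_samples}f", raw)
```

Under these assumptions, block 0 would correspond to the first row of the =.kept.bim= file, block 1 to the second row, and so on.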

+*** LD r2
+
+The LD r2 for pairwise SNPs within a window can be written to a file
+with suffix =ld.gz= via the =--print-r2= option. This file uses the same
+format as the one used by [[https://www.cog-genomics.org/plink/1.9/ld#r][plink]].
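
Since the file is said to follow the plink layout, a small Python sketch can show how it might be filtered. It assumes the plink 1.9 =--r2= table: a header row that names the columns (including =SNP_A=, =SNP_B= and =R2=) followed by whitespace-separated rows. =read_ld_pairs= is a hypothetical helper, and the file name in the usage note is inferred from the =<prefix>.ld.gz= naming in the help text.

```python
import gzip


def read_ld_pairs(path, min_r2=0.0):
    """Yield (snp_a, snp_b, r2) for every pair with r2 >= min_r2.

    Assumes a plink-style table: one header row containing SNP_A, SNP_B
    and R2, then whitespace-separated data rows.
    """
    with gzip.open(path, "rt") as fh:
        header = fh.readline().split()
        # Locate columns by name so extra or reordered columns do not matter.
        ia, ib, ir = (header.index(c) for c in ("SNP_A", "SNP_B", "R2"))
        for line in fh:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            r2 = float(fields[ir])
            if r2 >= min_r2:
                yield fields[ia], fields[ib], r2
```

For a run with =-o adj=, something like =list(read_ld_pairs("adj.ld.gz", min_r2=0.8))= would then list the strongly linked pairs.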

** Memory-efficient modes

PCAone has both *in-core* and *out-of-core* mode for 3 different partial SVD
@@ -384,7 +391,7 @@ One should choose proper normalization method for specific type of data.

LD patterns vary across diverse ancestry and structured groups, and
conventional LD statistics, e.g. the implementation in =plink --ld=, failed to
-model the LD in admixed populations. Thus, we proposed the so-called
+model the LD in admixed populations. Thus, we can use the so-called
ancestry-adjusted LD statistics to account for population structure in
LD. See our [[https://doi.org/10.1101/2024.05.02.592187][paper]] for more details.

@@ -398,6 +405,19 @@ file with suffix =.residuals=.
./PCAone -b example/plink -k 3 --ld -o adj -m 4
#+end_src

+** Report LD statistics
+
+Currently, the LD r2 for pairwise SNPs within a window can be written out via the =--print-r2= option.
+
+#+begin_src shell
+./PCAone -B adj.residuals \
+--ld-bim adj.kept.bim \
+--ld-bp 1000000 \
+--print-r2 \
+-o adj
+#+end_src
+
+
** Prunning based on Ancestry-Adjusted LD

Given the LD binary file =.residuals= and its associated variant file
2 changes: 1 addition & 1 deletion src/Cmd.cpp
@@ -56,7 +56,7 @@ Param::Param(int argc, char **argv) {
opts.add<Value<double>>("", "ld-r2", "r2 cutoff for LD-based pruning.", ld_r2, &ld_r2);
opts.add<Value<uint>>("", "ld-bp", "physical distance threshold in bases for ld pruning", ld_bp, &ld_bp);
opts.add<Value<int>>("", "ld-stats", "statistics to get r2 for LD. (0: the ancestry adjusted, 1: the standard)", ld_stats, &ld_stats);
-opts.add<Switch>("", "print-r2", "print out r2 for pairse-wise SNPs within a ld-window", &print_r2);
+opts.add<Switch>("", "print-r2", "print LD r2 for pairse-wise SNPs in a ld-window (saved in <prefix>.ld.gz)", &print_r2);
auto clumpfile = opts.add<Value<std::string>>("", "clump", "assoc-like file with target variants and pvalues for clumping", "", &clump);
auto assocnames = opts.add<Value<std::string>>("", "clump-names", "column names in assoc-like file for locating chr, pos and pvalue", "CHR,BP,P", &assoc_colnames);
opts.add<Value<double>>("", "clump-p1", "significance threshold for index SNPs", clump_p1, &clump_p1);