The data-adaptive sum test (aSum) by (Han and Pan, 2010) is the first method that took into consideration the difference in direction of effects (protective or deleterious) of rare variants in the same genetic region analyzed by a rare variant association test. It is a two-stage approach. In the first stage the effect size of each rare variant is evaluated in a multivariate regression analysis, identifying the variants having significant “protective” effects, i.e., variants with a negative log odds ratio associated with a \(p\) value smaller than \(0.1\). In the second stage, variants are collapsed across the genetic region similar to (Morris and Zeggini, 2010) but with the coding for protective variants flipped. The test statistic is a score test for logistic regression for case control data.
The implementation of stage 1 in this program differs from the original paper. Instead of evaluating the effect size for each variant, it evaluates the difference in MAF of each variant between case and controls via an exact test to determine which variants are to be re-coded in stage 2. The same \(p<0.1\) criteria is used for stage 1, but in effect is more stringent than the original criteria for a multivariate logistic regression analysis.
vtools show test aSum
Name: aSum
Description: Adaptive Sum score test for protective and deleterious variants, Han &
Pan 2010
usage: vtools associate --method aSum [-h] [--name NAME] [-q1 MAFUPPER]
[-q2 MAFLOWER] [-p N] [--adaptive C]
Adaptive Sum score test for protective and deleterious variants, Han & Pan
2010. In the first stage of the test, each variant site are evaluated for
excess of minor alleles in controls and genotype codings are flipped, and the
second stage performs a burden test similar to BRV (Morris & Zeggini 2009).
This two-stage test is robust to a mixture of protective/risk variants within
one gene, yet is computationally intensive. aSum test is a two-tailed test.
optional arguments:
-h, --help show this help message and exit
--name NAME Name of the test that will be appended to names of
output fields, usually used to differentiate output of
different tests, or the same test with different
parameters.
-q1 MAFUPPER, --mafupper MAFUPPER
Minor allele frequency upper limit. All variants
having sample MAF<=m1 will be included in analysis.
Default set to 0.01
-q2 MAFLOWER, --maflower MAFLOWER
Minor allele frequency lower limit. All variants
having sample MAF>m2 will be included in analysis.
Default set to 0.0
-p N, --permutations N
Number of permutations
--adaptive C Adaptive permutation using Edwin Wilson 95 percent
confidence interval for binomial distribution. The
program will compute a p-value every 1000 permutations
and compare the lower bound of the 95 percent CI of
p-value against "C", and quit permutations with the
p-value if it is larger than "C". It is recommended to
specify a "C" that is slightly larger than the
significance level for the study. To disable the
adaptive procedure, set C=1. Default is C=0.1
vtools associate rare status -m "aSum --name aSum -p 5000" --group_by name2 --to_db asum -j\
8 > asum.txt
INFO: 3180 samples are found
INFO: 2632 groups are found
INFO: Starting 8 processes to load genotypes
Loading genotypes: 100% [=================================] 3,180 32.6/s in 00:01:37
Testing for association: 100% [=========================================] 2,632/591 10.3/s in 00:04:14
INFO: Association tests on 2632 groups have completed. 591 failed.
INFO: Using annotation DB asum in project test.
INFO: Annotation database used to record results of association tests. Created on Wed, 30 Jan 2013 16:32:32
vtools show fields | grep asum
asum.name2 name2
asum.sample_size_aSum sample size
asum.num_variants_aSum number of variants in each group (adjusted for specified MAF
asum.total_mac_aSum total minor allele counts in a group (adjusted for MOI)
asum.statistic_aSum test statistic.
asum.pvalue_aSum p-value
asum.std_error_aSum Empirical estimate of the standard deviation of statistic
asum.num_permutations_aSum number of permutations at which p-value is evaluated
head asum.txt
name2 sample_size_aSum num_variants_aSum total_mac_aSum statistic_aSum pvalue_aSum std_error_aSum num_permutations_aSum
AADACL4 3180 5 138 2.59057 0.32967 3.85368 1000
ABCG5 3180 6 87 1.90472 0.335664 3.00098 1000
ABCD3 3180 3 42 -0.873585 0.635365 2.17424 1000
ABCB6 3180 7 151 -0.521698 0.632368 3.97958 1000
ABHD1 3180 5 29 -0.365094 0.548452 1.81627 1000
ABCG8 3180 12 152 -5.63774 0.95005 4.06417 1000
ABL2 3180 4 41 0.242453 0.565435 1.98108 1000
ACADL 3180 5 65 0.457547 0.58042 3.00258 1000
ACAP3 3180 3 17 0.0273585 0.404595 1.26823 1000
Fang Han and Wei Pan (2010) A Data-Adaptive Sum Test for Disease Association with Multiple Common or Rare Variants. Human Heredity doi:10.1159/000288704
. http://www.karger.com/doi/10.1159/000288704
Andrew P. Morris and Eleftheria Zeggini (2010) An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genetic Epidemiology doi:10.1002/gepi.20450
. http://doi.wiley.com/10.1002/gepi.20450