Once you have imported variants and samples of an exome or whole-genome sequencing project, you are typically facing a master variant table with millions of variants. Depending on the phenotypes of interest and particular research topics, you might want to filter variants for subsequent association analysis, or select variants that are more informative than others for further lab-based analysis. For example, you might be interested in
Variants that satisfy these conditions can be selected using either project-specific properties (sample statistics, quality scores, etc.) or annotation databases (dbSNP or 1000 Genomes Project membership, gene regions, etc.). The required variant info and annotation fields can be prepared by
Please refer to relevant documentation pages for details.
After the required variant info and annotation fields are in place, you conceptually have a huge table with variants as rows and variant info and annotation fields as columns. It is then relatively easy to use the commands `vtools select` or `vtools exclude` to select variants of interest. For example, if you have a field `num_case` holding the number of alternative alleles in cases (affected individuals), a field `num_control` holding the number of alternative alleles in controls (unaffected individuals), and a field `SIFT_score` from the [dbNSFP][4] database, you could use the conditions
'num_case>0' 'num_control=0' 'SIFT_score>0.95' 'dbSNP.chr is NULL'
to select variants that are present in cases (`num_case>0`), absent from controls (`num_control=0`), predicted to be damaging (`SIFT_score>0.95`), and not in dbSNP (`dbSNP.chr is NULL`). The last condition looks a bit strange, but it merely means that the field `dbSNP.chr` has no value (`NULL`) in the large virtual table we have imagined, which is the case for variants that are not in the dbSNP database.
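To make the "large virtual table" idea concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module. It does not call vtools and makes no claim about how vtools builds its queries. The table layouts, the toy rows, and the name `rs_hypothetical` are all assumptions made for illustration. It only shows why a left join against an annotation table leaves `dbSNP.chr` as `NULL` for variants that the annotation table does not contain.

```python
import sqlite3

# Hypothetical toy schema: a project variant table and a dbSNP-like annotation table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variant (
        variant_id INTEGER PRIMARY KEY,
        chr TEXT, pos INTEGER, ref TEXT, alt TEXT,
        num_case INTEGER, num_control INTEGER, SIFT_score REAL
    );
    CREATE TABLE dbSNP (chr TEXT, pos INTEGER, ref TEXT, alt TEXT, name TEXT);

    INSERT INTO variant VALUES
        (1, '1', 1000, 'A', 'G', 3, 0, 0.99),  -- cases only, high score, not in dbSNP
        (2, '1', 2000, 'C', 'T', 2, 0, 0.98),  -- cases only, high score, in dbSNP
        (3, '2', 3000, 'G', 'A', 4, 1, 0.97),  -- also observed in controls
        (4, '2', 4000, 'T', 'C', 1, 0, 0.10);  -- low score
    INSERT INTO dbSNP VALUES ('1', 2000, 'C', 'T', 'rs_hypothetical');
""")

# The same four conditions as in the text, combined with AND.
conditions = ["num_case>0", "num_control=0", "SIFT_score>0.95", "dbSNP.chr is NULL"]
where_clause = " AND ".join(f"({c})" for c in conditions)

# A LEFT JOIN keeps every variant; annotation columns are NULL when there is no match.
query = f"""
    SELECT variant.variant_id
    FROM variant
    LEFT JOIN dbSNP
      ON variant.chr = dbSNP.chr AND variant.pos = dbSNP.pos
     AND variant.ref = dbSNP.ref AND variant.alt = dbSNP.alt
    WHERE {where_clause}
    ORDER BY variant.variant_id
"""
selected = [row[0] for row in conn.execute(query)]
print(selected)
```

In this toy data only variant 1 passes all four conditions: variant 2 has a matching dbSNP row, variant 3 appears in controls, and variant 4 has a low score.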
Selected variants can be output directly, but they are more frequently saved to a separate variant table for future reference. Many variant tables can be created based on different selection criteria, and you can use any of these tables whenever a variant table is needed (e.g. after the commands `vtools select`, `vtools exclude`, `vtools output`, and `vtools export`).
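The idea of saving a selection to a named table and then selecting from that table again can be sketched in the same way. The helper `select_into`, the table names `case_only` and `case_only_damaging`, and the toy rows are hypothetical. The sketch copies whole rows for simplicity and does not reflect how vtools stores variant tables internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variant (
        variant_id INTEGER PRIMARY KEY,
        num_case INTEGER, num_control INTEGER, SIFT_score REAL
    );
    INSERT INTO variant VALUES
        (1, 3, 0, 0.99), (2, 2, 0, 0.40), (3, 4, 1, 0.97), (4, 0, 2, 0.98);
""")


def select_into(conn, from_table, to_table, *conditions):
    """Save rows of from_table that satisfy all conditions as to_table.

    Hypothetical helper; table names and conditions are trusted input here.
    Returns the number of rows in the new table.
    """
    where = " AND ".join(f"({c})" for c in conditions)
    conn.execute(f"CREATE TABLE {to_table} AS SELECT * FROM {from_table} WHERE {where}")
    return conn.execute(f"SELECT COUNT(*) FROM {to_table}").fetchone()[0]


# First selection starts from the master table and is saved under a name.
n_case_only = select_into(conn, "variant", "case_only", "num_case>0", "num_control=0")

# A saved table can be used wherever a variant table is expected,
# here as the starting point of a second, narrower selection.
n_damaging = select_into(conn, "case_only", "case_only_damaging", "SIFT_score>0.95")

ids = [r[0] for r in conn.execute(
    "SELECT variant_id FROM case_only_damaging ORDER BY variant_id")]
print(n_case_only, n_damaging, ids)
```

Keeping intermediate selections as named tables means each later step can start from a smaller, already filtered set instead of repeating every condition against the master table.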
For more information on the use of these commands, please refer to the following tutorials:
[3]: /documentation/ /applications/annotation/
[4]: /applications/annotation/variants/dbnsfp/
[6]: /documentation/tutorials/case44ctrl20/
[8]: /documentation/tutorials/illumina5/
[10]: /documentation/tutorials/compare/
[12]: /documentation/tutorials/select/