Variant tools is a software tool for the manipulation, annotation, selection, simulation, and analysis of variants in the context of next-generation sequencing analysis. Unlike some other tools used for next-generation sequencing analysis, variant tools is project based and provides a whole set of tools to manipulate and analyze genetic variants. Please refer to "What you can do with variant tools" for a list of features provided by variant tools.

- … `temp_dir`. The Manhattan and QQ plot engine is also updated to work with ggplot2 version 1.0.0 (backward compatibility is dropped).
- … `genotype()` and `samples()` SQL functions, and the `--as` option to command `vtools use`.
- … `vtools admin --update_resource` and `vtools_report sequences`, and allows the use of arbitrary characters for names of variant tables.
- … `vtools associate`, `vtools update`, `vtools phenotype` and `vtools_report` commands.
- … `vtools associate` and `vtools admin`, with more than 20 association tests implemented under a unified association test framework.
- … `vtools output` and `vtools export` when range-based annotation databases are used. All users are recommended to upgrade.
- … `--jobs` to a number of vtools commands and allow them to execute in multiple threads or processes. User interface is further cleaned for the final 1.0 release. As a result, support for the MySQL backend is temporarily disabled.
- … `--children` for command `vtools init`, which allows the creation of a project by merging multiple subprojects.
- … `vtools export` command that can export in ANNOVAR and VCF formats.
- … `vtools import`
command.

If you have used other sequencing or association analysis tools such as bedtools and pseq, you may be surprised that variant tools usually does not give you a nice report with a list of variants or genes with some useful information after performing an analysis. Instead, all the information, including results of analyses, is saved in the project in a consistent manner. An extra step is needed to output the information you need. In other words, the management and presentation of information regarding variants are two different processes, and you typically add more and more information to your project as you analyze your data. The end result is that you have immediate access to a large amount of information for the variants you are interested in, which can in turn help you perform more in-depth analysis. Using a fabricated and unusually long command,

    % vtools output                                            # 2
          myvariants                                           # 1
          chr pos ref alt                                      # 3
          hom_case hom_ctrl                                    # 4
          dbNSFP.SIFT_score dbSNP.name refGene.name2           # 5
          asso1.p_value asso2.p_value                          # 6
          "ref_sequence(chr, pos - 5, pos + 5)"                # 7
          "track('LP056A.BAM')"                                # 8
          "genotype('WGS1')"                                   # 9
          "samples()"

1. `myvariants` contains a list of variants, which is a subset of the master variant table (all the variants of your project) and is typically created using command `vtools select`.
2. `vtools output` outputs information for all variants in `myvariants`, which includes:
3. `chr`, `pos`, `ref`, and `alt`, which constitute a variant, namely the location and type of a mutation.
4. `hom_case` and `hom_ctrl` are the numbers of homozygous genotypes of this variant in cases and controls. These are called variant info fields and are added to the project using command `vtools update --from_stat`.
5. `dbNSFP.SIFT_score`, `dbSNP.name` and `refGene.name2` are annotation information from different annotation databases. Annotation databases are not part of the project. They are connected to the project using command `vtools use`.
6. `asso1.p_value` and `asso2.p_value` are results of two different association analyses. These are annotation databases created by command `vtools associate`.
7. `ref_sequence` is a function provided by variant tools to retrieve the reference sequence around the variant. Here 5 base pairs upstream and downstream of each variant are returned.
8. `track` is a function to extract information from external files. In this example, the depth of coverage at the location of the variant in the specified BAM file is returned.
9. `genotype` is another function to get the genotype of this variant in sample `WGS1`, for example, 1 for heterozygote and 2 for homozygote. Function `samples()` lists the samples that contain the variant.

As you can see, individual commands such as `vtools use` and `vtools update` do not produce any output, but add information to the project that can be displayed along with other information. It is important to remember that all such information can also be used to select, prioritize, and analyze your variants. Another fabricated command would look like

    % vtools select                                            # 1
          myvariants                                           # 2
          "refGene.name2='BRCA1'"                              # 3
          'dbNSFP.SIFT_score > 0.95'
          'hom_case > 15' 'hom_ctrl = 0'                       # 4
          'asso1.p_value < 0.05 OR asso2.p_value < 0.05'       # 5
          --to_table significant                               # 6

1. `vtools select` selects variants according to their properties
2. from variant table `myvariants`, which was itself selected using some other criteria.
3. Selected variants must be in gene `BRCA1` and must have SIFT scores > 0.95 (probably damaging),
4. must have more than 15 homozygous genotypes in cases and none in controls,
5. and must have a p-value below 0.05 in at least one of the two association analyses.
6. The selected variants are saved to variant table `significant`.

In summary, variant tools is NOT designed to be a black-box tool that analyzes your data and generates a nice-looking report with a list of candidate variants or genes. It is a platform under which you can analyze your data using several methods, compare and analyze results, and then re-compare and re-analyze using different methods or annotation sources, based on the information obtained from your previous analyses. The unique advantage of *variant tools* is that you generally do not need to write a bunch of scripts to *connect* the input and output of different tools and *parse* and *compare* results in different formats, and you have easy access to a huge amount of information that helps you select, prioritize and analyze your variants, all from your command line. However, because of the uniqueness of this design, **please read through the Concepts section of this website before using *variant tools*.**
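
The separation between managing and presenting variant information can be pictured with a small, self-contained sketch. It is only an analogy written with Python's standard `sqlite3` module: the table layout, the helper names `select_to_table` and `output`, and all data values are made up for illustration and are not the actual project schema or API of variant tools. The point is that a selection step only stores a variant table, and a separate output step pulls together variant info fields and annotation fields on demand.

```python
import sqlite3

# Hypothetical join between the master variant table and one annotation
# source, matched by chromosome, position, reference and alternative allele.
JOIN = """
    FROM variant v
    LEFT JOIN dbNSFP
      ON dbNSFP.chr = v.chr AND dbNSFP.pos = v.pos
     AND dbNSFP.ref = v.ref AND dbNSFP.alt = v.alt
"""


def build_demo_project():
    """Create an in-memory 'project' with made-up variants and annotations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE variant (
            variant_id INTEGER PRIMARY KEY,
            chr TEXT, pos INTEGER, ref TEXT, alt TEXT,
            hom_case INTEGER, hom_ctrl INTEGER
        );
        CREATE TABLE dbNSFP (
            chr TEXT, pos INTEGER, ref TEXT, alt TEXT, SIFT_score REAL
        );
    """)
    conn.executemany(
        "INSERT INTO variant VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "17", 41200000, "A", "G", 20, 0),
            (2, "17", 41200500, "C", "T", 3, 1),
            (3, "13", 32900000, "G", "A", 18, 0),
        ],
    )
    conn.executemany(
        "INSERT INTO dbNSFP VALUES (?, ?, ?, ?, ?)",
        [
            ("17", 41200000, "A", "G", 0.98),
            ("17", 41200500, "C", "T", 0.99),
            ("13", 32900000, "G", "A", 0.40),
        ],
    )
    return conn


def select_to_table(conn, table, condition):
    """Store the IDs of matching variants in a new table; print nothing."""
    conn.execute(
        f"CREATE TABLE {table} AS SELECT v.variant_id {JOIN} WHERE {condition}"
    )


def output(conn, table, fields):
    """Present the requested fields for the variants listed in a table."""
    sql = (
        f"SELECT {', '.join(fields)} {JOIN} "
        f"WHERE v.variant_id IN (SELECT variant_id FROM {table}) "
        "ORDER BY v.variant_id"
    )
    return conn.execute(sql).fetchall()


conn = build_demo_project()
select_to_table(
    conn,
    "significant",
    "v.hom_case > 15 AND v.hom_ctrl = 0 AND dbNSFP.SIFT_score > 0.95",
)
rows = output(
    conn, "significant", ["v.chr", "v.pos", "v.ref", "v.alt", "dbNSFP.SIFT_score"]
)
print(rows)
```

In this toy data only the first variant satisfies all three conditions, so the stored table holds a single variant, and any later output request can combine it with whatever fields have been added in the meantime.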
| Category | Tasks |
|---|---|
| Variant calling | Call variants from raw reads in FASTQ or BAM (converted to FASTQ first) format using the GATK best practice pipeline. |
| Import variants | Import variants and genotypes in VCF format, with options to import specified variant and genotype info fields. |
| | Import all info and genotype fields, including customized fields from VCF files. |
| | Import SNP and Indel variants from the Illumina CASAVA pipeline before version 1.8 (text files), and variants called from the Complete Genomics pipeline. |
| | Pipeline to import variants from the recent versions of the Illumina CASAVA pipeline (in VCF format) that provides variant calls from two probabilistic models. |
| | Import variants in text or CSV files. |
| | Import variants from files in Plink format. |
| | Import variants from a list of rsnames (dbSNP IDs), or just chromosomes and positions; variant information is retrieved from the dbSNP database. |
| | Import data in arbitrary format by defining a customized format-description file. |
| Reference genome | Native support for builds hg18 and hg19 of the human genome, and other genomes such as the mouse genome. Human reference genomes are downloaded automatically when they are used. |
| | Variants in different reference genomes can be imported and analyzed together, through automatic mapping between primary and alternative reference genomes. |
| | Supports the use of annotations in a different reference genome by mapping genomic coordinates across reference genomes. |
| | Easily retrieve reference sequences around variant sites through function `ref_sequence`. This allows you to check if variants are in, for example, mononucleotide or short-tandem repeat sequences. |
| | Validate the build of reference genome if you are uncertain about the reference genome used in the data. |
| Variant annotation | Standardize annotations from different sources so that you do not have to worry about inconsistencies between the use of chromosome names (with/without leading `chr`), genomic positions (0- or 1-based) and other nomenclatures. |
| | Annotations are automatically downloaded from an online repository, or built from source if needed. Annotation databases are automatically updated, although you can use a prior version, or use different versions of the same annotation database at the same time. |
| | Detailed descriptions of available annotation databases are readily available from command `vtools show annotation`. |
| | Supports CCDS, Entrez, Known Gene, and RefSeq definitions of genes, which allow you to identify variants in genes, exon regions, or upstream/downstream of these genes. |
| | Standardize gene names through the use of HUGO Gene Nomenclature Committee approved gene names. |
| | Identify variants in the Catalogue of Somatic Mutations in Cancer or within the Database of Genomic Variants. |
| | Identify variants in all versions of dbSNP databases, the Exome Sequencing Project, the 1000 Genomes Project, and the HapMap project. |
| | Annotate variants with SIFT, PolyPhen, MutationTaster and many other prediction scores from dbNSFP. |
| | Check for variants that are in the GWAS Catalog database, or variants that are within a certain range of GWAS hits. |
| | Identify variants in highly conserved regions through the phastCons database, or variants in genomic duplication regions. |
| | Pipelines to automatically annotate variants using ANNOVAR and snpEff. |
| | Allow the creation of annotation databases from your own data in VCF format. |
| | Convert variants in a variant tools project to an annotation database to be used by another project, or convert an annotation database to a project for detailed analysis. |
| | Users can define and create their own annotation databases through customized annotation description files. |
| External annotation | Retrieve calls, reads, quality, and coverage information from BAM files, filtered by quality score, strand, type, or flags, and use such information to select variants. This provides a command line alternative to IGV to check raw reads for called variants. |
| | Retrieve variant info and genotype information from local or online tabix-indexed vcf.gz files. This allows you, for example, to obtain variant info from VCF files on the 1000 Genomes website. |
| | Retrieve annotation from bigWig or bigBed files, for example from the ENCODE project. |
| Samples and phenotypes | Import and keep track of samples using filenames and sample names. |
| | Rename samples and merge genotypes from multiple input files. |
| | Arbitrary sample information such as sex, BMI, and ethnicity can be saved as phenotypes and used for sample selection or association analysis. |
| | Calculate the number of genotypes, alternative alleles, homozygotes, heterozygotes and other types of genotypes in all or a subset of samples. |
| | Calculate minimum, maximum, and average values of genotype info (e.g. quality score) across all or selected samples for each variant. |
| Variant selection | Use sample statistics to select, for example, homozygous variants with acceptable quality that appear only in cases. |
| | Select variants based on their membership in annotation databases such as dbSNP and the 1000 Genomes Project. |
| | Select variants from multiple conditions that involve multiple variant and annotation info fields (e.g. SIFT score). |
| | Variants selected by different criteria are kept in multiple variant tables, with meta information. |
| | Compare variant tables and examine differences between two or more variant tables. |
| | Identify de novo mutations from family-based samples, and identify variants that share the same sites with an existing set of variants. |
| | Pipelines to identify de novo or recessive mutations that might cause the phenotype of an affected offspring in a family of unaffected parents. |
| Output variants | Output a large number of variant info and annotation fields across different annotation databases altogether. |
| | Output expressions of variant info and annotation fields, including vtools-specific SQL functions. |
| | Output reference sequence around variant site, genotypes of one or more samples, and samples that harbor the variants. |
| | Output summary statistics (e.g. count, average) of variants and variant info fields, grouped by specified fields. |
| Export variants | Export variants in VCF format, with variant and annotation info, and genotypes. |
| | Export variants in other formats such as ANNOVAR and Plink to be analyzed by these programs. |
| | Export variants with variant info and annotation fields in CSV format. |
| Association analysis | Use more than 20 association analysis methods to associate variants and genes with qualitative or quantitative traits. |
| | Execute multiple association tests across the genome using multiple processes. |
| | Results of association analyses are saved as annotation databases and are used to annotate individual variants, regardless of groups used to analyze data. |
| | Draw Manhattan plots and other figures from association test results. |
| | Perform meta-analysis from association test results. |
| Reports | Print reference sequences for particular regions, or gene, exome etc. |
| | Calculate discordance rate between samples. |
| | Calculate average depth of coverage, number of SNPs and Indels for all or selected samples. |
| | Calculate transition/transversion ratio for all or selected variants. |
| | Scatter plots, box plots, and histograms for variant info fields, genotype info fields, and phenotypes. |
| Data management | A project can be saved, transferred and loaded easily as snapshots. A number of online snapshots are provided for learning purposes. |
| | Remove genotypes based on different criteria (e.g. quality score), or remove variants in a variant table. |
| | Merge data from several subprojects (e.g. adding data from different batches). |
| | Split a project into subprojects to focus on particular sets of variants or samples. |
| | A resource management system to download and update resources on demand, or in batch. |

Please refer to a list of tutorials to get started.
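
To make one of the report entries above concrete, here is a minimal Python sketch of how a transition/transversion ratio can be computed from a list of (ref, alt) allele pairs. It is not how variant tools implements this report: the function name `ts_tv_ratio` and the example alleles are hypothetical, and the sketch simply counts single-nucleotide substitutions, treating A<->G and C<->T as transitions and all other substitutions as transversions.

```python
# Purine <-> purine (A, G) and pyrimidine <-> pyrimidine (C, T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = {"A", "C", "G", "T"}


def ts_tv_ratio(variants):
    """Return (transitions, transversions, ratio) for (ref, alt) pairs.

    Anything that is not a single-base substitution (indels, missing
    alleles, identical ref and alt) is skipped. The ratio is None when
    no transversion is observed.
    """
    ts = tv = 0
    for ref, alt in variants:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # not a SNV, ignore it
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    ratio = ts / tv if tv else None
    return ts, tv, ratio


# Made-up alleles: three transitions, one transversion, two indels.
example = [("A", "G"), ("C", "T"), ("g", "a"), ("A", "C"), ("-", "T"), ("AT", "A")]
print(ts_tv_ratio(example))
```

Returning the two counts together with the ratio keeps the case with no transversions explicit instead of hiding it behind a division error.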
Please cite
F. Anthony San Lucas, Gao Wang, Paul Scheet, and Bo Peng (2012) Integrated annotation and analysis of genetic variants from next-generation sequencing studies with variant tools, Bioinformatics 28 (3): 421-422.
for Variant Tools and
Gao Wang, Bo Peng and Suzanne M. Leal (2014) Variant Association Tools for Quality Control and Analysis of Large-Scale Sequence and Genotyping Array Data, The American Journal of Human Genetics 94 (5): 770–83.
for Variant Association Tools, and
Bo Peng (2014) Reproducible Simulations of Realistic Samples for Next-Generation Sequencing Studies Using Variant Simulation Tools, Genetic Epidemiology.
for Variant Simulation Tools if you find variant tools helpful and use it in your publication. Thank you.