This pipeline is an extension of the command vtools_report transmission. The differences are:
vtools_report transmission finds recessive and de novo mutations and creates variant tables for each offspring. This pipeline assumes that the first offspring is the affected one and removes the de novo and recessive variants of the second offspring from the list. It creates a single variant table for the results.
This pipeline also creates a set of variant tables for variants that are in different annotation databases.
% vtools show pipeline transmission
Pipelines to detect different types of variants that are transmitted from parents to offspring.
Available pipelines: denovo, recessive
Pipeline "denovo": This pipeline identifies de novo mutations from a family of unaffected parents,
affected offspring, and optional unaffected siblings. It can be applied either to the current project
(no --input is specified), or a snapshot (--input) for which the snapshot will be loaded and
overwrite the existing project. The parameter --parents and --offspring are required to specify the
name of parents, proband (affected offspring), and one optional sibling. Parameter --name is
recommended to give all variant tables a prefix. This pipeline will produce tables $name_denovo
(variants that are observed only in the proband), A table $name_denovo_SNP will be created with all
SNP markers in table $name_denovo. And, depending on values of parameter --databases, it can produce
tables $table_1kg for variants in 1000 genomes project, $table_dbSNP for variants in dbSNP project,
and $table_refGene, $table_refGene_exon, $table_ccdsGene, $table_ccdsGene_exon,
$table_CancerGenomeCensus, $table_COSMIC, $table_dbNSFP, $table_phastCons, $table_phastConsElements,
$table_genomicSuperDups for tables in respective annotation databases. It is up to you to select
variants based on these membership tables using the 'vtools compare' command. The project will be
saved to a snapshot if a name (or filename with extension .tar or .tar.gz) is specified as the
output.
denovo_0: Load specified snapshot if a snapshot is specified. Otherwise use the existing
project.
denovo_5: Check the version of variant tools (version 2.2.1 and above is required to
execute this pipeline)
denovo_10: Import all annotation databases
denovo_20: Locate de novo variants of the proband
denovo_50: Create variant tables according to their membership in different annotation
databases
denovo_100: Save the project to a snapshot if an output is specified.
denovo_200: Summarize the results.
Pipeline "recessive": This pipeline identifies recessive mutations from a family of unaffected
parents, affected offspring, and optional unaffected siblings. Recessive variant is defined as
variants that are homozygous in the affected offspring (proband), heterozygous in both parents, and
heterozygous or wildtype in a sibling (if available). The pipeline can be applied either to the
current project (no --input is specified), or a snapshot (--input) for which the snapshot will be
loaded and overwrite the existing project. The parameter --parents and --offspring are required to
specify the name of parents, proband (affected offspring), and one optional sibling. Parameter --name
is recommended to give all variant tables a prefix. This pipeline will produce tables $name_recessive
(variants that are observed only in the proband). A table $name_denovo_SNP will be created with all
SNP markers in table $name_denovo. And, depending on values of parameter --databases, it can produce
tables $table_1kg for variants in 1000 genomes project, $table_dbSNP for variants in dbSNP project,
and $table_refGene, $table_refGene_exon, $table_ccdsGene, $table_ccdsGene_exon,
$table_CancerGenomeCensus, $table_COSMIC, $table_dbNSFP, $table_phastCons, $table_phastConsElements,
$table_genomicSuperDups for tables in respective annotation databases. It is up to you to select
variants based on these membership tables using the 'vtools compare' command. Two optional output
files are allowed. The project will be saved to a snapshot if a name (or filename with extension .tar
or .tar.gz) is specified as the output.
recessive_0: Load specified snapshot if a snapshot is specified. Otherwise use the existing
project.
recessive_5: Check the version of variant tools (version 2.2.1 and above is required to
execute this pipeline)
recessive_10: Import all annotation databases
recessive_20: Locate recessive variants of the proband (homozygous only in proband) and save
variants in table $name_recessive
recessive_50: Create variant tables according to their membership in different annotation
databases
recessive_100: Save the project to a snapshot if an output is specified.
recessive_200: Summarize the results.
Pipeline parameters:
parents
offspring
name Name of the family. All generated tables will be prefixed with this name.
(default: family)
databases Databases for which membership tables will be produced. (default: thousandGenom
es,dbSNP,refGene,ccdsGene,refGene_exon,ccdsGene_exon,CosmicCodingMuts,CosmicNon
CodingVariants,dbNSFP,phastCons,phastConsElements,genomicSuperDups)
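The parameters listed above are passed as command-line options to vtools execute. As a minimal sketch, the following Python helper assembles such a command line from the options named on this page (--input, --parents, --offspring, --name, --output). The helper name build_transmission_command and its argument names are hypothetical and are not part of variant tools; the helper only builds the argument list and does not run anything.

```python
import shlex


def build_transmission_command(pipeline, parents, offspring,
                               input_snapshot=None, name=None, output=None):
    """Assemble an argument list for 'vtools execute transmission <pipeline>'.

    Hypothetical helper for illustration; it only uses options that are
    named on this page and never executes the command.
    """
    if pipeline not in ("denovo", "recessive"):
        raise ValueError("pipeline must be 'denovo' or 'recessive'")
    if len(parents) != 2:
        raise ValueError("exactly two parents are expected")
    if not 1 <= len(offspring) <= 2:
        raise ValueError("expected a proband and at most one sibling")

    argv = ["vtools", "execute", "transmission", pipeline]
    if input_snapshot:
        argv += ["--input", input_snapshot]
    # The proband (affected offspring) comes first, then the optional sibling.
    argv += ["--parents", *parents, "--offspring", *offspring]
    if name:
        argv += ["--name", name]
    if output:
        argv += ["--output", output]
    return argv


cmd = build_transmission_command(
    "denovo",
    parents=["WGS3_2", "WGS3_3"],
    offspring=["WGS3_1"],
    input_snapshot="poly_data.tar",
    output="denovo.tar",
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run, which avoids shell quoting issues with sample names.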
The denovo pipeline executes a series of vtools commands to identify de novo variants in a family with an affected offspring, unaffected parents, and an optional unaffected sibling.
The pipeline applies either to the existing project or, if a snapshot is specified using parameter --input, to the project loaded from that snapshot. For a project with two unaffected parents, an affected offspring (proband), and an optional sibling, this pipeline produces variant tables including
$name_denovo
$name_denovo_site
The pipeline writes a summary of the tables it created to standard output, and saves the project to a snapshot if a name or filename is assigned to parameter --output.
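The selection logic described above ("variants that are observed only in the proband") can be illustrated with plain set operations. This is only a sketch of the idea, not the pipeline's implementation, which runs vtools commands on the project. The function name denovo_variants and the toy variants are hypothetical.

```python
def denovo_variants(proband, father, mother, sibling=None):
    """Return variants seen in the proband but in neither parent.

    If a sibling is given, variants shared with the sibling are removed too.
    Each variant is represented as a (chr, pos, ref, alt) tuple.
    """
    result = set(proband) - set(father) - set(mother)
    if sibling is not None:
        result -= set(sibling)
    return result


# Toy data, invented for illustration only.
proband = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 300, "G", "A")}
father = {("1", 200, "C", "T")}
mother = set()
sibling = {("2", 300, "G", "A")}

without_sibling = denovo_variants(proband, father, mother)
with_sibling = denovo_variants(proband, father, mother, sibling)
print(sorted(without_sibling))
print(sorted(with_sibling))
```

Adding the unaffected sibling can only shrink the candidate list, which is why supplying one is useful when it is available.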
For example, the following command
% vtools execute transmission denovo --input poly_data.tar \
--parents WGS3_2 WGS3_3 --offspring WGS3_1 --output denovo.tar \
> logfile
produces a log file
% cat logfile
SUMMARY: Identification of de novo variants for family family
Members: WGS3_2 WGS3_3 (unaffected parents), WGS3_1 (affected offspring)
Number of genotypes:
WGS3_2 : 4367814
WGS3_3 : 4455890
WGS3_1 : 4343418
de novo variants:
family_denovo : 113553 (de novo variants for family family )
family_denovo_SNP: 63578 (de novo SNP variants for family family )
Database membership:
family_denovo_in_thousandGenomes: 18330 (de novo variants in database thousandGenomes)
family_denovo_in_dbSNP: 71921 (de novo variants in database dbSNP)
family_denovo_in_refGene: 40037 (de novo variants in database refGene)
family_denovo_in_ccdsGene: 28427 (de novo variants in database ccdsGene)
family_denovo_in_refGene_exon: 1099 (de novo variants in database refGene_exon)
family_denovo_in_ccdsGene_exon: 235 (de novo variants in database ccdsGene_exon)
family_denovo_in_CosmicCodingMuts: 73 (de novo variants in database CosmicCodingMuts)
family_denovo_in_CosmicNonCodingVariants: 111 (de novo variants in database CosmicNonCodingVariants)
family_denovo_in_dbNSFP: 148 (de novo variants in database dbNSFP)
family_denovo_in_phastCons: 101916 (de novo variants in database phastCons)
family_denovo_in_phastConsElements: 3502 (de novo variants in database phastConsElements)
family_denovo_in_genomicSuperDups: 24836 (de novo variants in database genomicSuperDups)
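Because the summary uses a regular "name: count" layout, it can be turned into a dictionary for further processing. The sketch below assumes the log format shown above stays the same; the function name parse_summary is hypothetical, and the sample text is a subset of the lines from the log above.

```python
import re

# A count line starts with a table or sample name, then a colon and an integer.
COUNT_LINE = re.compile(r"^\s*(\S+)\s*:\s*(\d+)\b")


def parse_summary(text):
    """Map each table or sample name in the summary to its integer count."""
    counts = {}
    for line in text.splitlines():
        match = COUNT_LINE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


sample = """\
SUMMARY: Identification of de novo variants for family family
Members: WGS3_2 WGS3_3 (unaffected parents), WGS3_1 (affected offspring)
Number of genotypes:
WGS3_2 : 4367814
de novo variants:
family_denovo : 113553 (de novo variants for family family )
family_denovo_SNP: 63578 (de novo SNP variants for family family )
Database membership:
family_denovo_in_dbSNP: 71921 (de novo variants in database dbSNP)
"""

counts = parse_summary(sample)
fraction_in_dbsnp = counts["family_denovo_in_dbSNP"] / counts["family_denovo"]
print(counts)
print(f"fraction of de novo variants in dbSNP: {fraction_in_dbsnp:.3f}")
```

Header lines such as "SUMMARY:" and "Members:" are skipped because the text after their colon is not a number.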
The recessive pipeline works similarly to the denovo pipeline (with the same input, output, and other options), but tries to identify variants that are homozygous in the affected offspring, heterozygous in both parents, and wildtype or heterozygous in the unaffected sibling, if available.
Variants on sex chromosomes are handled in the same way as variants on autosomes. A recessive variant reported on chromosome Y therefore indicates a genotyping error. A recessive variant reported on chromosome X means that the variant is heterozygous in the mother and present in the father.
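The recessive criterion can be written as a small predicate on the genotypes of one variant. The sketch below assumes genotypes are coded as 0 (wildtype), 1 (heterozygous), and 2 (homozygous alternative); this coding and the function name is_recessive are assumptions made for illustration and do not describe how the pipeline itself is implemented.

```python
# Assumed genotype coding for this illustration.
WILDTYPE, HET, HOM = 0, 1, 2


def is_recessive(proband, father, mother, sibling=None):
    """Check the recessive pattern for a single variant.

    The proband must be homozygous and both parents heterozygous; a sibling,
    if present, must be heterozygous or wildtype.
    """
    if proband != HOM or father != HET or mother != HET:
        return False
    return sibling is None or sibling in (WILDTYPE, HET)


print(is_recessive(HOM, HET, HET))            # trio without sibling
print(is_recessive(HOM, HET, HET, HET))       # carrier sibling
print(is_recessive(HOM, HET, HET, HOM))       # sibling is also homozygous
print(is_recessive(HOM, HET, WILDTYPE))       # mother is not a carrier
```

The same predicate is applied regardless of chromosome, which mirrors the statement that sex chromosomes are not treated specially.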
The pipelines identify recessive or de novo variants and create a number of membership tables. You should usually filter the list further using a combination of memberships, quality scores, and other information. For example, if you are looking for novel variants that are not in the 1000 Genomes Project, are in exon regions, have high conservation scores, and are not in genomic duplication regions, you can select them using the command
% vtools compare --expression 'mylist=family_denovo - family_denovo_in_thousandGenomes - \
(family_denovo - (family_denovo_in_refGene_exon | family_denovo_in_phastConsElements)) - \
family_denovo_in_genomicSuperDups'
and start looking closely at these variants, using commands such as
% vtools output mylist chr pos ref alt 'ref_sequence(chr, pos, pos+20)' "track('mydata.bam', 'reads')"
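The expression passed to vtools compare above reads as ordinary set algebra, so it can be checked on toy data with Python sets. The sketch below mirrors the structure of that expression; the function name select_candidates and the integer variant IDs are hypothetical and stand in for the membership tables.

```python
def select_candidates(denovo, in_1kg, in_exon, in_conserved, in_superdups):
    """Mirror the compare expression term by term with Python sets.

    denovo - in_1kg                       drops variants known to 1000 Genomes
    - (denovo - (in_exon | in_conserved)) drops variants in neither table
    - in_superdups                        drops variants in duplicated regions
    """
    return denovo - in_1kg - (denovo - (in_exon | in_conserved)) - in_superdups


# Toy variant IDs, invented for illustration only.
denovo = {1, 2, 3, 4, 5, 6, 7, 8}
in_1kg = {1, 2}
in_exon = {2, 3, 4}
in_conserved = {4, 5, 6}
in_superdups = {6, 7}

mylist = select_candidates(denovo, in_1kg, in_exon, in_conserved, in_superdups)
print(sorted(mylist))
```

Note that, as written, the | operator keeps a variant that is in either the exon table or the conserved-element table. In this Python sketch, replacing | with & would keep only variants that are in both.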