illumina
Pipelines to assist the analysis of Illumina data
Usage
$ vtools show pipeline illumina
A pipeline to handle illumina data prepared by CASAVA 1.8+. It imports
variants from SNPs.vcf and Indel.vcf of multiple samples, separate maxgt and
poly into different projects, calculate a few standard statistics and apply a
few filters. All results are saved as variant tools snapshots. This pipeline
uses command vtools so multi-processing is not supported.
Available pipelines: load_data
Pipeline "load_data": This pipeline accepts a list of directories under which
SNPs and Indels are listed in files Variations/SNPs.vcf and
Variations/Indels.vcf. It reads all variants and save the project to a
snapshot with raw data. It then removes MAXGT or POLY samples, rename samples,
merge SNP and Indels remove variants without any genotype in all samples,
create variant tables (all, SNVs and Indels) for each sample, and save results
to two other snapshots for maxgt and poly data respectively.
load_data_10: Load SNP and Indel variants from Variations/SNPs.vcf and
Variations/Indels.vcf under specified directory. Save
all inputted variants to the first snapshot file
specified by --output
load_data_20: Remove _POLY samples, merge SNPs and INDELs, remove
genotypes that does not pass filter (filter != "PASS"),
calculate genotype count of all variants, remove
variants without any genotype, and save results to the
second snapshot file specified by --output
load_data_30: Remove _MAXGT samples, merge SNPs and INDELs, remove
genotypes that does not pass filter (filter != "PASS"),
calculate genotype count of all variants, remove
variants without any genotype, and save results to the
third snapshot file specified by --output
Pipeline parameters:
geno_info: Genotype information fields imported from VCF
files (default: filter qual DP_geno GQ_geno PL_geno)
build: Build of reference genome of the project.
(default: hg19)
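The defaults listed above can presumably be overridden when the pipeline is executed. The sketch below assumes that vtools execute accepts pipeline parameters as --NAME VALUE options after --input and --output; the helper load_data_cmd, the directory names LP1 and LP2, and the hg18 value are hypothetical and used only for illustration. The helper prints the command instead of running it, so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a load_data command that
# overrides the 'build' pipeline parameter.
# Assumption: vtools execute takes pipeline parameters as --NAME VALUE.
load_data_cmd() {
    build=$1
    shift
    # Remaining arguments are the input directories.
    echo "vtools execute illumina load_data --input $*" \
         "--output raw_data.tar maxgt_data.tar poly_data.tar" \
         "--build $build"
}

# Example: two (hypothetical) sample directories, reference genome hg18.
load_data_cmd hg18 /path/to/data/LP1 /path/to/data/LP2
```

If geno_info is overridden in the same way, the filter field should probably stay in the list, because the later steps remove genotypes based on it.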
Details
$ vtools init test --force
$ vtools execute illumina load_data --input /path/to/data/LP* \
--output raw_data.tar maxgt_data.tar poly_data.tar
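Because the pipeline reads Variations/SNPs.vcf and Variations/Indels.vcf under every input directory, a quick check before running the command can catch incomplete directories early. The following is a minimal sketch and not part of variant tools; the function name check_casava_dirs is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of variant tools): report input
# directories that lack the two VCF files the load_data pipeline reads.
check_casava_dirs() {
    rc=0
    for dir in "$@"; do
        for vcf in Variations/SNPs.vcf Variations/Indels.vcf; do
            if [ ! -f "$dir/$vcf" ]; then
                # Report each missing file on stderr and remember the failure.
                echo "missing: $dir/$vcf" >&2
                rc=1
            fi
        done
    done
    return "$rc"
}

# Example: check the same directories that are passed to --input.
check_casava_dirs /path/to/data/LP* || echo "fix the directories listed above first"
```

The function returns a non-zero status if any file is missing, so it can be chained in front of vtools execute with &&.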