% vtools init -h
usage: vtools init [-h] [-f] [--parent DIR_or_SNAPSHOT] [--variants [TABLE]]
[--samples [COND [COND ...]]]
[--genotypes [COND [COND ...]]]
[--children DIR_OR_SNAPSHOT [DIR_OR_SNAPSHOT ...]]
[-v {0,1,2}]
project
Create a new project in the current directory. This command will fail if
another project already exists in this directory, unless option '--force' is
used to remove the existing project.
positional arguments:
project Name of a new project. This will create a new .proj
file under the current directory. Only one project is
allowed in a directory.
optional arguments:
-h, --help show this help message and exit
-f, --force Remove a project if it already exists.
-v {0,1,2}, --verbosity {0,1,2}
Output error and warning (0), info (1) and debug (2)
information to standard output (default to 1).
Derive from a parent project:
--parent DIR_or_SNAPSHOT
Directory or snapshot file of a parent project (e.g.
--parent ../main) from which all or part of variants
(--variants), samples (--samples) and genotypes
(--genotypes) will be copied to the newly created
project.
--variants [TABLE] A variant table of the parental project whose variants
will be copied to the new project. Default to variant
(all variants).
--samples [COND [COND ...]]
Copy only samples of the parental project that match
specified conditions.
--genotypes [COND [COND ...]]
Copy only genotypes that match specified conditions.
Merge from children projects:
--children DIR_OR_SNAPSHOT [DIR_OR_SNAPSHOT ...]
A list of a subprojects (directories or snapshot files
of projects) that will be merged to create this new
project. The subprojects must have the same primary
and alternative reference genome. Variant tables with
the same names from multiple samples will be merged.
Samples from the children projects will be copied even
if they were identical samples imported from the same
source files.
Command vtools init
creates a new project in the current directory. The project will be empty unless it is a child project from a parent project (with a subset of samples, variants etc), or a parent project of several children projects (with merged variants and samples).
A directory can only have one project. After a project is created, subsequent vtools
calls will automatically load the project in the current directory. Working from outside of a project directory is not allowed.
A variant tools project $name
consists of a project file $name.proj
, a genotype database (HDF5 or SQLite), and a log file $name.log
. In addition to the project databases under the project directory, variant tools will store
~/.variant_tools
. You can use command “vtools admin --set_runtime_option local_resource
” to relocate this directory if your home directory does not have enough free space./tmp
. If your temp partition is not large enough, you can set runtime option temp_dir
to another directory. This directory will be cleared automatically after a project is closed, so do not point it to a directory with existing files.Command `vtools init NAME
creates a new project NAME
under the current directory. It will fail if there is already a project in the current directory, unless option --force
is used to remove any existing project.
% mkdir myproj
% cd myproj
% vtools init myproj
INFO: variant tools 3.0.0dev : Copyright (c) 2011 - 2016 Bo Peng
INFO: Please visit http://varianttools.sourceforge.net for more information.
INFO: Creating a new project myproj
If you attempt to create another project in the same directory, command `vtools init
will fail with an error message:
ERROR: A project can only be created in a directory without another project.
Using the --force
option will remove the existing project and create a new one:
% vtools init test --force
INFO: variant tools 3.0.0dev : Copyright (c) 2011 - 2016 Bo Peng
INFO: Please visit http://varianttools.sourceforge.net for more information.
INFO: Creating a new project test
If you are worried about losing your work by accidentally calling vtools init
with option --force
, you could save copies of your project using command `vtools admin --save_snapshot
from time to time, and load a saved snapshot using command `vtools admin --load_snapshot
when needed.
A project could be created from a parent project with a subset of its variants and samples. For example, a child project with variants from a small chromosomal region could be created from a parent project to test a pipeline before it is applied to the whole project. This also allows differential analysis of subsets of variants (e.g. SNVs and indels) and samples.
% vtools admin --load_snapshot vt_quickStartGuide_v3
Downloading snapshot vt_quickStartGuide_v3.tar.gz from online repository
Extracting vt_quickStartGuide_v3: 100% [===================================] 148,585 20.1M/s in 00:00:00
INFO: Snapshot vt_quickStartGuide_v3 has been loaded
This project has variants from two samples and a single master variant table with 4,839 variants:
% vtools show samples
sample_name filename
CEU CEU_hg38_all.vcf
JPT JPT_hg38_all.vcf
% vtools show tables
table #variants date message
variant 4,839 May30 Master variant table
Variants from the CEU and JPT samples could be selected to separate variant tables using commands
% vtools select variant --samples "sample_name=='CEU'" -t CEU
INFO: 3470 variants selected.
% vtools select variant --samples "sample_name=='JPT'" -t JPT
INFO: 2878 variants selected.
The project now has three variant tables
% vtools show tables
table #variants date message
CEU 3,470 May30
JPT 2,878 May30
variant 4,839 May30 Master variant table
The following filters could be applied to the parent project if the STOREMODE is set to SQLite.
--variants
Only variants from the specified variant table of the parent project will be copied. Genotypes of samples will be affected because only genotypes related to these variants will be copied.--samples
Only samples matching specified conditions (e.g. sample names) will be copied.--genotypes
Only genotypes matching specified conditions (e.g. with quality score above certain threshold) will be copied.You can create a subproject with variants from the CEU:
% export STOREMODE="sqlite"
% vtools init -f test
% vtools admin --load_snapshot vt_quickStartGuide
% vtools select variant --samples 'sample_name == "CEU"' -t CEU 'Variants from CEU population'
% vtools select variant --samples 'sample_name == "JPT"' -t JPT 'Variants from JPT population'
% mkdir ../CEU
% cd ../CEU
% vtools init CEU --parent ../myproj --variants CEU
INFO: variant tools 3.0.0dev : Copyright (c) 2011 - 2016 Bo Peng
INFO: Please visit http://varianttools.sourceforge.net for more information.
INFO: Creating a new project CEU
Copying variant tables ../myproj/test.proj: 100% [============================] 6 251.6/s in 00:00:00
Copying samples: 100% [=======================================================] 2 211.4/s in 00:00:00
INFO: 3489 variants and 2 samples are copied
The new project has a master variant table with 3,489 variants:
% vtools show tables
table #variants date message
CEU 3,489 May14 Variants from CEU population
JPT 1,531 May14 Variants from JPT population
variant 3,489
and two samples (with subsets of variants):
% vtools show genotypes
sample_name filename num_genotypes sample_genotype_fields
CEU CEU.exon.2010_03.sites.vcf.gz 3489
JPT JPT.exon.2010_03.sites.vcf.gz 1531
You can create another project with variants in the JPT population, and only the JPT sample:
% mkdir ../JPT
% cd ../JPT
% vtools init JPT --parent ../myproj --variants JPT --samples 'sample_name == "JPT"'
INFO: variant tools 3.0.0dev : Copyright (c) 2011 - 2016 Bo Peng
INFO: Please visit http://varianttools.sourceforge.net for more information.
INFO: Creating a new project JPT
Copying variant tables ../myproj/test.proj: 100% [==================================] 6 214.4/s in 00:00:00
Copying samples: 100% [=============================================================] 1 111.3/s in 00:00:00
INFO: 4858 variants and 1 samples are copied
The new JPT project has a master variant table with 2,900 variants,
% vtools show tables
table #variants date message
CEU 1,531 May14 Variants from CEU population
JPT 2,900 May14 Variants from JPT population
variant 2,900
and a single sample JPT:
% vtools show samples
sample_name filename
JPT JPT.exon...3.sites.vcf.gz
if you use –-samples
(or –-genotypes
) options without –-variants
option to create a subproject, all of the variant tables in the parent project will be copied into your subproject, and the specified samples or genotypes will be copied to your subproject.
The parent project does not have to be a directory. It can also be a local or online snapshot. For example, command
% mkdir ../test
% cd ../test
% vtools init test --parent vt_simple
INFO: variant tools 3.0.0dev : Copyright (c) 2011 - 2016 Bo Peng
INFO: Please visit http://varianttools.sourceforge.net for more information.
INFO: Creating a new project test
INFO: Extracting snapshot vt_simple to .
Downloading snapshot vt_simple.tar.gz from online repository
Extracting vt_simple: 100% [====================================================] 41,325 5.4M/s in 00:00:00
will download snapshot vt_simple
from online and become the present project. In addition, if you have a snapshot file, you can use command
% vtools init test --parent my_snapshot.tar.gz
which is a shortcut to commands
% vtools init test -f
% vtools admin --load_snapshot my_snapshot.tar.gz
A project could also be created from one or more children projects if the STOREMODE is set to sqlite. This allows flexible handling of batches of data (e.g. analyze data separately or jointly), and parallel processing of large datasets (e.g. split a project by chromosomes, analyze them separately, and combine the results).
Merging two or more variant tools projects will merge variants and samples from these projects. More specifically,
NULL
values for these fields.Because subprojects might have overlapping variants, variant tables, and samples, merging subproject might lead to unexpected results.
Variant tables from children projects will be copied to $name (from \\(proj)
before they are merged. This allows you to keep track of information from the original projects, or compare tables from children projects.
% mkdir ../merged
% cd ../merged
% vtools init merged --children ../CEU ../JPT
INFO: variant tools 3.0.0dev : Copyright (c) 2011 - 2016 Bo Peng
INFO: Please visit http://varianttools.sourceforge.net for more information.
INFO: Creating a new project merged
Loading ../CEU/CEU (1/2): 0 0.0/s in 00:00:00
Loading ../JPT/JPT (2/2): 0 0.0/s in 00:00:00
Merging all projects: 100% [========================================================] 22 21.9/s in 00:00:01
we will see that variants from these two projects are corrected merged
% vtools show tables
table #variants date message
CEU 3,489 May14 Variants from CEU population (merged)
CEU (from CEU) 3,489 May14 Variants from CEU population (from CEU)
CEU (from JPT) 1,531 May14 Variants from CEU population (from JPT)
JPT 2,900 May14 Variants from JPT population (merged)
JPT (from CEU) 1,531 May14 Variants from JPT population (from CEU)
JPT (from JPT) 2,900 May14 Variants from JPT population (from JPT)
variant 4,858 May14 (merged)
variant (from CEU) 3,489 (from CEU)
variant (from JPT) 1,369 (from JPT)
but we have three samples with different number of variants
% vtools show genotypes
sample_name filename num_genotypes sample_genotype_fields
CEU CEU.exon...3.sites.vcf.gz 3489
JPT JPT.exon...3.sites.vcf.gz 1531
JPT JPT.exon...3.sites.vcf.gz 2900
Because the latter two samples have the same name, it is even difficult to remove one of them using command vtools remove samples
. If you have to merge samples with the same names from different projects, it is recommended that you use command vtools admin --rename_samples
to change names of samples before merging, and remove duplicated samples afterwards.
If you have a large number of samples from different sources, it is a good idea to create subprojects for groups of samples. Merging subprojects will be faster than reading from source files again. However, due to the overhead of re-mapping all variants, pre-processing each sample by creating its own project usually does not help much.
The children projects can also be snapshots, so if you have snapshots of a number of children projects, you can create a merged project using command
% vtools init test --children max_gt_data.tar poly_data.tar
The snapshots can also be an online snapshot so you can use an online snapshot to start your new project:
% vtools init test --children vt_simple