% vtools remove -h
usage: vtools remove [-h] [-v STD[LOG]]
                     {project,tables,samples,fields,geno_fields,annotations,variants,genotypes,phenotypes}
                     [items [items ...]]

Remove from the current project various items such as variants genotypes, and
annotation fields.

positional arguments:
  {project,tables,samples,fields,geno_fields,annotations,variants,genotypes,phenotypes}
                        Type of items to be removed.
  items                 Items to be removed, which should be, for 'project'
                        the name of project to be removed (optional), for
                        'tables' names of one or more variant tables, for
                        'samples' patterns using which matching samples are
                        removed, for 'fields' name of fields to be removed,
                        for 'geno_fields' name of genotype fields to be
                        removed (cf. 'vtools show genotypes'), for
                        'annotations' names of annotation databases, for
                        'variants' variant tables whose variants will be
                        removed from all variant tables and genotypes, for
                        'genotypes' conditions using which matching genotypes
                        are removed, and for 'phenotypes' columns in the
                        output of 'vtools show samples'. Note that removal of
                        samples will only remove sample name, filename (if all
                        related samples are removed), and related genotypes,
                        but not variants themselves; removal of annotation
                        databases will stop using these databases in the
                        project, but will not delete them from disk.

optional arguments:
  -h, --help            show this help message and exit
  -v STD[LOG], --verbosity STD[LOG]
                        Output error and warning (0), info (1) and debug (2)
                        information to standard output (default to 1), and to
                        a logfile (default to 2).
Fields from annotation databases cannot be removed.
Removing samples will only remove information for specified samples from existing variants. Variants themselves will not be removed.
Removing an annotation database only removes it from the project (the project stops using it); the database is not deleted from disk.
Removing an annotation database might make other databases unusable if they are linked through one of the fields in the removed database.
To remove a variant table,
% vtools init -f remove
% vtools admin --load_snapshot vt_quickStartGuide_v3
% vtools select variant --samples "sample_name=='CEU'" -t CEU
% vtools select variant --samples "sample_name=='JPT'" -t JPT
% vtools remove tables CEU
% vtools show tables
This command accepts the wildcard characters ? and *, so it is possible to remove a large number of tables at once. For example, the following command removes all temporary tables that were created when tables with the same names were created:
% vtools remove tables '*_Dec*'
The parameter should be quoted so that the shell does not expand the wildcard characters before they reach vtools.
Wildcard characters should be used with caution.
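One way to see why caution is needed is to preview which table names a pattern would match. The following is a minimal sketch, not part of vtools: it assumes that the wildcard matching behaves like shell-style globbing as implemented by Python's fnmatch module, and preview_matches is a hypothetical helper. The table names are taken from the 'vtools show' output used later on this page.

```python
from fnmatch import fnmatchcase


def preview_matches(table_names, pattern):
    """Return the table names that a shell-style pattern would match.

    Assumption: vtools wildcards behave like fnmatch-style globbing,
    where ? matches one character and * matches any run of characters.
    """
    return [name for name in table_names if fnmatchcase(name, pattern)]


# Table names copied from the 'vtools show' output on this page.
tables = ["variant", "NS", "NS_damaging", "NS_sp_damaging", "NS_pp",
          "NS1_Aug16_012302", "NS1", "NS2"]

for pattern in ("*_Aug*", "NS?", "NS*"):
    print(pattern, "->", preview_matches(tables, pattern))
```

Under this assumption, '*_Aug*' matches only the date-stamped table, 'NS?' matches NS1 and NS2, and 'NS*' matches every table except variant, which is probably more than intended.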
To remove a field,
% vtools remove fields CEU_ctrls_freq CEU_ctrls_het
% vtools show fields
% vtools remove geno_fields DP_geno
% vtools show genotypes
The following command will remove an existing project.
% vtools remove project
Show existing samples
% vtools show samples
filename        sample_name               aff   sex   BMI
SAMP1.vcf       SAMP1                     1     M     22.78
SAMP2.vcf       SAMP1                     2     F     24.43
var_format.vcf  SRR028913.aln.sorted.bam  None  None  None
Remove one sample with an affection status of 1
% vtools remove samples 'aff = 1' -v2
INFO: Removing sample SAMP1 from file SAMP1.vcf
Show samples again
% vtools show samples
INFO: Opening project sample.proj
filename        sample_name               aff   sex   BMI
SAMP2.vcf       SAMP1                     2     F     24.43
var_format.vcf  SRR028913.aln.sorted.bam  None  None  None
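The effect of a condition such as 'aff = 1' can be previewed outside vtools. The sketch below is an illustration only: it assumes the condition is evaluated like an SQL WHERE clause over the columns shown by 'vtools show samples', and it rebuilds the three rows above in an in-memory SQLite table. The table name sample and the helper matching_samples are hypothetical and are not the project's actual schema.

```python
import sqlite3


def matching_samples(rows, condition):
    """Return (filename, sample_name) pairs whose row satisfies `condition`.

    Assumption: the condition behaves like an SQL WHERE clause.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE sample "
        "(filename TEXT, sample_name TEXT, aff INTEGER, sex TEXT, BMI REAL)"
    )
    con.executemany("INSERT INTO sample VALUES (?, ?, ?, ?, ?)", rows)
    # The condition is pasted into the query as-is, which is acceptable
    # only for a local sketch on trusted input.
    query = f"SELECT filename, sample_name FROM sample WHERE {condition}"
    result = con.execute(query).fetchall()
    con.close()
    return result


# Rows copied from the 'vtools show samples' output above.
rows = [
    ("SAMP1.vcf", "SAMP1", 1, "M", 22.78),
    ("SAMP2.vcf", "SAMP1", 2, "F", 24.43),
    ("var_format.vcf", "SRR028913.aln.sorted.bam", None, None, None),
]

print(matching_samples(rows, "aff = 1"))
```

Under this assumption, samples whose phenotype is missing (None) do not satisfy 'aff = 1', which is consistent with the output above, where only the sample from SAMP1.vcf is removed.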
A project uses three annotation databases, dbNSFP, keggPathway, and dbSNP131:
% vtools show
Project name: RA
Primary reference genome: hg18
Secondary reference genome: hg19
Database engine: sqlite3
Variant tables: variant, NS, NS_damaging, NS_sp_damaging, NS_pp, NS1_Aug16_012302, NS1, NS2
Annotation databases: dbNSFP (1.1_0), keggPathway, dbSNP131 (0)
You can remove dbNSFP from the project using the command
% vtools remove annotations dbNSFP
INFO: Removing annotation database dbNSFP from the project
dbNSFP is no longer available, so keggPathway cannot be loaded either, because it is linked to the project through the field dbNSFP.genename.
% vtools show
WARNING: Failed to locate field genename
WARNING: Cannot open annotation database keggPathway
Project name: RA
Primary reference genome: hg18
Secondary reference genome: hg19
Database engine: sqlite3
Variant tables: variant, NS, NS_damaging, NS_sp_damaging, NS_pp, NS1_Aug16_012302, NS1, NS2
Annotation databases: dbSNP131 (0)
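The knock-on effect shown above can be reasoned about before removing a database. The following is a minimal sketch under an assumed model, not vtools code: each linked database is recorded together with the database that provides the field it is linked by, and unusable_after_removal is a hypothetical helper that follows those links transitively.

```python
def unusable_after_removal(linked_through, removed):
    """Return databases that can no longer be loaded once `removed` is gone.

    `linked_through` maps a database name to the database that provides
    the field it is linked by (an assumed, simplified model).
    """
    unusable = set()
    frontier = {removed}
    while frontier:
        # Databases linked through anything lost in the previous round.
        frontier = {db for db, parent in linked_through.items()
                    if parent in frontier and db not in unusable}
        unusable |= frontier
    return unusable


# keggPathway is linked through dbNSFP.genename, as in the example above.
links = {"keggPathway": "dbNSFP"}

print(unusable_after_removal(links, "dbNSFP"))
print(unusable_after_removal(links, "dbSNP131"))
```

In this model, removing dbNSFP also makes keggPathway unusable, whereas removing dbSNP131 affects no other database.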
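To remove variants, select them into a variant table and then remove the variants in that table. For example, we can remove all variants with low depth (DP<10) by: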
% vtools select variant "DP<10" -t lowDP
% vtools remove variants lowDP
If you would like to remove all variants except those in a specified table, first use the command vtools compare to create a table that contains all variants to be removed,
% vtools compare variant to_be_kept --difference to_be_removed
and then remove the variants in table to_be_removed:
% vtools remove variants to_be_removed
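The table to_be_removed built this way is the set difference between the master variant table and to_be_kept. A minimal sketch of that relationship, using hypothetical integer variant IDs rather than real project data:

```python
def variants_to_remove(all_variants, to_be_kept):
    """Return the variants that are in the master table but not in to_be_kept."""
    return set(all_variants) - set(to_be_kept)


# Hypothetical variant IDs, for illustration only.
variant = {1, 2, 3, 4, 5}
to_be_kept = {2, 4}

to_be_removed = variants_to_remove(variant, to_be_kept)
print(sorted(to_be_removed))

# After removal, only the variants in to_be_kept remain.
print(sorted(variant - to_be_removed))
```

The two sets are disjoint and together cover the master table, so removing to_be_removed leaves exactly the variants in to_be_kept.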
Removing a large number of variants will be slow. In this case, it is usually much more efficient to create a subproject using the variants to be kept (vtools init name --parent /path/to/parent --variants to_be_kept).
For example, we can remove all genotypes having low quality by:
% vtools remove genotypes "DP_geno<10"
To remove a phenotype (a column in the output of vtools show samples),
% vtools remove phenotypes BMI