% vtools exclude -h
usage: vtools exclude [-h] [-s [COND [COND ...]]] [-t [TABLE [DESC ...]]]
                      [-c | -o [FIELDS [FIELDS ...]]]
                      [--header [HEADER [HEADER ...]]] [-d DELIMITER]
                      [--na NA] [-l N] [--build BUILD]
                      [-g [FIELD [FIELD ...]]]
                      [--order_by [FIELD [FIELD ...]]] [-u] [-v {0,1,2}]
                      from_table [condition [condition ...]]

Exclude variants according to properties (variant and annotation fields) and
membership (samples) of variant. The result can be counted, outputted, or
saved to a variant table.

positional arguments:
  from_table            Source variant table.
  condition             Conditions by which variants are excluded. Multiple
                        arguments are automatically joined by 'AND' so 'OR'
                        conditions should be provided by a single argument
                        with conditions joined by 'OR'. If unspecified, all
                        variants (except those excluded by parameter
                        --samples) will be excluded.

optional arguments:
  -h, --help            show this help message and exit
  -s [COND [COND ...]], --samples [COND [COND ...]]
                        Limiting variants from samples that match conditions
                        that use columns shown in command 'vtools show sample'
                        (e.g. 'aff=1', 'filename like "MG%"').
  -t [TABLE [DESC ...]], --to_table [TABLE [DESC ...]]
                        Destination variant table.
  -c, --count           Output number of variant, which is a shortcut to
                        '--output count(1)'.
  -o [FIELDS [FIELDS ...]], --output [FIELDS [FIELDS ...]]
                        A list of fields that will be outputted.
                        SQL-compatible expressions or functions such as
                        "pos-1", "count(1)" or "sum(num)" are also allowed.
  -v {0,1,2}, --verbosity {0,1,2}
                        Output error and warning (0), info (1) and debug (2)
                        information to standard output (default to 1).

Output options:
  --header [HEADER [HEADER ...]]
                        A complete header or a list of names that will be
                        joined by a delimiter (parameter --delimiter). If a
                        special name - is specified, the header will be read
                        from the standard input, which is the preferred way to
                        specify large multi-line headers (e.g. cat myheader |
                        vtools export --header -). If this parameter is given
                        without parameter, a default header will be derived
                        from field names.
  -d DELIMITER, --delimiter DELIMITER
                        Delimiter, default to tab, a popular alternative is
                        ',' for csv output
  --na NA               Output string for missing value
  -l N, --limit N       Limit output to the first N records.
  --build BUILD         Output reference genome. If set to alternative build,
                        chr and pos in the fields will be replaced by alt_chr
                        and alt_pos
  -g [FIELD [FIELD ...]], --group_by [FIELD [FIELD ...]]
                        Group output by fields. This option is useful for
                        aggregation output where summary statistics are
                        grouped by one or more fields.
  --order_by [FIELD [FIELD ...]]
                        Order output by specified fields in ascending order.
  -u, --unique          Remove duplicated records while keeping the order of
                        output. This option can be time- and RAM-consuming
                        because it keeps all outputted records in RAM to
                        identify duplicated records. You should pipe output to
                        command 'uniq' if you only need to remove adjacent
                        duplicated lines.
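
The trade-off described for -u can be illustrated with a small Python sketch. It is an illustration only, not vtools' implementation, and the sample records are hypothetical. Removing every duplicate while preserving order requires remembering all distinct records seen so far. The uniq command, by contrast, only compares each line with the one before it.

```python
from itertools import groupby


def dedup_all(records):
    """Order-preserving removal of every duplicate (what -u does).

    The 'seen' set grows with the number of distinct records, which is
    why this approach costs memory on large outputs.
    """
    seen = set()
    out = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            out.append(rec)
    return out


def dedup_adjacent(records):
    """Collapse only consecutive duplicates, like the Unix 'uniq' command."""
    return [key for key, _ in groupby(records)]


# Hypothetical output records.
rows = ["1 100 A G", "1 100 A G", "2 200 C T", "1 100 A G"]
print(dedup_all(rows))       # ['1 100 A G', '2 200 C T']
print(dedup_adjacent(rows))  # ['1 100 A G', '2 200 C T', '1 100 A G']
```

Piping to uniq is therefore enough only when duplicated records are guaranteed to be adjacent, for example after sorting.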

The vtools exclude command differs from vtools select only in that it excludes (rather than selects) the variants that satisfy the given condition(s) from a variant table. It then counts, outputs, or saves the remaining variants.
However, vtools exclude is not simply vtools select with a reversed condition. As the examples below show, the commands “vtools select table cond” (e.g. sift_score > 0.95) and “vtools exclude table reverse-cond” (e.g. sift_score <= 0.95) might select different sets of variants. This happens in two situations.

If field values for some variants are missing (NULL), these variants are selected by neither vtools select table "sift_score > 0.95" nor vtools select table "sift_score <= 0.95", because in SQL a comparison with NULL evaluates to NULL rather than true. Variants without a score and variants with a score > 0.95 can both be selected by vtools exclude table "sift_score <= 0.95". Alternatively, you can use vtools select table "sift_score > 0.95 OR sift_score is NULL" to handle the NULL case explicitly.
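
The NULL case can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch that assumes exclude keeps every variant of the source table that the condition does not select, as described above. The table ns and its three rows are hypothetical, and the queries are not the ones vtools actually runs.

```python
import sqlite3

# Hypothetical variant table: variant 3 has no SIFT score (NULL).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ns (variant_id INTEGER PRIMARY KEY, sift_score REAL)")
con.executemany("INSERT INTO ns VALUES (?, ?)", [(1, 0.99), (2, 0.50), (3, None)])


def select(cond):
    """IDs of variants for which cond is TRUE; comparisons with NULL never match."""
    # cond is a trusted literal in this sketch; never format user input into SQL.
    return {row[0] for row in con.execute(f"SELECT variant_id FROM ns WHERE {cond}")}


def exclude(cond):
    """All variants minus those matching cond (assumed exclude semantics)."""
    every = {row[0] for row in con.execute("SELECT variant_id FROM ns")}
    return every - select(cond)


print(sorted(select("sift_score > 0.95")))    # [1]
print(sorted(select("sift_score <= 0.95")))   # [2]
print(sorted(exclude("sift_score <= 0.95")))  # [1, 3]: the NULL variant is kept
print(sorted(select("sift_score > 0.95 OR sift_score IS NULL")))  # [1, 3]
```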

If a variant has multiple entries in the annotation database, it might match both conditions. This is the case, for example, for some variants in dbNSFP that lie in different genes and have a different damaging score for each gene. Such variants are selected by vtools select but are removed by vtools exclude.

For example, consider the following session:
% vtools init -f test
% vtools select ns 'sift_score > 0.95' -t ns_damaging
Running: 0 0.0/s in 00:00:00
INFO: 10 variants selected.
The select command picks 10 variants. If we instead remove the non-synonymous variants with sift_score <= 0.95, we get 9 variants.
% vtools exclude ns 'sift_score <= 0.95' -t ns_excl_benign
Running: 0 0.0/s in 00:00:00
INFO: 9 variants selected.
We can track down this difference using vtools compare:
% vtools compare ns_damaging ns_excl_benign --difference diff -v0
and then output the information for the variant saved in table diff:
% vtools output diff variant_id chr pos ref alt sift_score genename --build hg18
1036 5 139908704 C A 1 ANKHD1-EIF4EBP3
1036 5 139908704 C A 0.942108 EIF4EBP3
If we use the complete dbNSFP annotation database, we can show more fields:
% vtools output diff variant_id chr pos ref alt CCDSid sift_score genename Descriptive_gene_name --build hg18
#id chr pos ref alt CCDSid sift_score genename Descriptive_gene_name
1036 5 139908704 C A CCDS4224.1 1.0 ANKHD1-EIF4EBP3 ANKHD1-EIF4EBP3 readthrough
1036 5 139908704 C A CCDS4226.1 0.942108 EIF4EBP3 eukaryotic translation initiation factor 4E binding protein 3
It turns out that this variant has two entries in dbNSFP, one for each of two different genes. The variant therefore matches both conditions “sift_score > 0.95” and “sift_score <= 0.95”. As a result, it is selected by vtools select ns "sift_score > 0.95" but removed by vtools exclude ns "sift_score <= 0.95".
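
The multiple-entry case can be modeled the same way. This sketch uses Python's sqlite3 with a hypothetical table layout, not vtools' schema. Variant 1036 carries the two dbNSFP entries shown above, and a hypothetical variant 2000 has a single entry. A variant is assumed to be selected when at least one of its annotation entries satisfies the condition.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ns (variant_id INTEGER PRIMARY KEY)")
# Hypothetical annotation table with one row per (variant, gene) entry.
con.execute("CREATE TABLE dbnsfp (variant_id INTEGER, genename TEXT, sift_score REAL)")
con.executemany("INSERT INTO ns VALUES (?)", [(1036,), (2000,)])
con.executemany(
    "INSERT INTO dbnsfp VALUES (?, ?, ?)",
    [
        (1036, "ANKHD1-EIF4EBP3", 1.0),  # the two entries for variant 1036 shown above
        (1036, "EIF4EBP3", 0.942108),
        (2000, "GENE_X", 0.99),  # hypothetical single-entry variant
    ],
)


def select(cond):
    """Variants with at least one annotation entry satisfying cond."""
    sql = (
        "SELECT DISTINCT ns.variant_id FROM ns "
        "JOIN dbnsfp ON ns.variant_id = dbnsfp.variant_id "
        f"WHERE {cond}"
    )
    return {row[0] for row in con.execute(sql)}


def exclude(cond):
    """All variants minus those selected by cond (assumed exclude semantics)."""
    every = {row[0] for row in con.execute("SELECT variant_id FROM ns")}
    return every - select(cond)


damaging = select("sift_score > 0.95")        # both variants have an entry > 0.95
excl_benign = exclude("sift_score <= 0.95")   # 1036 is dropped by its 0.942108 entry
print(sorted(damaging), sorted(excl_benign))  # [1036, 2000] [2000]
print(sorted(damaging - excl_benign))         # [1036]
```

Under these assumptions, the single-entry variant is treated identically by both commands. Only the variant with conflicting entries ends up in the difference, as in table diff above.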