
Adding coordinates from an alternative reference genome

1. Usage

% vtools liftover -h

usage: vtools liftover [-h] [--flip] [-v STD[LOG]] build

Convert coordinates of existing variants to alternative coordinates in an
alternative reference genome. The UCSC liftover tool will be automatically
downloaded if it is not available.

positional arguments:
  build                 Name of the alternative reference genome

optional arguments:
  -h, --help            show this help message and exit
  --flip                Flip primary and alternative reference genomes so that
                        the specified build will become the primary reference
                        genome of the project.
  -v STD[LOG], --verbosity STD[LOG]
                        Output error and warning (0), info (1) and debug (2)
                        information to standard output (default to 1), and to
                        a logfile (default to 2).

2. Details

vtools provides a command, based on the UCSC liftOver tool, that maps variants from the existing reference genome to an alternative build. After the command runs, the master variant table holds the coordinates (chromosome and position) of each variant in both the primary and the alternative reference genome, alongside its reference and alternative alleles.

An illustration of the liftover process

  • This command adds alt_bin, alt_chr and alt_pos columns to the master variant table.
  • Annotation databases that use the alternative reference genome can now be used.
  • vtools output and vtools export can output alternative coordinates with the --build parameter.

This feature is unavailable under Windows because the UCSC liftOver tool does not support Windows.

Because the UCSC liftOver tool does not guarantee complete translation, variants that fail to map will have missing alternative coordinates.
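As a rough illustration of what "missing alternative coordinates" means in practice, the sketch below builds a toy in-memory SQLite table with a subset of the master variant table's columns and lists the variants whose alternative coordinates are NULL. The table is a stand-in created only for this example, not the actual project file, and variant 9001 is a hypothetical unmapped variant.

```python
import sqlite3

# Toy stand-in for the master variant table (assumption: not the real project file).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variant ("
    "variant_id INTEGER PRIMARY KEY, chr TEXT, pos INTEGER, "
    "ref TEXT, alt TEXT, alt_chr TEXT, alt_pos INTEGER)"
)
rows = [
    (52, "1", 230047, "A", "T", "1", 260296),   # mapped variant
    (53, "1", 230058, "T", "G", "1", 260307),   # mapped variant
    (9001, "1", 999999, "C", "G", None, None),  # hypothetical unmapped variant
]
conn.executemany("INSERT INTO variant VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def unmapped_ids(connection):
    """Return the IDs of variants that have no alternative coordinates."""
    cursor = connection.execute(
        "SELECT variant_id FROM variant "
        "WHERE alt_chr IS NULL OR alt_pos IS NULL "
        "ORDER BY variant_id"
    )
    return [row[0] for row in cursor]


print(unmapped_ids(conn))
```

Any analysis that relies on the alternative build should expect such rows and either skip them or report them.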

% vtools init -f liftover
% vtools admin --load_snapshot vt_testData_v3
% vtools import V1-3_hg19_combine.vcf --build hg19
% vtools liftover hg38

INFO: Downloading liftOver chain file from UCSC
INFO: Exporting variants in BED format
Exporting variants: 100% [===============================] 288 110.5K/s in 00:00:00
INFO: Running UCSC liftOver tool
Updating table variant: 100% [============================] 288 780.0/s in 00:00:00
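The log messages above suggest three steps: export variants as BED, run liftOver, and read the translated positions back. The following is a minimal sketch of those steps for single-base variants, not vtools' own implementation. The helper names, the "chr" prefix, and the use of the BED name column to carry variant_id are assumptions made for illustration; the liftOver argument order (oldFile map.chain newFile unMapped) follows the tool's documented usage.

```python
def variant_to_bed(variant_id, chrom, pos):
    """Format a single-base variant as a BED line.

    BED intervals are 0-based and half-open, so a 1-based position
    `pos` becomes the interval [pos - 1, pos).
    """
    return f"chr{chrom}\t{pos - 1}\t{pos}\t{variant_id}"


def bed_to_variant(line):
    """Parse a lifted BED line back to (variant_id, chrom, 1-based pos)."""
    chrom, _start, end, name = line.rstrip("\n").split("\t")[:4]
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return int(name), chrom, int(end)


def liftover_command(bed_in, chain_file, bed_out, unmapped_out):
    """Build (but do not run) the liftOver command line."""
    return ["liftOver", bed_in, chain_file, bed_out, unmapped_out]


print(variant_to_bed(52, "1", 230047))
print(bed_to_variant("chr1\t260295\t260296\t52\n"))
print(liftover_command("in.bed", "hg19ToHg38.over.chain.gz", "out.bed", "unmapped.bed"))
```

Records that liftOver writes to the unmapped file have no translated position, which is why the corresponding variants end up without alternative coordinates.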

After the liftOver operation, three more fields (alt_bin, alt_chr, alt_pos) are added to the master variant table.

% vtools show table variant

Name:                   variant
Description:            Master variant table
Creation date:          May29
Fields:                 variant_id, bin, chr, pos, ref, alt, alt_bin, alt_chr, alt_pos
Number of variants:     1611

% vtools output variant variant_id bin chr pos ref alt alt_bin alt_chr alt_pos -l 15

variant_id, bin, chr, pos, ref, alt, alt_bin, alt_chr, alt_pos
1   585 1   14677   G   A   585 1   14677
2   585 1   15820   G   T   585 1   15820
... ...
52  586 1   230047  A   T   586 1   260296
53  586 1   230058  T   G   586 1   260307
54  586 1   231480  G   C   586 1   261729
55  586 1   231504  G   A   586 1   261753
56  586 1   231526  C   T   586 1   261775
57  586 1   232223  C   T   587 1   262472
58  586 1   234301  T   C   587 1   264550
59  586 1   234308  A   G   587 1   264557
... ...
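
The bin and alt_bin values in the listing above are consistent with the standard UCSC binning scheme, in which the smallest bins span 128 kb and are numbered from 585. The sketch below reproduces that scheme for single-base variants. It assumes a variant at 1-based position pos is binned as the 0-based interval [pos - 1, pos); whether vtools computes bins exactly this way is an assumption.

```python
# Standard UCSC binning constants: five levels, smallest bin = 2**17 bases.
BIN_OFFSETS = (585, 73, 9, 1, 0)
FIRST_SHIFT = 17
NEXT_SHIFT = 3


def ucsc_bin(start, end):
    """Return the smallest UCSC bin containing the 0-based interval [start, end)."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # Move up one level: each coarser bin covers 8 finer bins.
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError("interval is out of range for the standard binning scheme")


def snv_bin(pos):
    """Bin of a single-base variant at 1-based position `pos` (assumed convention)."""
    return ucsc_bin(pos - 1, pos)


for pos in (14677, 230047, 261775, 262472):
    print(pos, snv_bin(pos))
```

This explains why variant 57 moves from bin 586 to alt_bin 587: its hg38 position 262472 lies just past the 262144 boundary between the second and third 128 kb bins.
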
% vtools show

Project name:                test
Primary reference genome:    hg19
Secondary reference genome:  hg38
Storage method:              hdf5
Variant tables:              variant
Annotation databases:

% vtools liftover hg38 --flip

INFO: Downloading liftOver chain file from UCSC
INFO: Exporting variants in BED format
Exporting variants: 100% [===============================] 288 116.2K/s in 00:00:00
INFO: Running UCSC liftOver tool
INFO: Flipping primary and alternative reference genome
Updating table variant: 100% [============================] 288 612.1/s in 00:00:00

Interruption of the flipping process will leave the project unusable because of mixed coordinates.
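To see why a partial flip is harmful, consider a toy model in which flipping swaps the primary and alternative coordinate columns of every row. The sketch below performs the swap on an in-memory SQLite table inside a single transaction, so that either all rows are swapped or none are. It is an illustration of the idea only, not vtools' actual implementation; the table and the function name flip_coordinates are hypothetical, and the two rows reuse the values of variants 57 and 58 from the example project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variant ("
    "variant_id INTEGER PRIMARY KEY, bin INTEGER, chr TEXT, pos INTEGER, "
    "ref TEXT, alt TEXT, alt_bin INTEGER, alt_chr TEXT, alt_pos INTEGER)"
)
conn.executemany(
    "INSERT INTO variant VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (57, 586, "1", 232223, "C", "T", 587, "1", 262472),
        (58, 586, "1", 234301, "T", "C", 587, "1", 264550),
    ],
)
conn.commit()


def flip_coordinates(connection):
    """Swap primary and alternative coordinates for all rows atomically."""
    # `with connection` commits on success and rolls back on an exception.
    with connection:
        # Right-hand sides are evaluated against the pre-update row,
        # so a single UPDATE is enough to swap the column pairs.
        connection.execute(
            "UPDATE variant SET "
            "bin = alt_bin, chr = alt_chr, pos = alt_pos, "
            "alt_bin = bin, alt_chr = chr, alt_pos = pos"
        )


flip_coordinates(conn)
print(conn.execute("SELECT * FROM variant ORDER BY variant_id").fetchall())
```

If such a swap were applied row by row and stopped midway, some variants would carry hg38 positions in the primary columns while others still carried hg19 positions, with nothing in the table to tell them apart.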

% vtools show

Project name:                test
Primary reference genome:    hg38
Secondary reference genome:  hg19
Storage method:              hdf5
Variant tables:              variant
Annotation databases:

% vtools output variant variant_id  bin chr pos ref alt alt_bin alt_chr alt_pos -l 15

variant_id, bin, chr, pos, ref, alt, alt_bin, alt_chr, alt_pos
1   585 1   14677   G   A   585 1   14677
2   585 1   15820   G   T   585 1   15820
... ...
52  586 1   260296  A   T   586 1   230047
53  586 1   260307  T   G   586 1   230058
54  586 1   261729  G   C   586 1   231480
55  586 1   261753  G   A   586 1   231504
56  586 1   261775  C   T   586 1   231526
57  587 1   262472  C   T   586 1   232223
58  587 1   264550  T   C   586 1   234301
59  587 1   264557  A   G   586 1   234308
... ...