Because a file might contain genotypes for multiple samples (e.g. a .vcf file), and the genotypes of a single sample can be spread over several files (your case), a sample in variant tools is uniquely identified by the combination of filename and sample_name, as shown in the output of “vtools show samples”.
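A toy model may make this rule concrete. The sketch below uses Python's built-in sqlite3 module and a hypothetical, flattened table sample(filename, sample_name). The real variant tools schema is different and is not reproduced here. The sketch only shows why the (filename, sample_name) pair is the key and the name alone is not.

```python
import sqlite3

# Hypothetical, simplified stand-in for the variant tools sample table:
# one row per (file, sample) pair, with the pair declared unique.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (filename TEXT, sample_name TEXT, "
    "UNIQUE (filename, sample_name))"
)
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [
        ("FILE_1.vcf", "NA07000"),  # one .vcf file holding several samples
        ("FILE_1.vcf", "NA07001"),
        ("FILE_2.txt", "NA07000"),  # same name again, coming from another file
    ],
)

# The name alone matches two rows ...
by_name = conn.execute(
    "SELECT filename FROM sample WHERE sample_name = 'NA07000' ORDER BY filename"
).fetchall()
print(by_name)  # [('FILE_1.vcf',), ('FILE_2.txt',)]

# ... while inserting the same (filename, sample_name) pair twice is rejected.
try:
    conn.execute("INSERT INTO sample VALUES ('FILE_1.vcf', 'NA07000')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```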
Samples usually come with pre-specified names (from the header line of a .vcf or other text file), but you can customize sample names with option --sample_name of vtools import when you import data from different sources. The customized names overwrite the original names. Your sample can then be identified with option --samples 'sample_name = "name"' of the vtools select command.
If there are too many samples to rename, you can still identify a sample by filename plus sample_name (e.g. --samples 'filename like "FILE_1%"' 'sample_name = "NA07000"').
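The --samples conditions read like SQL WHERE clauses over the columns of “vtools show samples”. The sketch below mimics them with Python's sqlite3 module on a hypothetical flattened table. It *assumes* that several conditions are combined with AND. The helper select_samples is illustrative only and is not part of variant tools. String literals are single-quoted here because plain SQLite builds may reject double-quoted strings. The sketch also shows a general SQL detail: `_` is itself a LIKE wildcard, so 'FILE_1%' also matches a name such as FILEX1.vcf unless the underscore is escaped.

```python
import sqlite3

# Hypothetical flattened sample table (not the real variant tools schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (filename TEXT, sample_name TEXT)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [
        ("FILE_1.vcf", "NA07000"),
        ("FILE_1.vcf", "NA07001"),
        ("FILE_2.vcf", "NA07000"),
        ("FILEX1.vcf", "NA07000"),  # matched too: '_' is a LIKE wildcard
    ],
)

def select_samples(*conditions):
    """Hypothetical helper: rows matching all conditions, assumed ANDed.

    Conditions are trusted SQL fragments pasted into the WHERE clause.
    """
    where = " AND ".join(f"({c})" for c in conditions)
    sql = f"SELECT filename, sample_name FROM sample WHERE {where} ORDER BY filename"
    return conn.execute(sql).fetchall()

loose = select_samples("filename LIKE 'FILE_1%'", "sample_name = 'NA07000'")
print(loose)   # [('FILEX1.vcf', 'NA07000'), ('FILE_1.vcf', 'NA07000')]

# Escaping the underscore makes it match a literal '_' only.
strict = select_samples(
    "filename LIKE 'FILE\\_1%' ESCAPE '\\'", "sample_name = 'NA07000'"
)
print(strict)  # [('FILE_1.vcf', 'NA07000')]
```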
You can also add a column of customized sample names to the sample table using command vtools phenotype, and then refer to samples by the new names. For example, you can append a column sampleID to the sample table, fill it with your own names for samples from different sources, and then use --samples 'sampleID="name"' to identify samples.
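The effect of such an extra column can be sketched with plain SQLite. In this sketch, one person appears under different original names in two sources, and an unrelated sample reuses one of those names. This only imitates the *result* of adding a sampleID column. It is not how vtools phenotype is implemented, and the table layout, file names and sample names are all hypothetical.

```python
import sqlite3

# Hypothetical flattened sample table; in variant tools, vtools phenotype adds the column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (filename TEXT, sample_name TEXT)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [("FILE_1.vcf", "S1"), ("FILE_2.txt", "S1"), ("FILE_3.txt", "ind_7")],
)

# Append a column of customized names, keyed by (filename, sample_name).
conn.execute("ALTER TABLE sample ADD COLUMN sampleID TEXT")
custom_names = {
    ("FILE_1.vcf", "S1"): "patient_A",
    ("FILE_2.txt", "S1"): "patient_B",     # same original name, different person
    ("FILE_3.txt", "ind_7"): "patient_A",  # same person, different source and name
}
conn.executemany(
    "UPDATE sample SET sampleID = ? WHERE filename = ? AND sample_name = ?",
    [(new, f, s) for (f, s), new in custom_names.items()],
)

# Select by the new name, analogous to --samples 'sampleID="patient_A"'.
rows = conn.execute(
    "SELECT filename, sample_name FROM sample "
    "WHERE sampleID = 'patient_A' ORDER BY filename"
).fetchall()
print(rows)  # [('FILE_1.vcf', 'S1'), ('FILE_3.txt', 'ind_7')]
```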
With SQLite and MySQL, you can “analyze” tables or create indexes on table columns to speed up queries. Creating indexes, however, increases the size of the database and can slow down import. Indexes on genomic positions are created automatically by vtools. Both operations can be run with vtools execute, for example:
vtools execute 'analyze variant'
vtools execute 'create index my_index on variant(some_field)'
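To check whether an index is actually used, SQLite can report its query plan. The sketch below reproduces the two statements above on a hypothetical in-memory variant table with an invented column some_field, using Python's sqlite3 module. It compares the plan before and after the index is created. ANALYZE runs after CREATE INDEX here because the statistics it stores in sqlite_stat1 only cover indexes that exist when it runs. The exact plan wording (for example "SCAN variant" versus "SCAN TABLE variant") varies between SQLite versions.

```python
import sqlite3

# Hypothetical 'variant' table; some_field stands in for any column you query often.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variant (chr TEXT, pos INTEGER, some_field INTEGER)")
conn.executemany(
    "INSERT INTO variant VALUES (?, ?, ?)",
    [("1", i, i * 3) for i in range(1, 1001)],
)
conn.commit()

query = "SELECT * FROM variant WHERE some_field = 300"

def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(query)  # expect a full table scan
conn.execute("CREATE INDEX my_index ON variant(some_field)")
conn.execute("ANALYZE variant")  # store statistics in sqlite_stat1
after = plan(query)   # expect a search using my_index

print(before)
print(after)
```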
For more help see: