Skip to content

Ensembl-VEP

VEP determines the effect of your variants (insertions, deletions and structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

How and where to install a new version?

There is a shared folder in datasets where there are several vep cache versions and docker containers. If you are planning to download one, make sure to store it in this location.

/workspace/datasets/vep

The easiest way to use a new version of vep is downloading the docker container. Most popular versions are already downloaded in /workspace/datasets/vep/homo_sapiens (or /workspace/datasets/vep/mus_musculus for mice).

An example for downloading a new version:

singularity pull docker://ensemblorg/ensembl-vep:release_109.0

Once the container is ready, we can download the vep-cache required:

singularity exec ensembl-vep_109.sif INSTALL.pl -c 109_GRCh38/ -a cf -s homo_sapiens --ASSEMBLY GRCh38

IMPORTANT. Each database only works with the specific ensembl-vep version used to download the database. In the previous example, the 109_GRCh38/ will work only with the ensembl-vep_109 version.

For more details, you can follow the installation guide.

How to use

Once you are in a working node:

mgrau@bbgn009:/workspace/datasets/vep/homo_sapiens$ singularity exec ensembl-vep_109.sif vep

#----------------------------------#
# ENSEMBL VARIANT EFFECT PREDICTOR #
#----------------------------------#

Versions:
  ensembl              : 109.10baaec
  ensembl-funcgen      : 109.cba2db8
  ensembl-io           : 109.4946a86
  ensembl-variation    : 109.18a12b6
  ensembl-vep          : 109.3

Help: dev@ensembl.org , helpdesk@ensembl.org
Twitter: @ensembl

http://www.ensembl.org/info/docs/tools/vep/script/index.html

Usage:
./vep [--cache|--offline|--database] [arguments]

Basic options
=============

--help                 Display this message and quit

-i | --input_file      Input file
-o | --output_file     Output file
--force_overwrite      Force overwriting of output file
--species [species]    Species to use [default: "human"]

--everything           Shortcut switch to turn on commonly used options. See web
                       documentation for details [default: off]
--fork [num_forks]     Use forking to improve script runtime

For full option documentation see:
http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html

Example job

A real example command could be:

mgrau@bbgn009:/workspace/datasets/vep/homo_sapiens$ singularity exec vep109.sif vep --dir /workspace/datasets/vep/ -i variants_ref38.vcf.gz --offline --format vcf --vcf --cache -o exampleout.vcf --species homo_sapiens --assembly GRCh38 --fork 8

To speed up the process, it is recommended to use the downloaded vep-cache files specifying the directory (--dir) and the --offline and --cache options. VEP allows multithreating using the --fork option.

For full option documentation see here.

Full instructions on how to download and use cached files can be found here.

Additional comments

Be careful when running VEP with the TAB output and then merging again the variants from a VCF file, some indels are reformated in VEP and you cannot pair them with the original mutations.

Info

You can find another example using VEP data and google cloud -- see Extract minibams from Hartwig data in googleCloud section.

Reference

  • Miguel Grau
  • Federica Brando
  • Carlos López-Elorduy