
Introduction to Ensembl Functional Genomics

Installation Requirements

Given that you have already followed the instructions for installing the Ensembl core and functional genomics APIs (http://www.ensembl.org/info/software/api_installation.html), the next step is to set up the eFG-specific requirements. This list is exhaustive, so not every item will be necessary if you do not intend to use the full functionality of eFG. Install the following as required:

Set Up

The eFG system uses a shell environment to set global variables and help perform common tasks. You will need to edit the .efg file accordingly:

efg@bc-9-1-02>more ensembl-functgenomics/scripts/.efg
#!/usr/local/bin/bash

echo "Setting up the Ensembl Function Genomics environment..."

### ENV VARS ###

#Prompt
export PS1='efg@$PS1HOST>'

#Code/Data Directories
export SRC=~/src                                #Root source code directory. EDIT
export EFG_SRC=$SRC/ensembl-functgenomics       #eFG source directory
export EFG_SQL=$EFG_SRC/sql                     #eFG SQL
export EFG_DATA=/your/data/dir/efg              #Data directory. EDIT
export PATH=$PATH:$EFG_SRC/scripts              #eFG scripts directory
export PERL5LIB=$EFG_SRC/modules:$PERL5LIB      #Update PERL5LIB. EDIT add ensembl(core) etc. if required
#Your efg DB connection params
export WRITE_USER='write_user'                  #EDIT
export READ_USER='read_user'                    #EDIT
export HOST='efg-host'                          #EDIT
export PORT=3306                                #EDIT
export MYSQL_ARGS="-h${HOST} -P${PORT}"

#Your ensembl core DB connection params, read only
export CORE_USER='anonymous'                    #EDIT if required
export CORE_HOST='ensembldb.ensembl.org'        #EDIT if required
export CORE_PORT=3306                           #EDIT if required

#Default norm and analysis methods
export NORM_METHOD='VSN_GLOG'                   #EDIT if required e.g. T.Biweight, Loess
export PEAK_METHOD='Nessie'                     #EDIT if required e.g. TileMap, MPeak, Chipotle

#R config
export R_LIBS=${R_LIBS:=$SRC/R-modules}         #EDIT if required
export R_PATH=/software/bin/R                   #Location of local version of R. EDIT
export R_FARM_PATH=/software/R-2.4.0/bin/R      #Location of farm installed R. EDIT
export R_BSUB_OPTIONS="-R'select[type==LINUX64 && mem>6000] rusage[mem=6000]' -q bigmem" #EDIT

As indicated at the head of the .efg file, to enable easy access to the eFG environment it is useful to add the following to your .*rc login file:

alias efg='. ~/src/ensembl-functgenomics/scripts/.efg'

Once this is done, simply type 'efg' to enter the environment, which will give you access to some helper functions, such as CreateDB:

efg@bc-9-1-02>CreateDB my_homo_sapiens_funcgen_47_36i password
Creating DB my_homo_sapiens_funcgen_47_36i

It is desirable to maintain the standard Ensembl nomenclature for a database name and simply prefix it with a descriptive tag. Failure to do so may cause problems in dynamically detecting the correct core DB to use. The CreateDB function also supports overwriting an existing eFG DB by specifying a third 'drop' argument:

efg@bc-9-1-02>CreateDB my_homo_sapiens_funcgen_47_36i password drop
Dropping DB my_homo_sapiens_funcgen_47_36i
Creating DB my_homo_sapiens_funcgen_47_36i

Once you have set up the environment, you are ready to import data, or to query either the central Ensembl eFG DBs or a local copy of an eFG DB.
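
For example, a quick way to check connectivity and see which eFG DBs are available is to use the mysql client with the connection variables defined in .efg. This is a minimal sketch; the exact DB names returned will depend on the release:

#List the funcgen DBs available on the central Ensembl server
efg@bc-9-1-02>mysql -h${CORE_HOST} -P${CORE_PORT} -u${CORE_USER} -e "SHOW DATABASES LIKE '%funcgen%'"

#Connect to a local copy of an eFG DB with the read user
efg@bc-9-1-02>mysql $MYSQL_ARGS -u${READ_USER} my_homo_sapiens_funcgen_47_36i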

Note: It is not necessary to set up the environment if you simply want to query a remote eFG DB, i.e. the eFG DBs available at ensembldb.ensembl.org. However, you may find that some of the tool scripts require explicit definition of some of the above environment variables via the command line.

Tool scripts

There are various types of data import, export and transformation which can be performed using the scripts available in the scripts directory. These range from simple cell and feature type imports through to array design and full experiment imports. Most of the more common tasks have template shell scripts with the required parameters set and others left for editing. The main types of tool script are described below.
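
To see exactly which scripts are available in your checkout (the contents will vary between versions), simply list the scripts directory defined in the environment:

efg@bc-9-1-02>ls $EFG_SRC/scripts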

Importing an experiment

Prior to running your first experiment import, you will likely need to import the necessary feature types:

efg@bc-9-1-02>more run_import_type.sh 
#!/bin/sh

PASS=$1        #The DB password is passed as the first argument
shift

$EFG_SRC/scripts/import_type.pl\
        -type FeatureType\
        -name H3K4me3\
        -dbname your_homo_sapiens_funcgen_48_36j\
        -description 'Histone 3 Lysine 4 Tri-methyl'\
        -class HISTONE\
        -pass $PASS

Feature type names should correspond to a recognised ontology or nomenclature where appropriate, e.g. the Brno nomenclature for histone modifications. The class parameter is not required for CellType imports.
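
For comparison, a CellType import looks much the same. The following is a hedged sketch which assumes import_type.pl accepts -type CellType with the same parameters as above, minus -class; the name and description are examples only:

$EFG_SRC/scripts/import_type.pl\
        -type CellType\
        -name U2OS\
        -dbname your_homo_sapiens_funcgen_48_36j\
        -description 'Human osteosarcoma cell line'\
        -pass $PASS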

To import an experiment you must first create an input directory for the array vendor and your experiment, e.g.

mkdir $EFG_DATA/input/NIMBLEGEN/EXPERIMENT_NAME

The eFG system currently expects only one experiment per input directory. If your DVD contains more than one experiment, you will need to split the files up, recreating any meta files accordingly, e.g. DesignNotes.txt, SampleKey.txt (a sketch of this follows).
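
Below is a hedged sketch of splitting a multi-experiment DVD into separate input directories. It assumes the raw NimbleGen data are .pair files whose names contain the design identifier; the paths and file name patterns here are hypothetical and will need adapting to your DVD layout:

#One input directory per experiment
mkdir -p $EFG_DATA/input/NIMBLEGEN/EXPERIMENT_1
mkdir -p $EFG_DATA/input/NIMBLEGEN/EXPERIMENT_2

#Distribute the raw data files by design/experiment (hypothetical patterns)
mv /path/to/dvd/*DESIGN_1*.pair $EFG_DATA/input/NIMBLEGEN/EXPERIMENT_1/
mv /path/to/dvd/*DESIGN_2*.pair $EFG_DATA/input/NIMBLEGEN/EXPERIMENT_2/

#Copy the meta files and edit each copy so it describes only that experiment
cp /path/to/dvd/DesignNotes.txt /path/to/dvd/SampleKey.txt $EFG_DATA/input/NIMBLEGEN/EXPERIMENT_1/
cp /path/to/dvd/DesignNotes.txt /path/to/dvd/SampleKey.txt $EFG_DATA/input/NIMBLEGEN/EXPERIMENT_2/

A NimbleGen experiment import can then be done using the appropriate run script: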


efg@bc-9-1-02>more run_NIMBLEGEN.sh
#!/bin/sh

PASS=$1        #The DB password is passed as the first argument
shift

#Note: comments cannot follow the line-continuation backslashes within the
#command itself, so the less obvious parameters are described here instead:
# -name          Name of the data/experiment directory
# -format        Array format
# -location      Your group location
# -fasta         Flag to dump the array as a fasta file, useful for remapping
# -array_set     Flag to treat every chip/slide as part of one array
# -group         Your group name
# -data_version  The Ensembl data version corresponding to your data
# -recover       Enables recovery mode for failed/partial imports

$EFG_SRC/scripts/parse_and_import.pl\
       -name 'DVD_OR_EXPERIMENT_NAME'\
       -format tiled\
       -vendor NIMBLEGEN\
       -location Hinxton\
       -contact 'your@email.com'\
       -species homo_sapiens\
       -fasta\
       -port 3306\
       -host dbhost\
       -dbname 'your_homo_sapiens_funcgen_47_36i'\
       -array_set\
       -array_name "DESIGN_NAME"\
       -cell_type U2OS\
       -feature_type H3K4me3\
       -group efg\
       -data_version 41_36c\
       -verbose\
       -tee\
       -pass $PASS\
       -recover

Running the above script will perform a preliminary import. This involves validation checks and the import of some basic meta data. The meta data gleaned from the import parameters and available files is automatically populated within a tab2mage file located in the output directory, at which point the import stops to allow manual curation. Due to the lack of comprehensive meta data associated with an experiment DVD, it is necessary to inspect, correct and annotate the tab2mage file where possible. Failure to do so may result in permanent loss of meta data, an inability to submit to ArrayExpress, and a corrupted import which may ultimately prevent any further analysis.

There are three main areas to be addressed, most of which may have been automatically populated. Fields which need attention are marked with three question marks, i.e. '???'.
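
A quick way to locate the fields which still need attention is to search the tab2mage file for the '???' marker. This is a minimal sketch; the output path and file name shown are hypothetical and will depend on your experiment:

efg@bc-9-1-02>grep -n '???' /path/to/output/your_tab2mage_file.txt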
