
Compara Mart

For a full description of the whole-genome alignment methods, see the corresponding pages:
Blastz-net / Translated Blat Pairwise Alignments, Pecan Multiple Alignments and GERP

Formats used in ComparaMart Whole Genome Alignment (Pairwise and Multi Species)

AXT

Complete information about AXT format can be found here

As described at the link above, each alignment block in an AXT file contains three lines: a summary line and two sequence lines. Blocks are separated from one another by blank lines. The summary line contains coordinate and size information about the alignment. It consists of 9 required fields, in this order:

Alignment number : The alignment numbering starts with 0 and increments by 1.
Chromosome (primary organism)
Alignment start (primary organism) : The first base is numbered 1.
Alignment end (primary organism) : The end base is included.
Chromosome (aligning organism)
Alignment start (aligning organism)
Alignment end (aligning organism)
Strand (aligning organism) : If the strand value is "-", the values of the aligning organism's start and end fields are relative to the reverse-complemented coordinates of its chromosome.
Score

example: BlastZ-net alignments between human chromosome 1 and the mouse genome

0     1       2040       2122    17   66048810   66048885 + 7968
GATTGGAGGAAAGATGAGTGAGAGCATCAACTTCTCTCACAACCTAGGCCAGTAAGTAGTGCTTGTGCTCATCTCCTTGGCTG
GGTTGGAGGGAAGATGAGTGAAGGGATCAATTTCTCTGATGACCTGGGCCGGTAGG-------TGTGGTGTCCTCTTTGTCTG

1     1     123187     123246     7   25002102   25002162 + 148569
CTGCCCCTGCCCTGACTCCCAGCCCTG-TGGGGGTCCTGACCGCACCTCACCTGGCTCAGA
CTACTCCTGTCCCCACTCCCAGCCCTGCTGGGGGCCCTGACCCCACCTCTCCAGGCTCGGA
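
The block layout described above can be read with a few lines of Python. This is only a minimal sketch, not an official parser: the `AxtBlock` class, its field names and the `parse_axt` function are hypothetical names chosen for illustration, and the input is assumed to contain only well-formed three-line blocks.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class AxtBlock:
    """One AXT alignment block (field names are illustrative, not official)."""
    number: int
    chrom: str
    start: int            # first base is numbered 1
    end: int              # end base is included
    aligned_chrom: str
    aligned_start: int
    aligned_end: int
    aligned_strand: str   # "+" or "-"
    score: int
    seq: str
    aligned_seq: str


def parse_axt(text: str) -> Iterator[AxtBlock]:
    """Yield the blocks of an AXT file passed in as one string."""
    block: List[str] = []
    # The extra empty string flushes the last block when the text
    # does not end with a blank line.
    for raw in text.splitlines() + [""]:
        line = raw.strip()
        if line:
            block.append(line)
            continue
        if not block:
            continue  # several blank lines in a row
        if len(block) != 3:
            raise ValueError(f"expected 3 lines per block, got {len(block)}")
        fields = block[0].split()
        if len(fields) != 9:
            raise ValueError(f"expected 9 summary fields, got {len(fields)}")
        yield AxtBlock(
            number=int(fields[0]),
            chrom=fields[1],
            start=int(fields[2]),
            end=int(fields[3]),
            aligned_chrom=fields[4],
            aligned_start=int(fields[5]),
            aligned_end=int(fields[6]),
            aligned_strand=fields[7],
            score=int(fields[8]),
            seq=block[1],
            aligned_seq=block[2],
        )
        block = []


# Second block of the example above.
SAMPLE_AXT = """\
1     1     123187     123246     7   25002102   25002162 + 148569
CTGCCCCTGCCCTGACTCCCAGCCCTG-TGGGGGTCCTGACCGCACCTCACCTGGCTCAGA
CTACTCCTGTCCCCACTCCCAGCCCTGCTGGGGGCCCTGACCCCACCTCTCCAGGCTCGGA
"""

for axt_block in parse_axt(SAMPLE_AXT):
    print(axt_block.chrom, axt_block.start, axt_block.end, axt_block.score)
```

Because the start is 1-based and the end base is included, the number of non-gap characters in a sequence line should equal end - start + 1.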

AXTPLUS

This format is our own extension of the AXT format. It has an extended header and allows the query sequence (primary organism) to be on the - strand, whereas AXT always assumes the query sequence to be on the + strand.

The header now has 12 space-separated columns (only 9 in the original AXT format):

Alignment number
Chromosome (primary organism)
Alignment start (primary organism) : The first base is numbered 1.
Alignment end (primary organism) : The end base is included.
Strand (primary organism)
Chromosome (aligning organism)
Alignment start (aligning organism)
Alignment end (aligning organism)
Strand (aligning organism)
Score
Chromosome length (primary organism)
Chromosome length (aligning organism)

example: BlastZ-net alignments between human chromosome 1 and the mouse genome

0     1      90200      90289 +     1  178533830  178533921 + 126712  247249719  197069962
ATAGCCCATTAGGCCTCAATGAAGTCTTATGCAAGACCAGAAGCCAATTTGCCATTT--AAGGTGATTCTCCATGTTTCTGCTCTAACTGTG
AAGGTCTATTAACTGTTGAATAAGTCTTACACAAACACAGAAGCCAATCCTCCTTTTTGTAGGTGATTCTCCATGCTGCTGTTCTCACTTGG

1     1     122259     122362 +     7   24996509   24996614 + 148569  247249719  145134094
GCTGATGATGGCTTTAGCACCACCGACACCGATCTCAAGTTCAAGGAGTGGGTGACCGAC--TGAGAGTGGGGACAACTCTGGGGAGGAGCCAGAGGGCAACAAGG
GCTGATGACAGCTTTGGCACCACCGACATTGATCTCAAGTGCAAGGAACGAGTGACTGACAGTGAAAGTGGAGACAGCTCTGGGGAGGACCCAGAGGGTAACAAGG
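
As an illustration of the 12-column header, the sketch below splits one AXTPLUS header line into named values. The column labels and the functions `parse_axtplus_header` and `span` are hypothetical names introduced here; they are not part of any Ensembl API.

```python
from typing import Dict, Union

# Illustrative labels for the 12 header columns, in the order listed above.
AXTPLUS_COLUMNS = (
    "number", "chrom", "start", "end", "strand",
    "aligned_chrom", "aligned_start", "aligned_end", "aligned_strand",
    "score", "chrom_length", "aligned_chrom_length",
)
# Columns kept as text; every other column is converted to an integer.
TEXT_COLUMNS = {"chrom", "strand", "aligned_chrom", "aligned_strand"}


def parse_axtplus_header(line: str) -> Dict[str, Union[int, str]]:
    """Split an AXTPLUS header line into a dict keyed by column label."""
    values = line.split()
    if len(values) != len(AXTPLUS_COLUMNS):
        raise ValueError(f"expected 12 columns, got {len(values)}")
    return {
        name: value if name in TEXT_COLUMNS else int(value)
        for name, value in zip(AXTPLUS_COLUMNS, values)
    }


def span(header: Dict[str, Union[int, str]], prefix: str = "") -> int:
    """Number of bases covered; the end base is included, hence the + 1."""
    return int(header[prefix + "end"]) - int(header[prefix + "start"]) + 1


# Header of the first block in the example above.
header = parse_axtplus_header(
    "0     1      90200      90289 +     1  178533830  178533921 + 126712  247249719  197069962"
)
print(span(header), span(header, "aligned_"))
```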

MAF

Complete information about MAF format can be found here

As described at the link above, the MAF format is line-oriented. Each multiple alignment ends with a blank line. Each sequence in an alignment is on a single line, which can get quite long, but there is no length limit. Words in a line are delimited by any white space. Lines starting with # are considered to be comments. Lines starting with ## can be ignored by most programs, but contain meta-data of one form or another.

Each alignment begins with an 'a' line that sets variables for the entire alignment block. The 'a' is followed by name=value pairs, which here give the score of the alignment.
The 's' lines together with the 'a' lines define a multiple alignment. The 's' lines have the following fields, which are defined by position rather than by name=value pairs.

src : The name of one of the source sequences for the alignment. For sequences that are resident in a browser assembly, the form 'database.chromosome' allows automatic creation of links to other assemblies. Non-browser sequences are typically referenced by the species name alone.

start : The start of the aligning region in the source sequence. This is a zero-based number. If the strand field is '-' then this is the start relative to the reverse-complemented source sequence.

size : The size of the aligning region in the source sequence. This number is equal to the number of non-dash characters in the alignment text field below.

strand : Either '+' or '-'. If '-', then the alignment is to the reverse-complemented source.

srcSize : The size of the entire source sequence, not just the parts involved in the alignment.

text : The nucleotides (or amino acids) in the alignment and any insertions (dashes) as well.

example: GERP (Genomic Evolutionary Rate Profiling)

a score=8.617251
s hsap.chr1  240616145      23 -      247249719 AAGGAGTCCTAGAATGGAGCACA 
s ptro.chr1  223263538      23 -      229974691 AAGGAGTCCTAGAATGGAGCACA 
s mmul.chr1  218580358      23 -      228252215 AAGGAGTCCTAGAATGGAGTATA 
s cfam.chr5   28512329      23 -       91976430 AAGGAGTCCTAGGATGGAGTACA 
s btau.chr1   81704165      23 +      102834029 AAGGAGTCCTGGAAAGGAGCACA 
s mdom.chr4   43504342      16 -      430141050 AAGGAA-ACTAGCATGG------ 
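
The 's' line fields described above can be handled as in the following sketch. It parses one 's' line, checks that size matches the number of non-dash characters, and converts a '-' strand start into an interval on the forward strand. The names `MafSeq`, `parse_s_line` and `forward_interval` are hypothetical, and returning a zero-based, half-open interval is a choice made for this example only.

```python
from typing import NamedTuple, Tuple


class MafSeq(NamedTuple):
    """The positional fields of a MAF 's' line."""
    src: str
    start: int      # zero-based
    size: int
    strand: str     # '+' or '-'
    src_size: int
    text: str


def parse_s_line(line: str) -> MafSeq:
    """Parse one 's' line and check that size matches the alignment text."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "s":
        raise ValueError("not a valid 's' line")
    _, src, start, size, strand, src_size, text = fields
    record = MafSeq(src, int(start), int(size), strand, int(src_size), text)
    # size must equal the number of non-dash characters in the text field.
    if record.size != len(text) - text.count("-"):
        raise ValueError("size does not match the number of non-dash characters")
    return record


def forward_interval(record: MafSeq) -> Tuple[int, int]:
    """Return the zero-based, half-open interval on the forward strand."""
    if record.strand == "+":
        return record.start, record.start + record.size
    # On '-', start is counted from the end of the reverse-complemented source.
    return (
        record.src_size - (record.start + record.size),
        record.src_size - record.start,
    )


human = parse_s_line(
    "s hsap.chr1  240616145      23 -      247249719 AAGGAGTCCTAGAATGGAGCACA"
)
print(human.src, forward_interval(human))
```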

MFA

MFA is a FASTA-like format for pairwise alignments. The header is predefined and contains the alignment coordinates, the score and the species name. FASTA blocks are separated from one another by a number sign (#).

example: Human - Chicken Pairwise BlastZ-net alignment

>chr:1|start:5646|end:5996|strand:1|score:10605|Homo sapiens
AGGGCCCGCTCACCTTGCTCCTGCTCCTTCTGCTGCTGCTTCTCCAGCTTTCGCTCCTTCATGCTGCGCAGCTTGGCCTT
GCCGATGCCCCCAGCTTGGCGGATGGACTCTAGCAGAGTGGC-CAGCCACCGGAGGGGTCAACCACTTCCCTGGGAGCTC
CCTGGACTGGAGCCGGGAGGTGGGGAACAGGGCAAGGAGGAAAGGCTGCTCAGGCA--GGGCTGGGGAAGCTTACTGTGT
CCAAGAGCCTGCTGGGAGGGAAGTCACCTCCCCTCAAACGAGGAGCCCTGCGCTGGGGAGGCCGGACC-------TTTGG
AGACTGTGTGTGGGGGCCTGGGCACTGACTTCTGCAACCAC
>chr:1|start:62073494|end:62073803|strand:1|score:10605|Gallus gallus
AGAGCCTCAGCACCTTGTTCTTGTTCCTTTTGCTTCTTCTTCTCCAGCTTCCTCTCCTTGACACTGCGGAGGTTGGCCTT
CCCAATGCCCCCTGCCTGGCGAATGGATTCCAGGAGACTGGCACGCCCAGTTGAGGGGTTCACCACCTCCTTTGGGGCTC
CCTGGACCGAAGCTG------------CAGGGACAGGAAGGA--------CATGCATCAGCCTCGGGTA-----------
--------TAGCTGCAAGCCCAGCCAAATCCCCCCACAGAAAGGG--CTGCTCT---GATGCCCTGCCATCTTCTTTTTG
GGACTGTGCG-------CCTGACACAGACACTTTCAGGCAC
#
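
A minimal way to read this layout is sketched below. It assumes that every header field except the last is a key:value pair, that the last field is the species name, and that a line containing only # closes a block, as in the example above. The functions `parse_mfa_header` and `parse_mfa` are hypothetical names, and the sample sequences are shortened for illustration.

```python
from typing import Dict, List


def parse_mfa_header(line: str) -> Dict[str, str]:
    """Turn '>chr:1|start:5646|...|Homo sapiens' into a dict of strings."""
    *pairs, species = line[1:].split("|")
    info = {"species": species}
    for pair in pairs:
        key, value = pair.split(":", 1)
        info[key] = value
    return info


def parse_mfa(text: str) -> List[List[dict]]:
    """Return a list of blocks; each block is a list of sequence records."""
    blocks: List[List[dict]] = []
    current: List[dict] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "#":
            # A number sign on its own line closes the current block.
            if current:
                blocks.append(current)
            current = []
        elif line.startswith(">"):
            current.append({"header": parse_mfa_header(line), "sequence": ""})
        elif current:
            # Sequences are wrapped over several lines; join them back together.
            current[-1]["sequence"] += line
        else:
            raise ValueError("sequence line found before any header")
    if current:
        blocks.append(current)
    return blocks


# Headers taken from the example above; sequences shortened for illustration.
SAMPLE_MFA = """\
>chr:1|start:5646|end:5996|strand:1|score:10605|Homo sapiens
AGGGCCCGCT
CACC-TTG
>chr:1|start:62073494|end:62073803|strand:1|score:10605|Gallus gallus
AGAGCCTCAG
CACCTT-G
#
"""

for record in parse_mfa(SAMPLE_MFA)[0]:
    print(record["header"]["species"], record["header"]["start"], record["sequence"])
```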

Formats used in ComparaMart Homologues

For a full description of the Gene Orthology/Paralogy prediction method, see the corresponding page:
Gene Orthology/Paralogy prediction method

FASTAH & FASTAA

FASTAH and FASTAA are FASTA-like formats for homologues and aligned homologues, respectively. The headers are defined by the user, who can select any of the available attributes. FASTA blocks are separated from one another by a number sign (#).

example: Human - Mouse Homologues (Peptide)

>Homo sapiens|16|ENSG00000162073|ENST00000318782|ortholog_one2one
MAFLAGPRLLDWASSPPHLQFNKFVLTGYRPASSGSGCLRSLFYLHNELGNIYTHGLALL
GFLVLVPMTMPWGQLGKDGWLGGTHCVACLAPPAGSVLYHLFMCHQGGSAVYARLLALDM
CGVCLVNTLGALPIIHCTLACRPWLRPAALVGYTVLSGVAGWRALTAPSTSARLRAFGWQ
AAARLLVFGARGVGLGSGAPGSLPCYLRMDALALLGGLVNVARLPERWGPGRFDYWGNSH
QIMHLLSVGSILQLHAGVVPDLLWAAHHACPRD
>Mus musculus|17|ENSMUSG00000023909|ENSMUST00000024702|ortholog_one2one
MAFLTGPRLLDWASSPPHLQFNKFVLTGYRPASSGSGCLRSLFYLHNELGNIYTHGLALL
GFLVLVPMTMPWSQLGKDGWLGGTHCVACLVPPAASVLYHLFMCHQGGSPVYTRLLALDM
CGVCLVNTLGALPIIHCTLACRPWLRPAALMGYTALSGVAGWRALTAPSTSARLRAFGWQ
AGARLLVFGARGVGLGSGAPGSLPCYLRMDALALLGGLVNVARLPERWGPGRFDYWGNSH
QIMHLLSVGSILQLHAGVVPDLLWAAHHACPPD
#

example: Human - Mouse Homologues (Aligned Peptide)

>Homo sapiens|12|ENSG00000182196|ENST00000315580|ortholog_one2one
MERAGPAGEEGGAREGRLLPRAPGAWVLRACAERAALEVGAASADTGVRGCGARGPAPLL
ASAGGGRARDGTWGVRTKGSGAALPSRPASRAAPRPEASSPPLPLEKARGGLSGPQGGRA
RGAMAHVGSRKRSRSRSRSR-G-RGSEKRKKKSRKDTSRNCSASTSQGRKASTAPGAEAS
PSPCITERSKQKARRRTRSSSSSSSSSSSSSSSSSSSSSSSSSDGRKKRGKYKDKRRKKK
KK--RKKLKKKGKEKAEA-QQVEALPGPSLDQWHRSAGEEEDGPVLTDEQKSRIQAMKPM
TKEEWDARQSIIRKVVDPETGRTRLIKGDGEVLEEIVTKERHREINKQATRGDCLAFQMR
AGLLP
>Mus musculus|5|ENSMUSG00000029404|ENSMUST00000031351|ortholog_one2one
------------------------------------------------------------
------------------------------------------------------------
---MAHVGSRKRSRSRSRSRSGRRGSEKRSKRSSKDASRNCSASRSQGHKAGSASGVE--
------ERSKHKAQRTSRSSSTSSSSS--SSSSA---SSSSSSDGRKKRAKHKEKKRKKK
KKKRKKKLKKRVKEKAVAVHQAEALPGPSLDQWHRSAGEDNDGPVLTDEQKSRIQAMKPM
TKEEWDARQSVIRKVVDPETGRTRLIKGDGEVLEEIVTKERHREINKQATRGDGLAFQMR
TGLLP
#
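
Since the header content depends on the attributes selected by the user, a reader for these files has to be told which attributes to expect. The sketch below takes that list as an argument. The function name `read_homologue_blocks` and the attribute labels in the sample are assumptions made for illustration, and the sample sequences are shortened.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Record = Tuple[Dict[str, str], str]  # (header attributes, sequence)


def read_homologue_blocks(
    text: str, attributes: Sequence[str]
) -> Iterator[List[Record]]:
    """Yield one list of (attributes, sequence) pairs per '#'-terminated block."""
    block: List[Record] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line == "#":
            if block:
                yield block
            block = []
        elif line.startswith(">"):
            values = line[1:].split("|")
            if len(values) != len(attributes):
                raise ValueError("header does not match the expected attributes")
            block.append((dict(zip(attributes, values)), ""))
        elif line:
            if not block:
                raise ValueError("sequence line found before any header")
            fields, sequence = block[-1]
            block[-1] = (fields, sequence + line)
    if block:
        yield block


# Attribute labels are guesses at what the example headers above contain.
ATTRIBUTES = ("species", "chromosome", "gene_id", "transcript_id", "homology_type")

# Headers taken from the aligned example above; sequences shortened.
SAMPLE_FASTAA = """\
>Homo sapiens|12|ENSG00000182196|ENST00000315580|ortholog_one2one
MERAGPAG
AGLLP
>Mus musculus|5|ENSMUSG00000029404|ENSMUST00000031351|ortholog_one2one
--------
TGLLP
#
"""

for fields, sequence in next(read_homologue_blocks(SAMPLE_FASTAA, ATTRIBUTES)):
    # Removing the gap characters gives back the unaligned peptide.
    print(fields["species"], fields["gene_id"], sequence.replace("-", ""))
```

In a FASTAA block the sequences are aligned, so they should all have the same length once the wrapped lines are joined. A FASTAH block carries no such constraint.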

 

© 2024 Inserm. Hosted by genouest.org. This product includes software developed by Ensembl.

                
GermOnline based on Ensembl release 50 - Jul 2008