Sequence Region Tables

The seq_region table provides information about every piece of DNA in Ensembl on which features can be stored. Not every seq_region has to be associated with sequence. The name should contain only the characters [_a-zA-Z.0-9].
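
The naming rule can be checked with a regular expression. This is a minimal sketch, not part of the Ensembl API: the function name is hypothetical, and rejecting the empty string is an assumption, since the text gives only the allowed character set.

```python
import re

# Character set taken from the description above: [_a-zA-Z.0-9]
SEQ_REGION_NAME_RE = re.compile(r"[_a-zA-Z.0-9]+")


def is_valid_seq_region_name(name: str) -> bool:
    """Return True if name is non-empty and uses only the allowed characters.

    Rejecting the empty string is an assumption of this sketch.
    """
    return SEQ_REGION_NAME_RE.fullmatch(name) is not None
```

For example, is_valid_seq_region_name("AL123456.7") is accepted, while a name containing a space or a hyphen is rejected.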

The dna table provides sequence data for a sequence region. To be retrievable by the API, the seq_region has to belong to the sequence_level coordinate system. The length of the stored sequence must equal the length of the seq_region.
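
The length requirement lends itself to a consistency check. The sketch below uses an in-memory SQLite database with heavily simplified stand-ins for the two tables. The column names (seq_region_id, name, length, sequence) and the example rows are assumptions made for illustration, and a real Ensembl database would be queried through its own database server instead.

```python
import sqlite3


def find_length_mismatches(conn):
    """Return (name, declared_length, stored_length) for inconsistent regions."""
    return conn.execute(
        """
        SELECT sr.name, sr.length, LENGTH(d.sequence)
        FROM seq_region AS sr
        JOIN dna AS d ON d.seq_region_id = sr.seq_region_id
        WHERE LENGTH(d.sequence) <> sr.length
        ORDER BY sr.name
        """
    ).fetchall()


# Toy data: simplified, assumed table layouts with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY, name TEXT, length INTEGER);
    CREATE TABLE dna (seq_region_id INTEGER PRIMARY KEY, sequence TEXT);
    INSERT INTO seq_region VALUES (1, 'contig_ok', 8), (2, 'contig_bad', 10);
    INSERT INTO dna VALUES (1, 'ACGTACGT'), (2, 'ACGT');
    """
)

# Only contig_bad is reported: it declares length 10 but stores 4 bases.
print(find_length_mismatches(conn))
```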

The dna_c table can contain 2-bit compressed DNA sequence. The compression is performed by the API and should not be attempted manually.
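
To illustrate the general idea of 2-bit compression only, the sketch below packs four bases into one byte. It is not the format the API writes: the base-to-code mapping, the padding of the final byte, and the function names are all assumptions, and it handles only upper-case A, C, G and T.

```python
# Assumed mapping for illustration; the encoding used by the API may differ.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]  # KeyError on any other symbol
        # Left-align a short final chunk so the unused low bits are zero.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Reverse pack_2bit; length is needed to discard the padding bases."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])
```

The original length has to be stored alongside the packed bytes, because padding in the last byte is indistinguishable from trailing A bases.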

The coord_system table describes a coordinate system. Typical examples are chromosome, contig, chunk and clone. The version distinguishes coordinate systems whose seq_regions share names but differ in content, such as chromosome names in different assemblies. The rank indicates the level of context in this coordinate system. Rank 1 is the broadest, e.g. chromosome; coordinate systems with larger rank numbers have less context (meaning smaller seq_regions).
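
The rank ordering can be sketched with a small record type. The class, the example ranks, and the version string "ASSEMBLY_B" are hypothetical and stand in for rows of the coord_system table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CoordSystem:
    name: str
    version: Optional[str]  # None when the coordinate system is unversioned
    rank: int               # 1 = broadest context


def broadest_first(systems: List[CoordSystem]) -> List[CoordSystem]:
    """Order coordinate systems from most to least context."""
    return sorted(systems, key=lambda cs: cs.rank)


# Hypothetical example rows.
systems = [
    CoordSystem("contig", None, 3),
    CoordSystem("chromosome", "ASSEMBLY_B", 1),
    CoordSystem("clone", None, 2),
]

ordered_names = [cs.name for cs in broadest_first(systems)]
```

With these rows, ordered_names lists chromosome first, then clone, then contig.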

The assembly table states which parts of seq_regions are exactly equal. It enables transformation of coordinates between seq_regions. Typically it records how chromosomes are made of contigs, clones of contigs, and chromosomes of supercontigs. It can also be used to split chromosome sequence artificially into smaller chunks.
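
A coordinate transformation driven by such equivalences might look like the sketch below. The field names (asm_start, asm_end, cmp_start, cmp_end, ori), the use of region names in place of numeric identifiers, and the 1-based inclusive coordinates are assumptions of this example. The API's own mapper is considerably more involved.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AssemblyRow:
    asm_name: str   # assembled region, e.g. a chromosome
    cmp_name: str   # component region, e.g. a contig
    asm_start: int  # 1-based, inclusive
    asm_end: int
    cmp_start: int
    cmp_end: int
    ori: int        # 1 = same orientation, -1 = reversed


def asm_to_cmp(rows: List[AssemblyRow], asm_name: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map a position on an assembled region to (component, position).

    Returns None when no row covers the position (a gap).
    """
    for r in rows:
        if r.asm_name == asm_name and r.asm_start <= pos <= r.asm_end:
            offset = pos - r.asm_start
            if r.ori == 1:
                return r.cmp_name, r.cmp_start + offset
            # Reversed component: count back from its end.
            return r.cmp_name, r.cmp_end - offset
    return None


# Hypothetical rows: chr1 is built from two contigs with a gap at 101-150.
rows = [
    AssemblyRow("chr1", "contig_a", 1, 100, 1, 100, 1),
    AssemblyRow("chr1", "contig_b", 151, 250, 11, 110, -1),
]
```

Here position 50 on chr1 maps to position 50 on contig_a, position 151 maps to position 110 on the reversed contig_b, and position 120 falls in the gap.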

Every seq_region can have arbitrary attributes associated with it. These are stored in the seq_region_attrib table. The API currently knows about the toplevel attribute. A seq_region with that attribute is considered to represent its sequence in the broadest possible context.
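
Selecting the regions that carry the toplevel attribute could be sketched as follows, again with an in-memory SQLite database. The table layout is deliberately simplified: storing the attribute code directly in seq_region_attrib, the column names, and the example rows are assumptions of this sketch and not the actual schema definition.

```python
import sqlite3

# Toy data: simplified, assumed table layouts with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE seq_region_attrib (seq_region_id INTEGER, code TEXT, value TEXT);
    INSERT INTO seq_region VALUES (1, 'chr1'), (2, 'contig_a'), (3, 'unplaced_scaffold');
    INSERT INTO seq_region_attrib VALUES
        (1, 'toplevel', '1'),
        (2, 'note', 'example'),
        (3, 'toplevel', '1');
    """
)


def toplevel_names(conn):
    """Return the names of all regions tagged with the toplevel attribute."""
    rows = conn.execute(
        """
        SELECT sr.name
        FROM seq_region AS sr
        JOIN seq_region_attrib AS a ON a.seq_region_id = sr.seq_region_id
        WHERE a.code = 'toplevel'
        ORDER BY sr.name
        """
    ).fetchall()
    return [name for (name,) in rows]
```

In this toy data set both chr1 and unplaced_scaffold are toplevel, which shows that the attribute is independent of the coordinate system a region belongs to.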

GermOnline