Gene Alignment Help
What are the alignment data?
Map positions are assigned to identifiers using the NCBI human genome
assembly (hg17, NCBI Build 35), accessed through the UCSC genome
browser, GoldenPath (May 2004 freeze). Each position is
associated with a GenBank accession number. An accession may
have one or more genomic positions within GoldenPath; however, there are
generally 2 mappings, one on the positive strand and one on the
negative strand.
The data that are returned to the user for unconsolidated positions
include the following:
- Identifier: The input value provided by the user to query
the genome
- Accession: The accession number to which the identifier has
been mapped. If there are multiple accessions for an
identifier, one row for each accession will be returned.
- Target Start: Alignment start position in target chromosome
- Target End: Alignment end position in target chromosome
- Strand: + or - for chromosome strand
- Chromosome: Target sequence name
- Q Start*: Alignment start position in query sequence
- Q Size*: Query sequence size
* In the Start/End fields, the coordinates are where the alignment matches
from the point of view of the forward strand. For more information,
see the UCSC site.
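The fields above can be pictured as one record per returned row. The following is a minimal sketch, assuming the rows arrive as tab-separated text in the column order listed above; the class name `AlignmentRow`, the function `parse_row`, and the tab-separated layout are illustrative assumptions and not part of the actual tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignmentRow:
    """One unconsolidated position (hypothetical container, names assumed)."""
    identifier: str    # input value supplied by the user
    accession: str     # GenBank accession the identifier was mapped to
    target_start: int  # alignment start position in the target chromosome
    target_end: int    # alignment end position in the target chromosome
    strand: str        # "+" or "-"
    chromosome: str    # target sequence name
    q_start: int       # alignment start position in the query sequence
    q_size: int        # query sequence size

    def target_span(self) -> int:
        """Length of the aligned region on the chromosome."""
        return self.target_end - self.target_start


def parse_row(line: str) -> AlignmentRow:
    """Parse one tab-separated row (assumed layout) into an AlignmentRow."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    ident, acc, t_start, t_end, strand, chrom, q_start, q_size = fields
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand value: {strand!r}")
    return AlignmentRow(ident, acc, int(t_start), int(t_end),
                        strand, chrom, int(q_start), int(q_size))
```

Because an identifier with several accessions yields one row per accession, a result set would be a list of such records rather than a single one.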
How are clones aligned?
A clone is aligned by first being mapped to its associated GenBank
accessions via dbEST. These accessions are
then mapped to the genome via the UCSC data. Each clone can map to one
or more GenBank accessions.
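The two-step lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the dictionaries stand in for the dbEST and UCSC data, and the names `positions_for_clone`, `clone_to_accessions`, and `accession_to_positions`, as well as all identifiers and coordinates, are made up for the example.

```python
def positions_for_clone(clone_id, clone_to_accessions, accession_to_positions):
    """Return one row per (accession, genomic position) for a clone.

    Step 1: clone -> GenBank accessions (a dbEST-style lookup, assumed here
            to be a plain dictionary).
    Step 2: accession -> genomic positions (a UCSC-style lookup, likewise
            assumed to be a dictionary).
    """
    rows = []
    for accession in clone_to_accessions.get(clone_id, []):
        for chrom, start, end, strand in accession_to_positions.get(accession, []):
            rows.append((clone_id, accession, chrom, start, end, strand))
    return rows


# Illustrative data only; these are not real clone or accession identifiers.
clone_to_accessions = {"CLONE_A": ["ACC_1", "ACC_2"]}
accession_to_positions = {
    "ACC_1": [("chr1", 1000, 1500, "+"), ("chr1", 1000, 1500, "-")],
    "ACC_2": [("chr1", 2000, 2600, "+")],
}

for row in positions_for_clone("CLONE_A", clone_to_accessions,
                               accession_to_positions):
    print(row)
```

A clone with no known accessions, or an accession with no genomic position, simply contributes no rows in this sketch.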
How are genes aligned?
A gene is first mapped to a UniGene cluster ID, and then the accessions
that map to that cluster are returned. There are generally many
accessions mapped to one cluster.
How are the data consolidated?
Several steps are taken when data are returned as a consolidated
position.
- The identifier is mapped to GenBank accessions.
- If the group of accessions associated with the identifier
maps to more than 1 chromosome, the data are discarded. You will need
to use the unconsolidated mapping position in order to see all
positions for this identifier.
- Next, the largest end position for all the accessions aligned
to the identifier is compared with the smallest start position. If
this distance is greater than the maximum distance between queries
that you have selected (by default this is 1,000,000 bases) then the
data are discarded.
- If the data are not discarded, the size that is returned is the distance
between the smallest accession start position and the largest accession end position.
- Please note that this option is not recommended for most genes,
because so many accessions map to a cluster.
It will generally return no results, because one of
the accessions might be mapped to a different (potentially erroneous) chromosome.
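The consolidation steps above can be sketched in a few lines. This is a minimal illustration of the rules as described on this page, not the tool's actual implementation; the names `Position`, `Consolidated`, and `consolidate` and the example coordinates are assumptions made for the sketch.

```python
from typing import NamedTuple, Optional, Sequence


class Position(NamedTuple):
    accession: str
    chromosome: str
    start: int
    end: int


class Consolidated(NamedTuple):
    chromosome: str
    start: int
    end: int
    size: int


def consolidate(positions: Sequence[Position],
                max_distance: int = 1_000_000) -> Optional[Consolidated]:
    """Collapse all positions for one identifier into a single span.

    Returns None when the data would be discarded: no positions, more than
    one chromosome, or a span larger than max_distance.
    """
    if not positions:
        return None
    chromosomes = {p.chromosome for p in positions}
    if len(chromosomes) > 1:
        return None  # accessions map to more than one chromosome
    start = min(p.start for p in positions)  # smallest start position
    end = max(p.end for p in positions)      # largest end position
    size = end - start
    if size > max_distance:
        return None  # span exceeds the selected maximum distance
    return Consolidated(chromosomes.pop(), start, end, size)


# Illustrative values only.
same_chrom = [Position("ACC_1", "chr1", 1000, 1500),
              Position("ACC_2", "chr1", 4000, 4800)]
print(consolidate(same_chrom))
```

Note how a single accession on another chromosome makes the function return `None`, which is why consolidation rarely succeeds for genes with many accessions.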
Please send comments or questions to:
array@genome.stanford.edu