How to Search Zebra Finch Genome

by Iris Adam

How to find my gene of interest on the zebra finch genome:

There are many different databases for searching gene and sequence information. They partially overlap and differ in their strengths and weaknesses, especially in terms of user-friendliness (see zebra finch genome and analysis tools).

Find the proper name of your gene:

The first step in finding your gene is to identify its correct name. Genes and their products often have many alternative names, depending on the species, the research community, or the time when they were discovered (e.g. Zenk=EGR1).

To find the current official name of a gene, I find two websites particularly useful:

http://www.ncbi.nlm.nih.gov/gene: This database collects information about genes at the genomic level and has a very good collection of all the names used for a given gene (e.g. if you search for Zenk, you find the zebra finch Egr1).

http://www.genecards.org/: Another database that gathers all sorts of information about genes and links to the original sources. It also has a very good collection of alternative gene names (note, however, that if you search for Zenk, you don’t find Egr1 as a hit).

Both databases list your gene of interest under the current official name (e.g. Egr1 instead of Zenk).
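The name lookup on NCBI Gene can also be scripted through NCBI's E-utilities web API. The sketch below only builds the request URLs; it assumes the standard ESearch and ESummary endpoints and Entrez query syntax, and the function names are illustrative, not part of any NCBI library. Opening the search URL (e.g. with urllib.request.urlopen) should return JSON containing the matching Gene IDs, which can then be passed to the summary URL to see the official symbol and the aliases.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities (Entrez Programming Utilities).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_search_url(alias: str, organism: str = "Taeniopygia guttata") -> str:
    """Build an ESearch URL that looks up a gene name or alias in NCBI Gene.

    [All Fields] and [Organism] are Entrez search qualifiers; the organism
    restriction can be dropped to search across all species.
    """
    term = f"{alias}[All Fields] AND {organism}[Organism]"
    params = {"db": "gene", "term": term, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def gene_summary_url(gene_ids: list[str]) -> str:
    """Build an ESummary URL for Gene IDs returned by the search.

    The summary records are expected to list the official symbol and
    other aliases for each gene.
    """
    params = {"db": "gene", "id": ",".join(gene_ids), "retmode": "json"}
    return f"{EUTILS}/esummary.fcgi?{urlencode(params)}"
```

For example, gene_search_url("Zenk") produces a query equivalent to typing Zenk into the NCBI Gene search box with the organism limited to zebra finch.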

Finding the sequence of your gene:

The easiest way to find out whether a certain gene exists in the zebra finch genome is to go to the Ensembl website (http://www.ensembl.org/index.html), select zebra finch as the species, and type in the name of your gene. You are directed to a results page where you can choose among all possible hits. Selecting the entry that seems to be your gene leads you to the gene page, where you can see how many transcripts are annotated for your gene, its genomic coordinates and much more useful information.
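For scripted access, Ensembl also offers a REST API. This is a minimal sketch, assuming the lookup/symbol endpoint and the species name taeniopygia_guttata (if a request fails, check the species name currently used on the Ensembl species page); the helper names are hypothetical. Only the fetch function needs network access.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base URL of the Ensembl REST API.
ENSEMBL_REST = "https://rest.ensembl.org"


def ensembl_lookup_url(symbol: str, species: str = "taeniopygia_guttata") -> str:
    """URL for looking up a gene by its symbol in one Ensembl species.

    expand=1 asks Ensembl to include the annotated transcripts in the reply.
    """
    return f"{ENSEMBL_REST}/lookup/symbol/{quote(species)}/{quote(symbol)}?expand=1"


def fetch_gene_record(symbol: str, species: str = "taeniopygia_guttata") -> dict:
    """Download the gene record as a dict (requires network access)."""
    req = Request(ensembl_lookup_url(symbol, species),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)
```

The returned record is expected to contain the genomic coordinates under keys such as "seq_region_name", "start" and "end", and, with expand=1, the transcripts under "Transcript", which corresponds to the information shown on the gene page.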

To download the sequence, click “Export data” on the left side of the page. A pop-up window appears in which you can specify which sequence information you would like to extract. Finally, the result is displayed in FASTA format (http://en.wikipedia.org/wiki/FASTA_format).
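In FASTA format, each record starts with a header line beginning with “>”, followed by one or more lines of sequence. If you save the exported sequences to a file, a small parser like the sketch below can read them back into a dictionary; it is a minimal illustration (duplicate headers overwrite each other, and lines before the first header are ignored), not a replacement for an established library such as Biopython.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    Sequence lines following a '>' header are joined into one string,
    with line breaks removed and letters converted to upper case.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # finish previous record
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)  # finish last record
    return records
```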

As the Ensembl annotation is mostly based on the protein-coding sequence of the gene, the non-coding sequences are missing for most genes. Also, the intron-exon boundaries may be incorrect. To check whether there is more sequence information on the non-coding parts of your gene, or whether the sequence contains errors, it is useful to compare the Ensembl sequence against databases of expressed sequences. One way to do this is the NCBI BLAST tool (http://blast.ncbi.nlm.nih.gov/).

Select “nucleotide blast” and paste in the sequence you just downloaded as the query. Choose “Others (nr etc.)” as the database. You can restrict the search to zebra finch sequences by entering it as the organism. After you submit the query, you are redirected to a results page listing sequences that closely match your query. Further down the page you will find a section showing the alignments between your sequence and the matching database sequences. Among the hits you may find sequences that are longer than, or differ slightly from, your query sequence. By translating the nucleotide sequence into an amino acid sequence (e.g. with the Expasy Translate tool, http://web.expasy.org/translate/), you may be able to decide which sequence is the “proper” one by comparing it to, e.g., the chicken protein sequence.
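The translation step can also be done locally. The sketch below is not the Expasy tool's code; it is a minimal illustration, assuming the standard genetic code (NCBI translation table 1), that translates a sequence in all six reading frames, marking stop codons with “*” and codons containing ambiguous bases with “X”.

```python
# Standard genetic code (NCBI table 1): codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate one forward reading frame (frame = 0, 1 or 2)."""
    seq = seq.upper().replace("U", "T")
    # Step through complete codons only; a trailing partial codon is dropped.
    return "".join(CODON_TABLE.get(seq[pos:pos + 3], "X")
                   for pos in range(frame, len(seq) - 2, 3))


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    seq = seq.upper().replace("U", "T")
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def six_frames(seq: str) -> dict[str, str]:
    """Translations of the three forward (+) and three reverse (-) frames."""
    rc = reverse_complement(seq)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(seq, f)
        frames[f"-{f + 1}"] = translate(rc, f)
    return frames
```

A long open reading frame without internal “*” in one of the six frames is usually the one to compare against the chicken protein.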

What if my gene is not annotated in the Ensembl database?

If your gene of interest is not annotated, there are in principle two possible reasons:

a.      It does not exist in zebra finches.

b.      It is not annotated (which can happen for many reasons).

To find out which of the two is the case, I usually work through the approaches listed below one by one until I find the gene:

1.      Use the search function of one of the nucleotide databases (http://www.ncbi.nlm.nih.gov/nuccore/, http://titan.biotec.uiuc.edu/cgi-bin/ESTWebsite/estima_start?seqSet=songbird, http://songbirdtranscriptome.net/).

2.      Does the gene exist in chicken?

a.      If yes: download the sequence and use nucleotide BLAST to find matching sequences in the zebra finch databases.

b.      If not: try the same strategy with the murine or human sequence of your gene (or any other species where you can find the sequence).

3.      Check the phylogenetic tree of your gene on the TreeFam website (http://www.treefam.org/) to get an idea of when it first emerged during evolution. This usually gives you a good idea of the odds that the gene really exists in the avian clade.

4.      The gene should exist in zebra finches, but none of your BLAST searches gives a hit:

a.      Get the sequence of the gene of another species (chicken, mouse,…) exon by exon.

b.      BLAST each exon individually against the Trace Archive (http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&BLAST_SPEC=TraceArchive&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch). The Trace Archive contains all the raw sequence reads generated during genome sequencing.

c.       Paste together the coding sequence of your zebra finch gene exon by exon.
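Approach 1 can be scripted for the NCBI nucleotide database (nuccore) with the same E-utilities web API used for gene names. This minimal sketch assumes the ESearch and EFetch endpoints; it covers only NCBI nuccore, not the ESTIMA or songbirdtranscriptome.net resources, and the helper names are hypothetical. The FASTA text returned by the fetch URL can be saved and used directly as a BLAST query.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def nuccore_search_url(gene: str, organism: str = "Taeniopygia guttata",
                       retmax: int = 20) -> str:
    """ESearch URL for nucleotide records mentioning a gene in one organism."""
    params = {
        "db": "nuccore",
        "term": f"{gene} AND {organism}[Organism]",
        "retmax": retmax,  # maximum number of record IDs to return
        "retmode": "json",
    }
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def nuccore_fasta_url(ids: list[str]) -> str:
    """EFetch URL that returns the given nucleotide records as FASTA text."""
    params = {"db": "nuccore", "id": ",".join(ids),
              "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"
```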

The first or second approach usually does the trick. The third is basically an additional check of how likely it is that the gene exists in birds.
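For approach 2, the chicken (or mouse or human) sequence can also be submitted through NCBI's BLAST URL API instead of the web form. This is a sketch under assumptions: the parameter names (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY, RID, FORMAT_TYPE) follow NCBI's BLAST URL API documentation, “nt” is assumed to be the nucleotide collection behind “Others (nr etc.)”, and the helper names are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_params(query_seq: str, organism: str = "Taeniopygia guttata") -> dict:
    """Parameters for submitting a blastn search (CMD=Put).

    ENTREZ_QUERY limits hits to one organism, like the "Organism" box on
    the web form; DATABASE "nt" is assumed to be the nucleotide collection.
    """
    return {
        "CMD": "Put",
        "PROGRAM": "blastn",
        "DATABASE": "nt",
        "QUERY": query_seq,
        "ENTREZ_QUERY": f"{organism}[Organism]",
    }


def extract_rid(put_response: str) -> Optional[str]:
    """Pull the request ID ("RID = ...") out of the Put response page."""
    match = re.search(r"^\s*RID = (\S+)", put_response, re.MULTILINE)
    return match.group(1) if match else None


def blast_get_url(rid: str) -> str:
    """URL for polling for, and finally retrieving, plain-text results."""
    params = {"CMD": "Get", "RID": rid, "FORMAT_TYPE": "Text"}
    return f"{BLAST_URL}?{urlencode(params)}"


def submit_blast(query_seq: str) -> Optional[str]:
    """Submit the search as a POST request (needs network access); return the RID."""
    data = urlencode(blast_put_params(query_seq)).encode()
    with urlopen(BLAST_URL, data=data) as resp:
        return extract_rid(resp.read().decode())
```

Searches run asynchronously, so the Get URL has to be polled until the results are ready; NCBI's usage guidelines ask that a given RID be polled no more than about once a minute.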

I have rarely used the fourth approach since the genome was published on Ensembl, but it can still be useful for extracting the putative mRNA sequence of your gene.
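In approach 4, once each exon has been matched to a zebra finch trace, the pieces have to be joined in the right order and checked for an intact reading frame. This minimal sketch assumes a hypothetical input format of (exon number, sequence) pairs, with each zebra finch piece already trimmed to the exon and oriented on the coding strand (traces matching the reverse strand would need to be reverse-complemented first). The checks are simple sanity checks, not a gene-prediction method.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stitch_exons(exons: list[tuple[int, str]]) -> str:
    """Join exon sequences in exon order into a putative coding sequence.

    exons: hypothetical (exon_number, sequence) pairs, each sequence already
    trimmed to the exon and on the coding strand.
    """
    ordered = sorted(exons, key=lambda exon: exon[0])
    return "".join(seq.strip().upper() for _, seq in ordered)


def check_cds(cds: str) -> list[str]:
    """Return warnings for a putative coding sequence (empty list = no issues found)."""
    problems = []
    if len(cds) % 3 != 0:
        problems.append(f"length {len(cds)} is not a multiple of 3")
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    # Split into complete codons; a trailing partial codon is ignored here.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # A stop codon before the last codon suggests a frame shift or a wrong exon boundary.
    internal = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    if internal:
        problems.append(f"internal stop codon(s) at codon index {internal}")
    return problems
```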

This is meant as a small guide on how to find your favorite zebra finch gene. It is surely not the only strategy for finding genes; rather, it reflects my own way of doing it.

Comments (2)

admin Nov 10, 2011 05:49 PM
Claudio: I have some comments on the issue of finding your gene of interest in the finch genome or the genomes of other bird species (or of any other species, for that matter). In our estimate, circa 10-15% of Ensembl models are incorrectly annotated, for various reasons. In addition, another 8-10% of the genes in the finch and chicken genomes are likely not yet identified through an Ensembl model, either because the algorithms failed to predict a model at the correct locus (sometimes possibly due to a slight sequence error leading to frame shifts, etc.) or because there is a significant gap in the currently available genomic sequence. In our hands, the most efficient and sensitive method for detecting a gene in a given genome is the BLAT alignment method available in the UCSC Genome Browser. Using that approach, we consistently identify loci, including coding and non-coding sequences (the latter often not included in Ensembl), whether or not there is a model there, and in addition one can do a synteny analysis to confirm orthology. I think it is worth emphasizing this point, as in some cases one might miss a gene that is present by only looking at Ensembl. I’d also like to propose that we, as a group, work towards generating our own manually curated set of models representing the complete set of UniGenes in the zebra finch. There are many ways of going about that, all of them too laborious for a single lab to tackle, but possibly doable as a group effort. I believe that many of us already have our own internal curated sets based on trying to identify genes of interest, but I’d suggest that we would all benefit from finding a way to share that information across all members of our community.
David Clayton Nov 11, 2011 11:37 AM
I agree that we need to find ways to keep moving forward with improving the quality and completeness of the zebra finch genome assembly and its annotation. There are two ways we can go here (not necessarily mutually exclusive): 1) we can try to fix the problem at the source and generate an improved genome assembly and de novo (Ensembl) annotation; we're dealing with the first-generation zf assembly, whereas chicken is up to the fourth generation (much better quality and more complete); 2) we can try to generate a community informatics resource for storing and collating the work we've done independently; at a minimum, somewhere we could host a community GBrowse instance where all of our annotations could be stored and accessed. I've tried to accomplish both as add-on components in two research grant applications, but the applications have not been successful. Perhaps it's time to join forces and develop a community-driven resource grant application?