How to Search the Zebra Finch Genome
How to find my gene of interest on the zebra finch genome:
Find the proper name of your gene:
The first step in finding your gene is to identify its correct name. Genes and their products often have many alternative names, depending on the species, the research community or the time of discovery (e.g. Zenk=EGR1).
To find the current official name of a gene I find two websites particularly useful:
http://www.ncbi.nlm.nih.gov/gene This database accumulates knowledge about genes at the genomic level and has a very good collection of all names used for a given gene (e.g. if you search for Zenk you find the zebra finch Egr1).
http://www.genecards.org/ Another database that gathers all sorts of information about genes and links to the original sources. It has a very good collection of alternative gene names (e.g. if you search for Zenk you find Egr1 as a hit).
Both databases list your gene of interest under the current official name (e.g. Egr1 instead of Zenk).
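The name lookup at NCBI can also be scripted. The sketch below only builds a search URL for the Gene database through the NCBI E-utilities esearch endpoint; it does not send anything. The function name gene_search_url and the choice of search fields are assumptions made for illustration, and whether an old alias such as Zenk is matched depends on what the database has recorded for the gene.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (the programmatic counterpart of the website search).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_search_url(name, organism="Taeniopygia guttata"):
    """Build an esearch URL that looks for `name` in the Gene database.

    `gene_search_url` is a hypothetical helper; the search is restricted
    to one organism with the [Organism] field tag.
    """
    term = f"{name}[All Fields] AND {organism}[Organism]"
    params = {"db": "gene", "term": term, "retmode": "json"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


url = gene_search_url("Zenk")
```

Opening the resulting URL in a browser (or with any HTTP client) should return a list of Gene IDs, which can then be looked up on the website to read off the official name.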
Finding the sequence of your gene:
The easiest way to find out whether a certain gene exists on the zebra finch genome is to go to the Ensembl website (http://www.ensembl.org/index.html), select zebra finch as the species and type in the name of your gene. You are directed to a results page where you can select among all possible hits. Selecting the entry that seems to be your gene leads you to the gene page, where you can see how many transcripts are annotated for your gene, its genomic coordinates and much more useful information.
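The same lookup can be done without the browser through the Ensembl REST service. This is a minimal sketch, assuming the lookup/symbol endpoint at rest.ensembl.org and the species name taeniopygia_guttata; the helper names lookup_url, fetch_gene and format_location are hypothetical, and fetch_gene needs network access.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(symbol, species="taeniopygia_guttata"):
    """URL of the Ensembl REST lookup for a gene symbol in one species."""
    return (f"{ENSEMBL_REST}/lookup/symbol/{quote(species)}/{quote(symbol)}"
            "?content-type=application/json")


def fetch_gene(symbol, species="taeniopygia_guttata"):
    """Download the gene record as a dictionary (requires network access)."""
    with urlopen(lookup_url(symbol, species)) as response:
        return json.load(response)


def format_location(record):
    """Turn a lookup record into a 'chromosome:start-end' string."""
    return f"{record['seq_region_name']}:{record['start']}-{record['end']}"
```

For example, format_location(fetch_gene("EGR1")) should give the genomic coordinates that the gene page shows, provided the gene is annotated under that symbol. If it is not, the request fails with an HTTP error, which corresponds to finding no hit on the website.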
To download the sequence, press “Export data” on the left side of the page. A popup window will appear where you can specify which sequence information you would like to extract. The result is then displayed in the FASTA format (http://en.wikipedia.org/wiki/FASTA_format).
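A FASTA file is plain text: each record starts with a header line beginning with ">", followed by one or more lines of sequence. If you want to process the exported sequence in a script, a small reader along the following lines is enough. The function name parse_fasta and the example records are made up for illustration.

```python
def parse_fasta(text):
    """Read FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without breaks
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up example input, not a real zebra finch sequence.
example = ">seq1 example\nATGC\nGGTA\n\n>seq2\nTTAA\n"
records = parse_fasta(example)
```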
As the Ensembl annotation is mostly based on the protein-coding sequence of the gene, the non-coding sequences are missing for most genes. The intron-exon boundaries might also be incorrect. To check whether more sequence information is available for the non-coding parts of your gene, or whether the sequence contains errors, it is useful to compare the Ensembl sequence against the databases of expressed sequences. One way to do this is the NCBI Blast tool (http://blast.ncbi.nlm.nih.gov/).
Select “nucleotide blast” and paste in the sequence you just downloaded as the query. Choose “Others (nr etc.)” as the database. You can restrict the search to zebra finch sequences by selecting zebra finch as the organism. After sending the query you are redirected to a results page listing sequences that closely match your query. Scrolling down, you will find a section showing the alignments between your sequence and the matching database sequences. Among the hits you might find sequences that are longer than, or differ slightly from, your query sequence. By translating the nucleotide sequences into amino acid sequences (e.g. with the Expasy translate tool, http://web.expasy.org/translate/) and comparing them to, for example, the chicken protein sequence, you might be able to decide which sequence is the “proper” one.
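The translation step can also be done locally. The following is a rough sketch of a six-frame translation using the standard genetic code; it is not the Expasy tool itself, and the function names are chosen here for illustration. Codons containing anything other than A, C, G or T are shown as X.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq, frame=0):
    """Translate a nucleotide sequence starting at offset `frame` (0, 1 or 2)."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(seq):
    """Translate all three forward and all three reverse frames."""
    reverse = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq, offset)
        frames[f"-{offset + 1}"] = translate(reverse, offset)
    return frames
```

The frame that gives a long stretch without stop codons (shown as *) and resembles the chicken protein is usually the one to keep.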
What if my gene is not annotated in the Ensembl database?
If your gene of interest is not annotated, there are in principle two possible reasons:
a. It does not exist in zebra finches.
b. It is not annotated (there are many reasons for this case).
To find out which of the two possibilities applies, I usually follow the approaches listed below one by one until I find the gene:
1. Use the search function of one of the nucleotide databases (http://www.ncbi.nlm.nih.gov/nuccore/, http://titan.biotec.uiuc.edu/cgi-bin/ESTWebsite/estima_start?seqSet=songbird, http://songbirdtranscriptome.net/).
2. Does the gene exist in chicken?
a. If yes: download the sequence and use the nucleotide blast to find matching sequences in the zebra finch databases.
b. If not: try the same strategy with the murine or human sequence of your gene (or any other species where you can find the sequence).
3. Check the phylogenetic tree of your gene on the Treefam website (http://www.treefam.org/) to get an idea of when it first emerged during evolution. This usually gives you a good idea of the odds that the gene really exists in the avian clade.
4. The gene should exist in zebra finches, but none of the Blast searches gives you a hit:
a. Get the sequence of the gene of another species (chicken, mouse,…) exon by exon.
b. Blast each exon individually against the trace archive (http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&BLAST_SPEC=TraceArchive&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch). The trace archive contains all the individual sequence reads from the genome sequencing.
c. Paste together the coding sequence of your zebra finch gene exon by exon.
The first or second approach usually does the trick. The third is basically an additional check on how likely it is that the gene exists in birds.
I rarely use the fourth approach since the genome was published on Ensembl, but sometimes it is useful to extract the putative mRNA sequence of your gene using this method.
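The last step of the fourth approach, pasting the exons together, is easy to get wrong by hand. The sketch below joins the exon sequences and runs a few sanity checks on the result. It assumes the exons are given in transcript order and in the sense orientation; the function names and the toy sequences are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def join_exons(exons):
    """Concatenate exon sequences (given in transcript order) into one CDS."""
    return "".join(exon.strip().upper() for exon in exons)


def check_cds(cds):
    """Return a list of problems found in a putative coding sequence."""
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of three")
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    if cds[-3:] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    # Look for stop codons in frame, leaving out the final codon.
    inner = [cds[i:i + 3] for i in range(0, len(cds) - 3, 3)]
    if any(codon in STOP_CODONS for codon in inner):
        problems.append("contains an in-frame stop codon before the end")
    return problems


# Toy exons, not real zebra finch sequence.
cds = join_exons(["ATGGC", "CAAAT", "GGTAA"])
problems = check_cds(cds)
```

An empty list only means the sequence looks like a complete open reading frame. A missing start or stop codon is expected if you assembled only part of the gene, and a frame shift or an in-frame stop often points to a wrongly placed exon boundary.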
This is meant as a small guide on how to find your favorite zebra finch gene. It is surely not the only strategy for finding genes; it simply reflects my way of doing it.