Wednesday, August 08, 2012

Some of the Query strategies for Oomycetes Genomes in FungiDB


Querying Oomycetes genomes in FungiDB 2.0

FungiDB 2.0 was released with 6 Oomycetes genomes:

Phytophthora sojae V5.0
Phytophthora ramorum V1.0
Phytophthora capsici V11.0
Phytophthora infestans V4.0
Pythium ultimum V2.0
Hyaloperonospora arabidopsidis V8.3

Naming conventions:

Gene ids:

Oomycetes gene ids in FungiDB follow two naming standards. In most cases, the gene id is a 5-letter organism code, followed by an underscore, followed by the gene identifier devised by the sequencing center, e.g. Physo_517720, Phyra_74442, Hyaar_813319, Phyca_96628, Pytul_G005233. Genomes sequenced and annotated at the Broad Institute (P. infestans, P. parasitica) already have gene ids prefixed by a 4-letter code such as PITG, and we have left those as is, e.g. PITG_05520.
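The two conventions above can be told apart with a pair of regular expressions. This is only a minimal sketch based on the example ids listed in this post; the function name `classify_gene_id` and the labels it returns are made up for illustration and are not part of FungiDB.

```python
import re

# Convention 1: 5-letter organism code (capital + 4 lowercase), underscore,
# then the sequencing center's identifier, e.g. Physo_517720 or Pytul_G005233.
FIVE_LETTER_ID = re.compile(r"^(?P<prefix>[A-Z][a-z]{4})_(?P<local_id>[A-Za-z0-9]+)$")

# Convention 2: Broad-style 4-letter uppercase code, e.g. PITG_05520.
FOUR_LETTER_ID = re.compile(r"^(?P<prefix>[A-Z]{4})_(?P<local_id>\d+)$")


def classify_gene_id(gene_id):
    """Return (convention, prefix, local_id), or None if neither pattern fits.

    The convention labels are hypothetical names used only in this sketch.
    """
    match = FIVE_LETTER_ID.match(gene_id)
    if match:
        return ("five_letter", match["prefix"], match["local_id"])
    match = FOUR_LETTER_ID.match(gene_id)
    if match:
        return ("four_letter", match["prefix"], match["local_id"])
    return None


if __name__ == "__main__":
    for gene_id in ["Physo_517720", "Pytul_G005233", "PITG_05520"]:
        print(gene_id, classify_gene_id(gene_id))
```

The patterns are deliberately strict, so an id from a genome with a different convention returns None rather than being guessed at.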

Scaffolds:

Since most of these genomes are released as draft assemblies, the scaffolds in their genome FASTA files are named as follows: a 5-letter organism prefix + strain name + '_SC' + XXXX, where SC stands for supercontig and XXXX is the 4-digit scaffold number. Examples: PytulBR144_SC1841, PhysoP6497_SC0001, PhyraPr102_SC0008, PhycaLT1534_SC0024, PhyinT30-4_SC0007, HyaarEmoy2_SC0165.
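If you need to split or build these scaffold names in a script, the convention can be sketched as below. The helper names `parse_scaffold_id` and `format_scaffold_id` are hypothetical, and the pattern assumes exactly what is described above: a 5-letter prefix, a strain name of any characters, then _SC and four digits.

```python
import re

# 5-letter organism prefix, strain name, then "_SC" and a 4-digit scaffold number.
SCAFFOLD_ID = re.compile(
    r"^(?P<organism>[A-Z][a-z]{4})(?P<strain>.+)_SC(?P<number>\d{4})$"
)


def parse_scaffold_id(scaffold_id):
    """Split a scaffold id into (organism prefix, strain, scaffold number)."""
    match = SCAFFOLD_ID.match(scaffold_id)
    if match is None:
        return None
    return match["organism"], match["strain"], int(match["number"])


def format_scaffold_id(organism, strain, number):
    """Build a scaffold id, zero-padding the scaffold number to 4 digits."""
    return f"{organism}{strain}_SC{number:04d}"


if __name__ == "__main__":
    for scaffold_id in ["PytulBR144_SC1841", "PhyinT30-4_SC0007", "HyaarEmoy2_SC0165"]:
        print(scaffold_id, parse_scaffold_id(scaffold_id))
```

Strain names such as T30-4 contain a hyphen, which is why the strain part is matched loosely rather than as letters and digits only.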

Search Options:



Figure - 1

On the main FungiDB page, there are three search columns (Figure 1). Each column returns a different type of result: the first column almost always returns gene record pages, the second column returns other data types, and the third column links to tools such as BLAST and the genome browser. In the top right-hand corner of the page, quick search boxes are available for searching by gene id and by gene product name.

A few search strategies are listed below:

1.       Curate all infection-related genes in Phytophthora sojae as described in the Science paper (Tyler et al, 2006) with keyword searches such as serine proteases, metalloproteases, cysteine proteases, glycosyl hydrolases, pectinesterases, pectate lyases, cutinases, chitinases, lipases, phospholipases, protease inhibitors, NPP family, PcF family, Six Cys family, Eight Cys family, Crn family, nonribosomal peptide synthetases, polyketide synthases, cytochrome P450s, CYP51 clan, ABC transporters, PDR, ABCG-half, MDR, MRP, elicitins, Avhs, Crinklers.

Hint: Start with a Text (product name, gene id) search for proteases. After the results are displayed, run another Text (product name, gene id) search with the next item in your list, e.g. glycosyl hydrolases. Choose the union of both searches, and keep repeating this until you are done with your list. Following is the strategy for curating all the infection-related genes in Phytophthora sojae.
2.       Find all the Oomycetes proteins that have an RXLR motif within the first 20-50 residues, followed by a dEER motif within 20-60 residues of the first motif. Also check whether they have predicted signal peptides and transmembrane domains. (Start with Similarity -> Protein Motif search and use the following pattern: ^.{20,50}R.LR.{20,60}DEER. Then add a step that checks whether the protein has a signal peptide. Then add another step that checks whether the secreted proteins have a transmembrane domain.) The strategy can be found here:

3.       Genomic locations: All the Oomycetes genomes are draft sequences, so searching by genomic location can be a little tricky. Clicking on the genome location search (as in Figure 2 A) opens a page where the pull-down menu only lists genomes that have complete chromosomes available. Instead, choose genomic sequence id (as in Figure 2 B) and fill in the start and stop locations you are interested in, then run the query. If you are interested in seeing how many of these genes have orthologs in other Oomycetes, click on add step, then click on Evolution -> Orthology, Phylogenetic profile and choose the Oomycetes genomes (Figure 2 C). This results in 32 genes belonging to 25 ortholog groups. Clicking on the ortholog groups displays a cascade of glycosyl hydrolases (Figure 2 D) belonging to different Oomycetes. From the gene ids, it appears that the genes occur in a syntenic block.
Strategy can be found here: http://fungidb.org/fungidb/im.do?s=dce0a3dacc71bb06
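The motif pattern from strategy 2 can also be tried on your own sequences outside FungiDB. The snippet below is a rough sketch using Python's standard re module with the same pattern; the function name `has_rxlr_deer` and the test sequences are made up for illustration and are not real proteins. It covers only the motif step, not the signal peptide or transmembrane steps.

```python
import re

# Same pattern as in the protein motif search: 20-50 residues, then R.LR
# (RXLR, where X is any residue), then 20-60 residues, then DEER.
RXLR_DEER = re.compile(r"^.{20,50}R.LR.{20,60}DEER")


def has_rxlr_deer(protein_sequence):
    """Return True if the sequence matches the RXLR ... dEER pattern."""
    return RXLR_DEER.match(protein_sequence.upper()) is not None


if __name__ == "__main__":
    # Synthetic sequences, for illustration only.
    hit = "M" + "A" * 24 + "RSLR" + "A" * 30 + "DEER" + "AAAA"
    rxlr_too_early = "M" + "A" * 9 + "RSLR" + "A" * 30 + "DEER"
    no_deer = "M" + "A" * 24 + "RSLR" + "A" * 40
    for name, seq in [("hit", hit), ("rxlr_too_early", rxlr_too_early), ("no_deer", no_deer)]:
        print(name, has_rxlr_deer(seq))
```

Because the pattern is anchored at the start of the sequence, an RXLR motif that begins before residue 21 or after residue 51 does not count as a match.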