genomics: March 2009

Monday, March 30, 2009

Fungus farmers show way to new drugs

[I came across this interesting story in Nature News and thought I would put it on my blog:]

Ant colonies could be key to advances in biofuels and antibiotics.

-Erika Check Hayden

Studies of bacteria on leaf-cutting ants could yield new antibiotics. (Photo: M. MOFFETT/FLPA)

In a mutually beneficial symbiosis, leaf-cutting ants cultivate fungus gardens, providing both a safe home for the fungi and a food source for the ants. But this 50-million-year-old relationship also includes microbes that new research shows could help speed the quest to develop better antibiotics and biofuels.

Ten years ago, Cameron Currie, a microbial ecologist then at the University of Toronto in Ontario, Canada, discovered that leaf-cutting ants carry colonies of actinomycete bacteria on their bodies (C. R. Currie et al. Nature 398, 701–704; 1999). The bacteria churn out an antibiotic that protects the ants' fungal crops from associated parasitic fungi (such as Escovopsis). On 29 March, Currie, Jon Clardy at the Harvard Medical School in Boston and their colleagues reported that they had isolated and purified one of these antifungals, which they named dentigerumycin, and that it is a chemical that has never been previously reported (D.-C. Oh et al. Nature Chem. Bio. doi: 10.1038/nchembio.159; 2009). The antifungal slowed the growth of a drug-resistant strain of the fungus Candida albicans, which causes yeast infections in people.


Because distinct ant species cultivate different fungal crops, which in turn fall prey to specialized parasites, researchers hope that they will learn how to make better antibiotics by studying how the bacteria have adapted to fight the parasite in an ancient evolutionary arms race. "These ants are walking pharmaceutical factories," says Currie, now at the University of Wisconsin, Madison.

That's not the end to the possible applications. The ant colonies are also miniature biofuel reactors, Currie reported on 25 March at the Genomics of Energy & Environment meeting at the Joint Genome Institute in Walnut Creek, California. Each year, ants from a single colony harvest up to 400 kilograms of leaves to feed their fungal partners. But no one has worked out how the fungi digest the leaves, because samples of fungus grown in petri dishes can't break down cellulose, a tough molecule found in plant cells. Researchers are keenly interested in better ways to break down cellulose, because it might allow them to make more efficient biofuels than those made from sugary foods, such as maize (corn).

So Currie and his colleagues sequenced small segments of DNA from bacteria and other organisms living in fungus gardens in three Panamanian leaf-cutting ant colonies. They then compared the DNA against databases to identify what species were living in the fungus gardens, and what genes they contained.

This 'metagenomics' approach found that there are many species of bacteria in the fungus gardens that are capable of breaking down cellulose. The team also detected the genetic signatures of fungal enzymes that can break down cellulose, which raises the question of why the fungi can't break down cellulose in the laboratory.

Currie suggests that the newfound bacterial and fungal enzymes might be efficient at digesting cellulose because they have evolved for centuries along with the ant-fungal symbiosis. This could mean that the fungus can only break down cellulose in its natural context, or that the enzymes Currie detected are brought into the colony from outside. "The idea is that the ants' long evolutionary history may help us in our own attempts to break down plant biomass," he says.

Other researchers call Currie's findings interesting, but say they wanted to see a more thorough analysis of the data. "It's interesting that he found these fungal enzymes in the gardens that he didn't expect [based on] what the fungus was capable of doing by itself," says John Taylor, a mycologist at the University of California, Berkeley.

Taylor says that Currie's continued scrutiny of the lives of ants provides insights into the web of interactions necessary for the survival of any single species. "I think the coolest thing about this is that you start with one organism, and then you find more and more organisms involved in the relationship," he says. It may take a village to raise a child; it seems it also takes a village to break down cellulose.

Thursday, March 26, 2009

Using the Galaxy server

I recently came across the Galaxy server for genome sequence manipulation. Since I do a lot of large-scale computing, I have several times thought of creating a web-based tool for large-scale sequence manipulation that could handle huge data files. My typical requirements are to subtract the sequences in one file from another, fetch sequences from a huge FASTA file whose names are given in a list file, find duplicated or triplicated sequences, find sequences of a certain length, etc. Galaxy seems to provide many of these operations through a web interface. The link is here
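The list-based fetch is also easy to script locally. The following is a minimal Perl sketch, not how Galaxy does it: it assumes the sequence ID is the first word after '>' in each FASTA header, and the subroutine names fetch_fasta_by_ids and read_id_list are hypothetical. The names go into a hash, so each lookup is constant time, and the FASTA file is read line by line, keeping only the wanted records in memory.

```perl
use strict;
use warnings;

# Read one sequence ID per line from a list filehandle into a hash
# (hypothetical helper); blank lines are skipped.
sub read_id_list {
    my ($fh) = @_;
    my %ids;
    while (my $id = <$fh>) {
        chomp $id;
        $ids{$id} = 1 if length $id;
    }
    return \%ids;
}

# Return the FASTA records (header plus sequence lines) whose ID is a key
# of %$wanted. Assumes the ID is the first word after '>' (hypothetical helper).
sub fetch_fasta_by_ids {
    my ($fh, $wanted) = @_;
    my @records;
    my $keep = 0;
    while (my $line = <$fh>) {
        if ($line =~ /^>(\S*)/) {            # a header line starts a new record
            $keep = exists $wanted->{$1};
            push @records, $line if $keep;
        }
        elsif ($keep) {                      # sequence line of a wanted record
            $records[-1] .= $line;
        }
    }
    return @records;
}
```

For example, with filehandles opened on hypothetical files names.txt and reads.fasta, print fetch_fasta_by_ids($fasta_fh, read_id_list($list_fh)); writes the selected records in FASTA format.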

Creating a wiki page for a community portal

I have been toying with the idea of creating a wiki for our VMD web portal. The rationale is to write the user documentation as a wiki, which will also be used for bug tracking, versioning, etc. I am now looking at freely available software for creating wikis (a compiled version is available here).

For me, documentation in a wiki could be a great idea, since users can edit the pages easily and add to each other's observations, instead of one person writing the whole document. DokuWiki seems to be the simplest option because it needs no database: it just stores pages in files, so I would assume it is a safer option for our servers.

Monday, March 09, 2009

Find pattern using grep (egrep)

One of the quickest and easiest ways to find patterns in a huge file without writing any scripts is Unix grep. I find it extremely useful for fetching the lines that contain a certain pattern.

Looking for Tabs:

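Many manuals don't say much about matching a tab character. If you are looking for the pattern "gene TAB exon", one way to search the file is:

grep 'gene[[:blank:]]exon'

Note that [[:blank:]] matches either a space or a tab, so this also finds "gene exon". To match a tab only, GNU grep's -P (Perl-compatible) option understands \t, as in grep -P 'gene\texon', and in bash you can insert a literal tab with grep $'gene\texon'.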
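The same distinction between a tab and any blank exists in Perl regular expressions, which is handy when the search is part of a script rather than a one-off command. A small sketch (the subroutine names are hypothetical):

```perl
use strict;
use warnings;

# Keep only lines where "gene" and "exon" are separated by exactly one tab.
# In a Perl regex, \t matches a tab character and nothing else.
sub tab_separated_matches {
    my @lines = @_;
    return grep { /gene\texon/ } @lines;
}

# For comparison: [[:blank:]] accepts a space OR a tab, like the grep example.
sub blank_separated_matches {
    my @lines = @_;
    return grep { /gene[[:blank:]]exon/ } @lines;
}

my @lines = ("gene\texon", "gene exon", "gene\t\texon");
print scalar(tab_separated_matches(@lines)), "\n";    # 1
print scalar(blank_separated_matches(@lines)), "\n";  # 2
```

On the command line the equivalent one-liner is perl -ne 'print if /gene\texon/' file.txt, where file.txt stands for your input file.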

Monday, March 02, 2009

Implementing binary search to look up a list of sequences in a complex file

In sequence analysis work, it is often necessary to look up a list of sequences in a sequence file or in a file holding complex sets of data. One such complex dataset is an ACE file, which has multiple components:


AS contigs reads
          CO contig_name bases reads segments compl (CAP3: segments=0)
          sequence
          BQ base_qualities
          AF read1 compl padded_start_consensus (negative when the read starts before the first consensus base)
          AF read2 ..
          BS segments
          RD read1 bases info_items info_tags (latter two set to 0 by CAP3)
          sequence
          QA read1 qual_start qual_end align_start align_end
          DS (phred header? left empty by CAP3)
          RD read2 ...

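Suppose you have a list of sequence names in a file and need to find out which sequence belongs to which contig in the ACE file. Then you would probably first read the ACE file into a hash of arrays keyed by contig, where each array holds the consensus sequence followed by the names of the reads in that contig, something like this: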


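my (%HOA, @arr, $key, $flag);
my $seq = '';

while (<>) {                          # reads the ACE file named on the command line

    if (/^CO\s+(\S+)/) {              # new contig: CO contig_name bases reads ...

        $key  = $1;
        $flag = 1;
    }

    if ($flag && /^[ACGTN*]+$/i) {    # consensus sequence lines (* marks a pad)

        chomp;
        $seq .= $_;
    }

    if (/^BQ/) {                      # consensus finished: store it as $arr[0]

        push(@arr, $seq);
        $seq = '';

    }

    if (/^AF\s+(\S+)/) {              # one AF line per read in this contig

        push(@arr, $1);

    }

    if (/^BS/ && $flag) {             # first BS line: this contig is complete

        $HOA{$key} = [ @arr ];
        $flag = 0;
        @arr  = ();

    }

}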

After reading the ACE file into a hash of arrays, we need to iterate over the hash and search each read name in each array against the names from the list file. If the list is huge, it pays to implement a binary search (O(log2 N) comparisons per lookup) instead of a serial search (O(N) per lookup). Binary search only works on a sorted array or list, so it makes more sense to sort the list file than the arrays in the hash: the list is sorted once (O(N log N)) and then stays sorted for every lookup.
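To make the O(N) versus O(log2 N) difference concrete, this sketch counts the comparisons each approach makes when looking up a name in a sorted list of 100,000 hypothetical read names (read000001 and so on). The helper names linear_probes and binary_probes are illustrative only; binary_probes uses Perl's cmp so that its comparisons agree with the default string sort.

```perl
use strict;
use warnings;

# Count the comparisons a linear scan makes before finding $x.
sub linear_probes {
    my ($x, $list) = @_;
    my $probes = 0;
    for my $item (@$list) {
        $probes++;
        last if $item eq $x;
    }
    return $probes;
}

# Count the probes binary search makes on the same sorted list
# (found or not found).
sub binary_probes {
    my ($x, $list) = @_;
    my ($lo, $hi, $probes) = (0, $#$list, 0);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        $probes++;
        my $cmp = $list->[$mid] cmp $x;    # string comparison, like sort
        return $probes if $cmp == 0;
        if ($cmp < 0) { $lo = $mid + 1 } else { $hi = $mid - 1 }
    }
    return $probes;
}

# Zero-padding makes string order identical to numeric order.
my @names = map { sprintf("read%06d", $_) } 1 .. 100_000;
@names = sort @names;
my $target = $names[-1];                       # worst case for the linear scan
print linear_probes($target, \@names), "\n";   # 100000
print binary_probes($target, \@names), "\n";   # at most 17
```

The linear scan pays all 100,000 comparisons for the last name, while binary search never needs more than 17, since 2^17 = 131,072 exceeds the list size.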

Implementation:

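chomp(@list);           # strip the newlines from the names once, up front
@list = sort(@list);    # sorting the sequence names in list file (string order)


# The following implementation is a linear search and takes
# much longer: each read is compared with every name in the list
foreach my $key (keys %HOA) {

    my @arr = @{ $HOA{$key} };    # $arr[0] is the consensus; reads follow

    my @found;

    for (my $i = 1; $i <= $#arr; $i++) {

        for (my $j = 0; $j <= $#list; $j++) {

            if ($arr[$i] eq $list[$j]) {    # exact name match, not a regex

                push(@found, $arr[$i]);

                splice(@list, $j, 1);       # removing the element from the
                                            # list so it is not compared again
                last;
            }
        }
    }

    print "$key\t@found\n" if @found;       # contig and its listed reads
}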

 
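Binary search (this loop replaces the two nested for loops inside the foreach above):

for (my $i = 1; $i <= $#arr; $i++) {

    # bsearch returns the index of the name in @list, or -1 if it is absent
    push(@found, $arr[$i]) if bsearch($arr[$i], \@list) >= 0;

}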


sub bsearch {
    my ($x, $aref) = @_;             # search for string $x in sorted array @$aref
    my ($l, $u) = (0, $#$aref);      # lower, upper end of search interval
    while ($l <= $u) {
        my $i = int(($l + $u) / 2);  # index of probe
        if ($aref->[$i] lt $x) {     # string comparison: names are not numbers
            $l = $i + 1;
        }
        elsif ($aref->[$i] gt $x) {
            $u = $i - 1;
        }
        else {
            return $i;               # found
        }
    }
    return -1;                       # not found
}
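Putting the pieces together, here is a self-contained sketch that reports, for each contig, which of its reads appear in the sorted list. It assumes the hash-of-arrays layout built above (element 0 is the consensus, the read names follow); the subroutine names bsearch_str and reads_in_list and the toy contig data are hypothetical.

```perl
use strict;
use warnings;

# Binary search for string $x in the lexically sorted array @$sorted.
# Returns the index of $x, or -1 if it is absent.
sub bsearch_str {
    my ($x, $sorted) = @_;
    my ($l, $u) = (0, $#$sorted);
    while ($l <= $u) {
        my $i = int(($l + $u) / 2);
        if    ($sorted->[$i] lt $x) { $l = $i + 1 }
        elsif ($sorted->[$i] gt $x) { $u = $i - 1 }
        else                        { return $i }
    }
    return -1;
}

# Map each contig to the reads (elements 1 .. $#arr of its entry;
# element 0 is the consensus) that occur in the sorted name list.
sub reads_in_list {
    my ($hoa, $sorted) = @_;
    my %found;
    for my $contig (keys %$hoa) {
        my @arr = @{ $hoa->{$contig} };
        for my $read (@arr[1 .. $#arr]) {
            push @{ $found{$contig} }, $read if bsearch_str($read, $sorted) >= 0;
        }
    }
    return \%found;
}

# Toy data standing in for a parsed ACE file and a list file.
my %HOA = (
    Contig1 => [ 'ACGT', 'readA', 'readB' ],
    Contig2 => [ 'GGCC', 'readC' ],
);
my @list  = sort qw(readC readA readZ);
my $found = reads_in_list(\%HOA, \@list);
for my $contig (sort keys %$found) {
    print "$contig: @{ $found->{$contig} }\n";
}
# prints:
# Contig1: readA
# Contig2: readC
```

Because the list is sorted once and each read costs O(log2 N) comparisons, the total work is roughly O(N log N) for the sort plus O(R log N) for R reads, instead of O(R N) for the nested loops.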