Thursday, March 25, 2010

Packaging your source code

I have written a few software packages but have never officially released any of them in the familiar ./configure && make && make install format.

We have written a C package that does what GCRMA does, and much more, in a very memory-efficient way. Although the GCRMA C source is freely available, it took quite some work on our end to customize it. Now that the code is written, I have decided to release it under a GNU open source license.

A few things need to be taken care of before creating the package:

1. Put all your C sources in the src/ directory.
2. All data, such as sample CEL files, go in the data/ directory.
3. Accessory scripts (for splitting files, merging files, plotting with R) go in the scripts/ directory.
4. Documents go in the doc/ directory, and man pages in the man/ directory.
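The layout above can be created in one step. This is a minimal sketch: the top-level directory name ModifiedGCRMA is an assumption, and man/ is included because the configure.ac later in this post expects a man/Makefile.

```shell
#!/bin/sh
# Sketch: create the package skeleton described above.
# ModifiedGCRMA is a hypothetical top-level directory name.
set -e

top=ModifiedGCRMA

# C sources, sample data (e.g. CEL files), helper scripts, documents, man pages
for dir in src data scripts doc man; do
    mkdir -p "$top/$dir"
done

ls "$top"
```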

RUN CONFIGURATION UTILITY:

First, run autoscan in the top-level directory of the package.

In all likelihood this command will exit with an error. Nevertheless, it produces a configure.scan file.
Open configure.scan and change the placeholder line
AC_INIT([FULL-PACKAGE-NAME], [VERSION], [BUG-REPORT-ADDRESS])
into something useful, such as
AC_INIT([Modified GCRMA], [1.0], [tsucheta@gmail.com])

$ mv configure.scan configure.ac

(Older versions of autoconf used the file name configure.in instead.)

$ autoconf

This creates the configure script.
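If you prefer not to edit configure.scan by hand, the rename and the AC_INIT edit can be scripted with sed. This is a sketch that assumes autoscan wrote AC_INIT on a single line; to keep it self-contained it writes a small stand-in configure.scan (hypothetical content) when none exists, and it only calls autoconf if autoconf is installed.

```shell
#!/bin/sh
# Sketch: turn configure.scan into configure.ac without opening an editor.
# Assumes the AC_INIT(...) call sits on one line, as autoscan writes it.
set -e

# Demo only: fake a minimal configure.scan if autoscan has not been run here.
if [ ! -f configure.scan ]; then
    cat > configure.scan <<'EOF'
AC_INIT([FULL-PACKAGE-NAME], [VERSION], [BUG-REPORT-ADDRESS])
AC_OUTPUT
EOF
fi

# Replace the placeholder AC_INIT line with the real name, version and contact,
# writing the result to configure.ac (same effect as editing, then mv).
sed 's/^AC_INIT(.*)$/AC_INIT([Modified GCRMA], [1.0], [tsucheta@gmail.com])/' \
    configure.scan > configure.ac
rm configure.scan

# Generate ./configure, but only where autoconf is available.
if command -v autoconf >/dev/null 2>&1; then
    autoconf
fi
```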

Makefile.am

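You need to create a Makefile.am file (with a capital M, which is the name automake looks for) in each of your src/, scripts/, man/ and doc/ directories, with appropriate values, plus one in the top-level directory that lists those subdirectories in a SUBDIRS line.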
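A minimal sketch of the top-level Makefile.am, assuming the subdirectories used in this post. The EXTRA_DIST line is an assumption about how to ship data/: that directory has no Makefile.am of its own, so it is added to the tarball as a whole.

```shell
#!/bin/sh
# Sketch: write the top-level Makefile.am. SUBDIRS should name the same
# directories whose Makefiles configure.ac asks configure to generate.
set -e

cat > Makefile.am <<'EOF'
# recurse into these directories for make, make install and make dist
SUBDIRS = src doc man scripts

# data/ (e.g. sample CEL files) has no Makefile.am of its own;
# EXTRA_DIST copies the whole directory into the tarball
EXTRA_DIST = data
EOF
```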

In your src/ directory, Makefile.am might look like this. The explicit object-file rules and the clean target of a hand-written Makefile are no longer needed, because automake generates them itself (including the dependencies on utility.h):

# what flags you want to pass to the C compiler & linker
AM_CFLAGS = -pedantic -Wall -std=c99 -O2
AM_LDFLAGS =

# the binaries to produce, i.e. the (non-PHONY, binary) targets of
# the previous hand-written Makefile; a program name cannot contain spaces
bin_PROGRAMS = ModifiedGCRMA

# the sources of the program; the variable prefix must match the
# program name, and headers are listed so that make dist ships them
ModifiedGCRMA_SOURCES = main.c calculate.c read_seq.c read_file.c \
    rev_complement.c detect_chimera.c find_orf.c command.c process.c \
    utility.h

# extra libraries to link against: the math library and zlib
ModifiedGCRMA_LDADD = -lm -lz

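In the scripts/ directory, Makefile.am lists the scripts to install:
dist_bin_SCRIPTS = file1.pl file2.pl file3.R file4.sh ...

In the man/ directory, Makefile.am lists the man pages:
dist_man_MANS = man.1 man.2 man.3 ...

The dist_ prefix matters: automake installs scripts and man pages listed without it, but does not put them into the tarball created by make dist.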
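The post does not show a Makefile.am for doc/. The sketch below writes all three small files at once; the script names come from the post, while ModifiedGCRMA.1 and userguide.txt are hypothetical file names. Like scripts and man pages, DATA files are only distributed by automake when they carry the dist_ prefix.

```shell
#!/bin/sh
# Sketch: Makefile.am files for scripts/, man/ and doc/.
# ModifiedGCRMA.1 and userguide.txt are hypothetical file names.
set -e

mkdir -p scripts man doc

# helper scripts, installed into $(bindir) and included in make dist
cat > scripts/Makefile.am <<'EOF'
dist_bin_SCRIPTS = split.pl merge.pl plot.R
EOF

# man pages, installed under $(mandir) and included in make dist
cat > man/Makefile.am <<'EOF'
dist_man_MANS = ModifiedGCRMA.1
EOF

# plain documents, installed into $(docdir) and included in make dist
cat > doc/Makefile.am <<'EOF'
dist_doc_DATA = userguide.txt
EOF
```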

Now go back to configure.ac and add the following line just after AC_INIT(...):

AM_INIT_AUTOMAKE(ModifiedGCRMA, 1.0)

This initializes automake.

At the bottom of configure.ac, list the Makefiles that configure should generate:

AC_OUTPUT(Makefile src/Makefile doc/Makefile man/Makefile scripts/Makefile)
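Putting the pieces together, a complete configure.ac might look like the file written below. It is a sketch in the newer autoconf/automake style: AM_INIT_AUTOMAKE without arguments takes the package name and version from AC_INIT, and AC_CONFIG_FILES followed by a bare AC_OUTPUT replaces the older AC_OUTPUT(file list) form; both styles generate the same Makefiles, although the tarball name is then derived from AC_INIT. AC_CONFIG_SRCDIR, AC_PROG_CC and the two AC_CHECK_LIB lines are assumptions, not part of the original file.

```shell
#!/bin/sh
# Sketch: write a complete configure.ac in the newer autotools style.
set -e

cat > configure.ac <<'EOF'
AC_INIT([Modified GCRMA], [1.0], [tsucheta@gmail.com])
# sanity check that configure runs in the right source tree
AC_CONFIG_SRCDIR([src/main.c])
# no arguments: package name and version come from AC_INIT
AM_INIT_AUTOMAKE

# find a C compiler
AC_PROG_CC

# check for the math library, and refuse to continue without zlib
AC_CHECK_LIB([m], [cos])
AC_CHECK_LIB([z], [gzopen], [], [AC_MSG_ERROR([zlib is required])])

# Makefiles that configure generates from the Makefile.in files
AC_CONFIG_FILES([Makefile src/Makefile doc/Makefile man/Makefile scripts/Makefile])
AC_OUTPUT
EOF
```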

Now run aclocal, followed by automake --add-missing.

automake reads each Makefile.am and creates a Makefile.in, which configure later turns into the final Makefile.

Be warned: while running, automake may complain that the files
NEWS, README, AUTHORS and ChangeLog are missing.

Don't worry; you can create them as empty placeholders with
touch NEWS
touch README
touch AUTHORS
touch ChangeLog

If automake did not generate Makefile.in the first time, run it again.

Then rebuild the configure script with autoconf, since configure.ac has changed.
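The sequence of commands above can be collected into a small bootstrap script. This sketch follows the order used in this post and guards the autotools calls, so it only creates the placeholder files where automake or configure.ac is missing. The autoheader step is only needed if configure.ac contains AC_CONFIG_HEADERS, which autoscan typically adds. On most systems, autoreconf --install performs the same steps in one command.

```shell
#!/bin/sh
# Sketch: regenerate the autotools files in the order used in this post.
# Run it from the top-level package directory.
set -e

# GNU strictness requires these files; touch leaves existing ones intact
touch NEWS README AUTHORS ChangeLog

if command -v automake >/dev/null 2>&1 && [ -f configure.ac ]; then
    aclocal                      # gather the macros configure.ac uses into aclocal.m4
    if grep -q AC_CONFIG_HEADERS configure.ac; then
        autoheader               # config.h.in, needed when a config header is requested
    fi
    automake --add-missing       # each Makefile.am -> Makefile.in, plus helper files
    autoconf                     # configure.ac -> configure
else
    echo "automake or configure.ac not found; only placeholder files created" >&2
fi
```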

Once this is done, you are all set. Run
./configure --prefix="YOUR_PATH"
followed by make and make install.

You can pack everything into a distributable tarball with make dist.
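Finally, a sketch of the build, install and packaging steps as one script. The install location $HOME/local and the PREFIX environment variable used to override it are assumptions for illustration; any directory you can write to will do.

```shell
#!/bin/sh
# Sketch: configure, build, install and package in one go.
# $HOME/local and the PREFIX override are hypothetical choices.
set -e

prefix="${PREFIX:-$HOME/local}"

if [ -x ./configure ]; then
    ./configure --prefix="$prefix"
    make
    make install    # programs go to $prefix/bin, man pages under $prefix/share/man
    make dist       # writes a .tar.gz of the package into the current directory
else
    echo "./configure not found; run the autotools steps first" >&2
fi
```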

A few of the autoconf macros are listed here.

Thursday, March 04, 2010

A reasonable list of free and paid software for next-gen sequence analysis

http://www.oxfordjournals.org/our_journals/bioinformatics/nextgenerationsequencing.html

A great list of free and paid software for next-gen sequence analysis is available at the seqanswers forum here.

To browse a compiled list of bioinformatics software, click here.

A list of publications devoted to next-generation sequencing in the journal Bioinformatics is here.
I am now working with samtools and bowtie and will write a detailed review soon.

Tuesday, March 02, 2010

AGBT 2010

AGBT (Advances in Genome Biology and Technology) recently concluded in Florida (24-27 Feb 2010). Anthony Fejas has done a great job of posting outlines of the presentations on his blog. Check it out for details.