genomics: Bayesian Probability theory and its application

The two probability theories that are considered the pillars of all other, more complicated probabilistic models are the Addition theory and the Multiplication theory:

Addition Theory (OR Theory):
If two events are mutually exclusive, then the probability that event 1 or event 2 occurs is the sum of their two probabilities.
Example: the probability of drawing an ace or a joker from a pack of 54 cards. There are only 2 jokers and 4 aces, so the probability is 2/54 + 4/54 = 6/54 = 1/9.

The addition theory is slightly modified when the events are not mutually exclusive, for example drawing a diamond or a queen from a pack of 52 cards. Symbolically, it is represented as: P(A or B) = P(A) + P(B) - P(A and B). So the result is 13/52 + 4/52 - 1/52 (reason: there are 13 diamonds, 4 queens, and only one queen of diamonds among the 52 cards), which gives 16/52 = 4/13.
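The diamond-or-queen figure can be checked by brute-force counting. The following is only a minimal sketch: the deck layout and the helper names (`prob`, `is_diamond`, `is_queen`) are illustrative choices, not part of the original example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs, no jokers


def prob(event):
    """Probability that a uniformly drawn card satisfies `event`."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_diamond(card):
    return card[1] == "diamonds"


def is_queen(card):
    return card[0] == "Q"


# Count the cards that are a diamond or a queen directly ...
p_direct = prob(lambda c: is_diamond(c) or is_queen(c))
# ... and compare with P(A) + P(B) - P(A and B)
p_rule = prob(is_diamond) + prob(is_queen) - prob(lambda c: is_diamond(c) and is_queen(c))

print(p_direct, p_rule)  # both are 4/13
```

Subtracting P(A and B) removes the queen of diamonds, which would otherwise be counted twice.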

Multiplication Theory (AND Theory):
When two events A and B are independent, then the probability that both A and B occur is P(A) X P(B).
Example: when tossing two coins simultaneously, the probability of obtaining two heads is 1/2 X 1/2 = 1/4, and likewise 1/4 for two tails.
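Enumerating the four equally likely outcomes of two fair coin tosses gives a quick check of the product rule for independent events. This is a small sketch; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All outcomes of two tosses: HH, HT, TH, TT (equally likely for fair coins)
outcomes = list(product("HT", repeat=2))

p_two_heads = Fraction(outcomes.count(("H", "H")), len(outcomes))
p_head = Fraction(1, 2)

# Two heads and two tails are mutually exclusive, so their probabilities add
p_two_alike = Fraction(sum(1 for a, b in outcomes if a == b), len(outcomes))

print(p_two_heads, p_head * p_head)  # 1/4 1/4
print(p_two_alike)                   # 1/2
```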

Bayesian Probability:
Bayesian statistics, or Bayesian probability, is built on conditional probability: the probability of one event given that another event has occurred. A common example is a loaded die in a casino, where 99% of the dice are fair but 1% are loaded. A loaded die shows a 6 50% of the time, whereas a fair die shows a 6 one-sixth of the time. So, if a die picked at random gives 3 consecutive sixes, what is the probability that it is loaded?
In theory, Bayes' theorem can be applied when:

The sample space is partitioned into a set of mutually exclusive events { A₁, A₂, . . . , A_n }.
Within the sample space, there exists an event B, for which P(B) > 0.
The analytical goal is to compute a conditional probability of the form: P( A_k | B ).
You know at least one of the two sets of probabilities described below.

P( A_k ∩ B ) for each A_k
P( A_k ) and P( B | A_k ) for each A_k.
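The conditions above translate into a short calculation. The sketch below assumes the second set of inputs is known, P( A_k ) and P( B | A_k ); the function name `posterior` and the three-way partition with its numbers are hypothetical and chosen only for illustration.

```python
def posterior(priors, likelihoods):
    """Return P(A_k | B) for every k.

    priors[k] is P(A_k); likelihoods[k] is P(B | A_k).
    The A_k are assumed to be mutually exclusive and to cover the sample space.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A_k and B)
    p_b = sum(joint)  # P(B), by the law of total probability
    if p_b <= 0:
        raise ValueError("P(B) must be greater than 0")
    return [j / p_b for j in joint]


# Hypothetical partition into three events (illustrative numbers only)
result = posterior([0.5, 0.3, 0.2], [0.1, 0.4, 0.7])
print(result)
```

If the joint probabilities P( A_k ∩ B ) are known instead, the first step is skipped and they are simply divided by their sum.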

So, in the above case, we would like to know the probability that the die is loaded, given that it produced 3 consecutive sixes.
P(loaded die | 3 sixes) = P(3 sixes | loaded die) * P(loaded die) / [ P(3 sixes | loaded die) * P(loaded die) + P(3 sixes | fair die) * P(fair die) ]
= (0.5)³ * (0.01) / [ (0.5)³ * (0.01) + (1/6)³ * 0.99 ]
= 0.21

The same can be applied to find the probability that the die is fair, given 3 consecutive sixes:

(1/6)³ * 0.99 / [ (0.5)³ * (0.01) + (1/6)³ * 0.99 ] = 0.79
Meaning that, even after 3 consecutive sixes, the die is still much more likely to be fair than loaded, because loaded dice are so rare to begin with.
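One way to see how the evidence builds up is to update the probability after every six, using each result as the starting point for the next roll. This is a sketch with the numbers from the casino example; continuing past three sixes is an extension added here for illustration.

```python
p_loaded = 0.01      # starting belief: 1% of the dice are loaded
p_six_loaded = 0.5   # chance of a six with a loaded die
p_six_fair = 1 / 6   # chance of a six with a fair die

history = []
for n in range(1, 7):
    numerator = p_six_loaded * p_loaded
    denominator = numerator + p_six_fair * (1 - p_loaded)
    p_loaded = numerator / denominator  # updated belief feeds the next roll
    history.append(p_loaded)
    print(f"after {n} sixes in a row: P(loaded) = {p_loaded:.2f}")
```

Each six multiplies the odds of a loaded die by 3, so the probability passes one half only on the fifth consecutive six.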

The same theory can be applied to a weather channel that predicts either a sunny or a rainy day. Suppose it rains on 5 days in a whole year, the weather channel predicts rain for a particular day, and the accuracy of the channel is about 90%. Then the probability that it actually rains on that day is:
(5/365) * 0.9 / [ (5/365) * 0.9 + (360/365) * 0.1] = 0.111
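The weather figure can be reproduced with exact fractions. A minimal sketch, assuming the 90% accuracy holds equally for rainy and dry days (the text does not say so explicitly); the variable names are illustrative.

```python
from fractions import Fraction

p_rain = Fraction(5, 365)   # it rains on 5 days of the year
p_dry = 1 - p_rain
accuracy = Fraction(9, 10)  # assumed to be the same on rainy and dry days

# A rain forecast is either a correct call on a rainy day
# or a wrong call on a dry day
p_forecast_rain = p_rain * accuracy + p_dry * (1 - accuracy)
p_rain_given_forecast = p_rain * accuracy / p_forecast_rain

print(p_rain_given_forecast)  # 1/9, i.e. about 0.111
```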

Bayesian statistics is often used in sequence analysis, for example to judge whether a protein is extracellular or intracellular. These 2 situations are mutually exclusive, i.e., a protein can be either extracellular or intracellular, and the number of cysteine residues at certain locations indicates, at a certain confidence level, which is the case.
Another test case could be:

A rare genetic disease is discovered. Although only one in a million
people carry it, you consider getting screened. You are told that the genetic
test is extremely good; it is 100% sensitive (it is always correct if
you have the disease) and 99.99% specific (it gives a false positive result
only 0.01% of the time). Using Bayes' theorem, explain why you might
decide not to take the test.
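One way to approach this exercise is to plug the stated numbers into Bayes' theorem. This is only a sketch of the arithmetic with illustrative variable names, not a full answer to the question.

```python
from fractions import Fraction

prevalence = Fraction(1, 1_000_000)       # one carrier in a million people
sensitivity = Fraction(1)                 # always positive if you carry it
false_positive_rate = Fraction(1, 10_000)  # 0.01% of non-carriers test positive

true_pos = prevalence * sensitivity
false_pos = (1 - prevalence) * false_positive_rate

p_carrier_given_positive = true_pos / (true_pos + false_pos)
false_per_true = false_pos / true_pos  # false alarms per genuine positive

print(float(p_carrier_given_positive))  # about 0.0099
print(float(false_per_true))            # about 100
```

Under these numbers a positive result still leaves only about a 1% chance of carrying the disease, since false positives outnumber true positives by roughly 100 to 1.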

Nature is a tinkerer and not an inventor. New sequences are adapted from pre-existing sequences rather than invented de novo [Jacob 1977].

Monday, November 29, 2010