Consider an experiment whose sample space is S. For each event E of the sample space S we assume that a number P(E) is defined and satisfies the following three axioms.
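Axiom 1. $0 \le P(E) \le 1$.
Axiom 2. $P(S) = 1$.
Axiom 3. For any sequence of mutually exclusive events $E_1, E_2, \ldots$ (that is, events for which $E_i E_j = \emptyset$ when $i \ne j$), $P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$.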
We refer to P(E) as the probability of the event E.
The distribution of a parameter before observing any data is called the prior distribution of the parameter. The conditional distribution of the parameter given the observed data is called the posterior distribution. If we plug the observed values of the data into the conditional p.f. or p.d.f. of the data given the parameter, the result is a function of the parameter alone, which is called the likelihood function.
Probability and Statistics, Fourth Edition, M.H. DeGroot and M. J. Schervish
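The three quantities defined above are tied together by Bayes' theorem. As a sketch, writing $\theta$ for the parameter, $x$ for the observed data, $\xi(\theta)$ for the prior, $f(x \mid \theta)$ for the p.f. or p.d.f. of the data, and $\xi(\theta \mid x)$ for the posterior (the symbols are chosen here for illustration and are not taken from the quotation), the relation for a continuous parameter is:

```latex
\xi(\theta \mid x)
  = \frac{f(x \mid \theta)\,\xi(\theta)}{\int f(x \mid \theta')\,\xi(\theta')\,d\theta'}
  \;\propto\; f(x \mid \theta)\,\xi(\theta)
```

The denominator does not depend on $\theta$; it rescales the product of likelihood and prior so that the posterior integrates to 1. The likelihood on its own carries no such normalization.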
Likelihood is NOT probability, because the likelihood function does not satisfy the three axioms. For example, suppose we observe x = 5 from an exponential distribution with an unknown rate parameter p, so that the likelihood function is $L(p) = p e^{-5p}$, with support p > 0. If the likelihood function were a probability density over p, the total probability would have to be 1; that is, the integral
$$\int_0^{\infty} p e^{-5p} \, dp$$
would have to equal 1. However, the integral equals $1/5^2 = 0.04$.
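The value 0.04 can be checked numerically. The following is a minimal sketch, assuming the rate parametrization f(x | p) = p e^{-px} of the exponential density (the only one consistent with the value 0.04). The function names `likelihood` and `integrate`, the upper integration limit of 20, and the fixed rate p = 2 in the comparison are illustrative choices, not part of the original text.

```python
import math

def likelihood(p, x=5.0):
    """Exponential density p * exp(-p * x), viewed as a function of the rate p."""
    return p * math.exp(-p * x)

def integrate(f, a, b, n=200_000):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Integrate over the parameter p with the observation fixed at x = 5.
# The upper limit 20 is an assumption: the integrand is negligible beyond it.
area_over_p = integrate(likelihood, 0.0, 20.0)

# For comparison, integrate the same formula over x with the rate fixed at p = 2.
# Here it is a genuine probability density, so the area should be close to 1.
area_over_x = integrate(lambda x: likelihood(2.0, x), 0.0, 20.0)

print(round(area_over_p, 4), round(area_over_x, 4))
```

The same formula integrates to about 1 over x but to about 0.04 over p, which is the distinction the example draws: the expression is a density in the data, not in the parameter.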
Submitted by TaeYoung