What is the difference between “likelihood” and “probability”?

Probability:

Consider an experiment whose sample space is S. For each event E of the sample space S we assume that a number P(E) is defined and satisfies the following three axioms.

Axiom 1.  

$$0 \le P(E) \le 1$$

Axiom 2.  

$$P(S) = 1$$

Axiom 3. For any sequence of mutually exclusive events E1, E2, ...

$$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$$

We refer to P(E) as the probability of the event E.
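As a small illustration, the three requirements (P(E) lies between 0 and 1, the whole sample space has probability 1, and probabilities of mutually exclusive events add) can be checked on a finite sample space. The sketch below assumes a hypothetical experiment, one roll of a fair six-sided die. The names S, P, evens and low_odds are made up for this example, and only the finite case of Axiom 3 is checked.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die.
S = frozenset(range(1, 7))

def P(E):
    """Probability of event E (a subset of S) when all outcomes are equally likely."""
    return Fraction(len(E & S), len(S))

evens = frozenset({2, 4, 6})
low_odds = frozenset({1, 3})  # mutually exclusive with evens

# Axiom 1: the probability of an event lies between 0 and 1.
axiom1 = 0 <= P(evens) <= 1

# Axiom 2: the whole sample space has probability 1.
axiom2 = P(S) == 1

# Axiom 3 (finite case): probabilities of mutually exclusive events add.
axiom3 = P(evens | low_odds) == P(evens) + P(low_odds)

print(axiom1, axiom2, axiom3)
```

Exact fractions are used instead of floats so that the equality checks are not affected by rounding.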

Likelihood:

The distribution of a parameter before observing any data is called the prior distribution of the parameter. The conditional distribution of the parameter given the observed data is called the posterior distribution. If we plug the observed values of the data into the conditional p.f. or p.d.f. of the data given the parameter, the result is a function of the parameter alone, which is called the likelihood function.

Probability and Statistics, Fourth Edition, M.H. DeGroot and M. J. Schervish

Likelihood is NOT probability: viewed as a function of the parameter, it need not satisfy the axioms above. For example, suppose we observe x = 5 from an exponential distribution with an unknown rate parameter p. The likelihood function is L(p) = p e^{-5p}, defined for p > 0. If the likelihood function were a probability density in p, then the integral

$$\int_0^{\infty} p \, e^{-5p} \, dp$$

would have to equal 1. However, the integral evaluates to 1/25 = 0.04.
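The value 0.04 can be checked numerically. The sketch below assumes the rate parameterization f(x | p) = p e^{-px} of the exponential distribution and uses a plain trapezoidal rule. The helper names exp_pdf and trapezoid, the fixed rate 2.0, and the truncation of the infinite range at 50 are all choices made for this illustration.

```python
import math

def exp_pdf(x, p):
    """Exponential density f(x | p) = p * exp(-p * x), for x >= 0 and p > 0."""
    return p * math.exp(-p * x)

def trapezoid(f, a, b, n=200_000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Fix the parameter (p = 2.0, an arbitrary choice) and integrate over the data x:
# this is a probability density, so the area is 1.
area_over_x = trapezoid(lambda x: exp_pdf(x, 2.0), 0.0, 50.0)

# Fix the data (x = 5) and integrate over the parameter p:
# this is the likelihood function, and its area is 1/25, not 1.
area_over_p = trapezoid(lambda p: exp_pdf(5.0, p), 0.0, 50.0)

print(round(area_over_x, 4))  # approximately 1.0
print(round(area_over_p, 4))  # approximately 0.04
```

The same formula is integrated both times. Only the variable that is held fixed changes, and that is exactly the distinction between a density in the data and a likelihood in the parameter.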

Submitted by TaeYoung