\chapter{Introduction}
\label{ch:intro}
Statistics has developed substantially over the past two centuries. Broadly speaking, the 19th century belonged to Bayesian statistics and the 20th century to frequentist statistics (Efron 2004); frequentist approaches dominated statistical theory and practice for most of the past century. Thanks to the rapid development of computing facilities and of new sampling techniques, in particular Markov chain Monte Carlo (MCMC) over the last two decades, the Bayesian approach has become computationally feasible and has attracted increasing attention from scientists in a wide variety of applications.
\section{Bayes' rule}
\label{sec:Bayes.rule}
Bayesian statistical conclusions about parameters $\btheta$, or unobserved data $\by$, are made in terms of \emph{probability} statements. These probability statements are conditional on the observed values of $\by$; the resulting distribution, denoted $p(\btheta|\by)$, is called the posterior distribution of the parameters $\btheta$. Bayesian analysis is a practical method for making inferences from data and prior beliefs, using probability models both for quantities we observe and for quantities we wish to learn about. A Bayesian data analysis generally proceeds in three steps:
\be
\i Set up a full probability model $p(\by, \btheta)$, i.e., a joint probability distribution for all observable and unobservable quantities;
\i Condition on the observed data: calculate and interpret the posterior distribution (i.e., the conditional probability distribution of the unobserved quantities of interest, given the observed data, $p(\btheta|\by)$);
\i Evaluate the fit of the model and the implications of the resulting posterior distribution.
\ee
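The three steps above can be sketched with a small conjugate example. The following is a minimal illustration only, using a hypothetical Beta-Binomial model (the model and all numbers are invented for this sketch, not taken from the text):

```python
# A minimal sketch of the three steps, using a hypothetical
# Beta-Binomial example (model and numbers are illustrative only).

# Step 1: full probability model p(y, theta) = p(theta) p(y | theta):
#   theta ~ Beta(a, b),  y | theta ~ Binomial(n, theta).
a, b = 1.0, 1.0          # flat Beta(1, 1) prior
n, y = 20, 14            # observed data: 14 successes in 20 trials

# Step 2: condition on the observed data -- by conjugacy,
#   theta | y ~ Beta(a + y, b + n - y).
post_a, post_b = a + y, b + n - y
post_mean = post_a / (post_a + post_b)

# Step 3: a crude check of fit -- compare the observed proportion
# with the posterior mean of theta.
print(post_mean, y / n)
```

With a flat prior the posterior mean $15/22 \approx 0.68$ sits close to the observed proportion $0.7$, slightly shrunk toward the prior mean $0.5$.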
In order to make probability statements about $\btheta$ given $\by$, we must begin with a model providing a \emph{joint probability distribution} for $\btheta$ and $\by$. This joint distribution can be written as the product of the \emph{prior distribution} $p(\btheta)$ and the \emph{sampling distribution} $p(\by|\btheta)$,
$$p(\btheta, \by)=p(\btheta)p(\by|\btheta).$$
The conditional distribution $p(\btheta|\by)$ is then obtained by dividing both sides by $p(\by)$:
\beq\label{eq:Bayes.rule}
p(\btheta|\by)=\frac{p(\btheta)p(\by|\btheta)} {p(\by)}\propto p(\btheta)p(\by|\btheta) \propto \mbox{prior}\times \mbox{likelihood}
\eeq
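Equation (\ref{eq:Bayes.rule}) can be checked numerically on a grid. The sketch below uses a hypothetical coin-flip likelihood with a flat prior (all numbers invented for illustration): multiplying prior by likelihood pointwise and normalizing by their sum plays the role of dividing by $p(\by)$.

```python
# Grid illustration of posterior proportional to prior x likelihood
# (hypothetical coin example: y = 7 heads in n = 10 flips, flat prior).
n, y = 10, 7
grid = [i / 100 for i in range(1, 100)]        # theta values in (0, 1)
prior = [1.0] * len(grid)                      # p(theta) proportional to 1
lik = [t**y * (1 - t)**(n - y) for t in grid]  # p(y | theta) kernel

unnorm = [p * l for p, l in zip(prior, lik)]   # p(theta) p(y | theta)
total = sum(unnorm)                            # plays the role of p(y)
post = [u / total for u in unnorm]             # posterior on the grid

post_mean = sum(t * w for t, w in zip(grid, post))
print(post_mean)   # close to the exact Beta(8, 4) mean, (y + 1)/(n + 2)
```

The grid posterior mean agrees with the closed-form conjugate answer $(y+1)/(n+2)=2/3$ to within the grid spacing.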
The primary task in any specific application is to develop the model $p(\btheta, \by)$ and to perform the computations needed to summarize $p(\btheta|\by)$ in appropriate ways.
\section{Non-informative prior}
\label{sec:noninformative.prior}
Bayesian analysis requires prior information (see Section \ref{sec:Bayes.rule}); sometimes, however, no particularly useful information is available before the data are collected. In these situations a prior carrying ``no information'' is desired. Such priors are called \emph{non-informative priors} or \emph{vague priors}. In the recent Bayesian literature the term \emph{reference prior} is often preferred, since strictly speaking every prior carries some information. A \emph{non-informative prior} is so called in the sense that it does not favor one value over another on the parameter space of $\btheta$. Another reason to use non-informative priors is that the Bayesian results can then be connected with frequentist analysis.
The following presents some ways to construct non-informative priors.
\be
\i Intuitively, a prior that is flat over the parameter space. For example:
\be
\i $X_i \sim N(\mu, \sigma^2)$, i.i.d., with $\sigma^2$ known. Then $p(\mu) \propto 1$.
\ee
\i A prior that is almost flat over the parameter space. In the last example, take $\mu \sim N(0, 10^6)$.
\i Because distributions change under parameter transformation, a prior that is flat in one parameterization may not be flat in a transformed one; e.g., if $\sigma^2 \sim $ uniform on $(0,100)$, then $p(\sigma) \propto \sigma$, which is not uniform. Jeffreys' prior, which is invariant under transformation, is given by $p(\btheta) \propto [I(\btheta)]^{1/2}$, where $I(\btheta)$ is the expected Fisher information in the model. For example,
\be
\i $X \sim N(\mu, \sigma^2)$ with $\mu$ known. Jeffreys' prior is $p(\sigma^2) \propto 1/\sigma^2$.
\i $X \sim Bin(n,\theta)$ with $n$ known. Jeffreys' prior is $p(\theta) \propto \theta^{-1/2}(1-\theta)^{-1/2}$, which is the $Beta(\fracs{2},\fracs{2})$ distribution.
\ee
\ee
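The binomial case can be verified numerically. The sketch below (function names are my own, chosen for illustration) computes the expected Fisher information directly from its definition and checks that its square root matches the closed-form $\theta^{-1/2}(1-\theta)^{-1/2}$ kernel:

```python
import math

# For X ~ Bin(n, theta), log p(x | theta) = const + x log(theta)
# + (n - x) log(1 - theta), so the negative second derivative is
# x / theta^2 + (n - x) / (1 - theta)^2; taking E[X] = n * theta gives
# I(theta) = n / (theta * (1 - theta)).
def fisher_info(theta, n=1):
    return n * theta / theta**2 + n * (1 - theta) / (1 - theta)**2

def jeffreys_kernel(theta, n=1):
    return math.sqrt(fisher_info(theta, n))   # proportional to the prior

# Compare with the closed form theta^{-1/2} (1 - theta)^{-1/2},
# the Beta(1/2, 1/2) kernel; for n = 1 they agree exactly.
theta = 0.3
closed_form = theta**-0.5 * (1 - theta)**-0.5
print(abs(jeffreys_kernel(theta) - closed_form) < 1e-9)   # True
```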
For multiple parameters, non-informative priors can be constructed by assuming ``independence'' among the parameters; for example, $p(\theta_1, \theta_2)=p(\theta_1)p(\theta_2)$, where each factor on the right-hand side is a univariate non-informative prior. Alternatively, one can use the multivariate version of Jeffreys' prior, $p(\btheta) \propto |I(\btheta)|^{1/2}$, where $| \cdot |$ denotes the determinant.
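As a concrete instance of the determinant rule, consider $N(\mu, \sigma^2)$ with both parameters unknown, parameterized by $(\mu, \sigma)$: the expected Fisher information matrix is $\mathrm{diag}(1/\sigma^2,\, 2/\sigma^2)$. A hypothetical sketch:

```python
import math

# Multivariate Jeffreys prior via the determinant rule, for the
# N(mu, sigma^2) model with both parameters unknown (parameterized
# by (mu, sigma)); the 2x2 expected Fisher information matrix is
# diag(1/sigma^2, 2/sigma^2).
def jeffreys_normal(sigma):
    info = [[1 / sigma**2, 0.0],
            [0.0, 2 / sigma**2]]
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    return math.sqrt(det)                 # |I(theta)|^{1/2}

# This simplifies to sqrt(2) / sigma^2, i.e. the familiar prior
# p(mu, sigma) proportional to 1 / sigma^2.
print(abs(jeffreys_normal(2.0) - math.sqrt(2) / 4) < 1e-12)   # True
```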
Note that a non-informative prior may be improper, in the sense that $\int p(\theta)\, d \theta=\infty$, but Bayesian inference is
still possible as long as the prior leads to a proper posterior.
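For example, the flat prior $p(\mu) \propto 1$ of Section \ref{sec:noninformative.prior} is improper, yet with normal data it yields the proper posterior $\mu|\by \sim N(\bar y, \sigma^2/n)$. A hypothetical numerical check (the data values are invented for illustration):

```python
import math

# Improper flat prior p(mu) proportional to 1, with X_i ~ N(mu, sigma^2)
# and sigma known; the unnormalized posterior is just the likelihood
# kernel exp(-n (mu - ybar)^2 / (2 sigma^2)).
sigma = 1.0
data = [1.2, 0.8, 1.5, 0.9, 1.1]          # invented observations
n, ybar = len(data), sum(data) / len(data)

def unnorm_post(mu):
    return math.exp(-n * (mu - ybar) ** 2 / (2 * sigma**2))

# Riemann sum over a wide grid: the integral is finite, so the
# posterior is proper -- in fact exactly N(ybar, sigma^2 / n).
h = 0.001
area = sum(unnorm_post(ybar - 10 + i * h) for i in range(20001)) * h
exact = math.sqrt(2 * math.pi * sigma**2 / n)
print(abs(area - exact) < 1e-4)   # True
```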
\section*{References}
\be
\i Efron, B. (2004). Presidential address, Joint Statistical Meetings (JSM).
\ee