Syllabus

The acquisition and interpretation of data are among the foundations of modern science. To ensure steady progress in our knowledge of the Universe, it is essential to use statistical tools that allow an optimal and accurate extraction of information from the data. This is a particularly difficult task in astronomy, where measurements are often very noisy. Bayesian statistics offers a wide range of analysis tools that can help us deal with many of the challenges posed by astronomical data, and these tools are rapidly gaining popularity within the astronomical community. The course reviews some of the most popular applications of Bayesian statistics to astronomical problems. Students will learn how to make accurate inferences of model parameters from noisy multidimensional data, how to assess the goodness of fit of a model, and which computational techniques are needed to apply these methods in practice. Examples based on real astronomical problems are used throughout the course.

Lectures

  1. Introduction to Bayesian statistics, probability distributions, inference of parameters in 1D problems (Lecture video)
  2. Posterior prediction (Lecture video)
  3. 2D problems, introduction to Bayesian hierarchical inference (Lecture video)
  4. The linear regression problem, sampling probability distributions, Markov Chain Monte Carlo (Lecture video)
  5. Posterior predictive tests, model selection, applications of Bayesian hierarchical inference methods (Lecture video)

Labs

  1. Exercise 1.1. N data points are drawn from a Gaussian distribution with known dispersion and unknown mean. Assuming a flat prior, what is the expression for the posterior probability distribution on the mean parameter of the Gaussian distribution from which the data are generated? Pick values for the true mean and dispersion, simulate a large number of inferences, then calculate the fraction of times the true value of the mean lies within the 68% credible region of the inference (a numerical sketch of this experiment follows the solution link below).

    Exercise 1.2. Repeat Exercise 1.1, but now assuming a Gaussian prior on the mean of the distribution. Observe how the answer to the question above is sensitive to a) the mean and dispersion of the Gaussian prior, b) the number of data points.
    Solution
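
    With a flat prior, the posterior on the mean is Gaussian, centered on the sample mean with dispersion $\sigma/\sqrt{N}$, so the 68% credible region is $\bar{d} \pm \sigma/\sqrt{N}$. Below is a minimal sketch of the Exercise 1.1 experiment; the true mean, dispersion, sample size, and number of trials are illustrative choices, not values prescribed by the exercise.

        import numpy as np

        # Illustrative choices (not prescribed by the exercise)
        mu_true, sigma, N, n_trials = 1.0, 2.0, 10, 10000

        rng = np.random.default_rng(0)
        covered = 0
        for _ in range(n_trials):
            data = rng.normal(mu_true, sigma, N)
            # Flat prior: posterior on the mean is N(data.mean(), sigma/sqrt(N))
            if abs(mu_true - data.mean()) < sigma / np.sqrt(N):
                covered += 1
        print(f"68% credible region coverage: {covered / n_trials:.3f}")

    For Exercise 1.2, the same loop applies with the posterior mean and dispersion replaced by the usual conjugate Gaussian-prior updates; the coverage fraction can then deviate from 0.68 when the prior is informative and mis-centered with respect to the true mean, especially for small N.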

  2. Exercise 2.1. An airline serves two kinds of meal on their long-haul flights: beef or chicken. They want to know the average food preference of their customers in order to optimize the number of meals to load on each flight. A survey of the choices of customers on previous flights reveals that, out of 4,380 customers, 1,807 chose beef, while 2,573 chose chicken. Let's model the event that a future customer, taken at random, chooses beef as a stochastic process with probability $p_{\rm beef}$. A code sketch covering the three parts below follows the solution link.
    1. Based on the supplied data, what is the posterior probability distribution of the parameter $p_{\rm beef}$? Plot this distribution.
    2. Suppose, for simplicity, that we know the value of $p_{\rm beef}$. What is the smallest number of beef meals the airline should load on a 219-seat flight to be 99% sure that every customer who wants beef gets it?
    3. Relax the assumption of a known value of $p_{\rm beef}$ and answer the question above while marginalizing over all possible values of $p_{\rm beef}$, as inferred in part 1.

    Solution.
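
    A sketch of one possible solution with scipy, assuming a flat prior on $p_{\rm beef}$ (so that the posterior is a Beta distribution); variable names and the number of posterior draws are our choices.

        import numpy as np
        from scipy.stats import beta, binom

        n_beef, n_chicken, seats = 1807, 2573, 219

        # Part 1: flat prior + binomial likelihood -> Beta(1808, 2574) posterior
        posterior = beta(n_beef + 1, n_chicken + 1)

        # Part 2: fix p_beef (here at the posterior mean) and find the smallest
        # m with P(X <= m) >= 0.99 for X ~ Binomial(seats, p_beef)
        m_fixed = int(binom.ppf(0.99, seats, posterior.mean()))
        print("Meals needed at fixed p_beef:", m_fixed)

        # Part 3: marginalize over p_beef by drawing posterior samples of p,
        # then simulating the number of beef requests on a replicated flight
        rng = np.random.default_rng(1)
        p_samples = posterior.rvs(size=100000, random_state=rng)
        x_samples = rng.binomial(seats, p_samples)
        print("Meals needed after marginalizing:", int(np.quantile(x_samples, 0.99)))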

  3. Exercise 3.1. Let $P(\theta) \propto \theta^{-5}\exp\left[-1/\theta^2\right]$ be an un-normalized probability distribution defined on the interval [0, 3]. Draw samples from it using the inverse cumulative distribution function method. Solution.
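
    A minimal sketch of the inverse-CDF method for this distribution, tabulating the CDF numerically on a grid (the grid resolution, the small lower cutoff, and the number of samples are our choices; the density is vanishingly small below the cutoff).

        import numpy as np

        def p_unnorm(theta):
            # Un-normalized density P(theta) ~ theta^-5 * exp(-1/theta^2)
            return theta**-5 * np.exp(-1.0 / theta**2)

        # Tabulate the CDF on a grid over (0, 3]; the density is negligible
        # near theta = 0, so a small lower cutoff is safe
        grid = np.linspace(0.05, 3.0, 10000)
        cdf = np.cumsum(p_unnorm(grid))
        cdf /= cdf[-1]  # normalize so the CDF runs up to 1

        # Inverse-CDF sampling: push uniform draws through the inverted CDF
        rng = np.random.default_rng(2)
        samples = np.interp(rng.uniform(size=10000), cdf, grid)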

    Exercise 3.2. Take the catalog of stellar mass measurements of the SLACS sample of lenses (here). Assuming a Gaussian model for the stellar mass distribution of the sample:
    1. Infer the posterior probability distribution of the mean and dispersion of the stellar mass distribution, assuming, for simplicity, that stellar masses are measured with no uncertainty. Make a contour plot of the 68% and 95% enclosed probability regions for the mean and dispersion parameters.
    2. Repeat the step above, relaxing the assumption of perfect stellar mass measurements (use a hierarchical inference approach and marginalize over the parameters describing the stellar masses of individual galaxies).
    Solution.
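
    For part 1, a sketch of a grid-based evaluation of the posterior on the mean $\mu$ and dispersion $\sigma$, assuming flat priors; the data array is a stand-in to be replaced with the SLACS log-stellar masses, and the grid ranges are our choices.

        import numpy as np
        import matplotlib.pyplot as plt

        # Stand-in values: replace with the SLACS log-stellar masses
        logmstar = np.array([11.2, 11.4, 11.1, 11.5, 11.3, 11.6, 11.2, 11.4])

        mu, sig = np.meshgrid(np.linspace(10.8, 11.9, 200),
                              np.linspace(0.05, 0.60, 200))

        # Log-posterior with flat priors: sum of Gaussian log-likelihoods
        logpost = np.sum(-0.5 * (logmstar[:, None, None] - mu)**2 / sig**2
                         - np.log(sig), axis=0)
        post = np.exp(logpost - logpost.max())
        post /= post.sum()

        # Density levels enclosing 68% and 95% of the total probability
        flat = np.sort(post.ravel())[::-1]
        cum = np.cumsum(flat)
        levels = [flat[np.searchsorted(cum, q)] for q in (0.95, 0.68)]

        plt.contour(mu, sig, post, levels=levels)
        plt.xlabel(r'$\mu$')
        plt.ylabel(r'$\sigma$')
        plt.show()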

  4. Exercise 4.1. Let $P(\theta) \propto \theta^{-5}\exp\left[-1/\theta^2\right]$ be an un-normalized probability distribution. Draw samples from it using the Metropolis algorithm. Solution.
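
    A minimal Metropolis sketch for this distribution; the proposal width, chain length, starting point, and burn-in are arbitrary choices.

        import numpy as np

        def log_p(theta):
            # log of the un-normalized density P(theta) ~ theta^-5 * exp(-1/theta^2)
            return -np.inf if theta <= 0 else -5.0 * np.log(theta) - 1.0 / theta**2

        rng = np.random.default_rng(3)
        theta, chain = 1.0, []
        for _ in range(100000):
            proposal = theta + 0.5 * rng.normal()  # symmetric Gaussian proposal
            # Accept with probability min(1, P(proposal) / P(theta))
            if np.log(rng.uniform()) < log_p(proposal) - log_p(theta):
                theta = proposal
            chain.append(theta)
        chain = np.array(chain[10000:])  # discard burn-in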

    Exercise 4.2. Take the catalog of stellar mass and size measurements of the SLACS sample of lenses (here). Assume a Gaussian model for the stellar mass distribution of the sample, a power-law relation between the average size of a galaxy and its stellar mass, and a Gaussian distribution in $\log R_e$ at fixed stellar mass. Assume that sizes are measured with no uncertainty.
    1. Draw samples from the posterior probability distribution of the hyper-parameters, marginalized over the stellar masses of the individual galaxies. Obtain, for each hyper-parameter, the median and 68% enclosed probability bounds of its marginal posterior distribution.
    2. Make a scatter plot with the observed values of $\log R_e$ as a function of $\log M_*$. Overplot the model mass-size relation, with the values of the parameters fixed to the maximum posterior probability point.
    3. (Optional): make 2D contour plots for each pair of hyper-parameters.
    Solution.
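
    A possible sketch for parts 1 and 2, using emcee as the sampler (our choice; any MCMC implementation works) and marginalizing over each galaxy's true stellar mass on a numerical grid. The data arrays are stand-ins for the SLACS catalog, and the pivot mass, grid, and chain settings are our choices.

        import numpy as np
        import emcee

        # Stand-in values: replace with the SLACS catalog measurements
        m_obs = np.array([11.2, 11.4, 11.1, 11.5, 11.3])  # observed log M*
        m_err = np.array([0.08, 0.10, 0.07, 0.09, 0.08])  # log M* uncertainties
        logre = np.array([0.9, 1.1, 0.8, 1.2, 1.0])       # log Re (exact)

        m_grid = np.linspace(10.5, 12.5, 400)[None, :]  # marginalization grid
        pivot = 11.5  # pivot mass of the power-law mass-size relation

        def log_like(params):
            mu, sig_m, a, b, sig_r = params
            if sig_m <= 0 or sig_r <= 0:
                return -np.inf
            # Per galaxy, integrate over the true log-mass m the product
            # N(m_obs|m, m_err) N(m|mu, sig_m) N(logre|a*(m - pivot) + b, sig_r)
            integrand = (
                np.exp(-0.5 * (m_obs[:, None] - m_grid)**2 / m_err[:, None]**2)
                / m_err[:, None]
                * np.exp(-0.5 * (m_grid - mu)**2 / sig_m**2) / sig_m
                * np.exp(-0.5 * (logre[:, None] - a * (m_grid - pivot) - b)**2
                         / sig_r**2) / sig_r
            )
            marginal = integrand.sum(axis=1)  # Riemann sum; constants drop out
            return np.sum(np.log(marginal))

        nwalkers, ndim = 32, 5
        p0 = np.array([11.3, 0.2, 0.8, 1.0, 0.1]) + 0.01 * np.random.randn(nwalkers, ndim)
        sampler = emcee.EnsembleSampler(nwalkers, ndim, log_like)  # flat priors
        sampler.run_mcmc(p0, 5000)
        chain = sampler.get_chain(discard=1000, flat=True)

        # Median and 68% enclosed probability bounds per hyper-parameter
        for name, col in zip(['mu', 'sig_m', 'a', 'b', 'sig_r'], chain.T):
            lo, med, hi = np.percentile(col, [16, 50, 84])
            print(f"{name}: {med:.3f} +{hi - med:.3f} -{med - lo:.3f}")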

  5. Exercise 5.1. Consider the model used to describe the airline meal problem of Exercise 2.1. We now have access to the full details of the survey carried out by the airline, with the number of customers who chose a 'beef' meal on each 219-seat flight, $n_{b1}$, here. (For simplicity, the data were created such that the number of customers on each flight is constant.) We will use these new data to assess the quality of the model inferred in Exercise 2.1. Use the posterior probability distribution from Exercise 2.1 to carry out a posterior predictive test, using the standard deviation of $n_{b1}$ as the test quantity:
    $T(n_{b1}) = \sqrt{\frac{1}{n_{\rm flight}}\sum_{i=1}^{n_{\rm flight}}\left(n_{b1,i} - \overline{n_{b1}}\right)^2}$
    Under the binomial model with $p_{\rm beef}$ as inferred in part 1 of Exercise 2.1, what is the probability of obtaining a more extreme value of $T(n_{b1})$? Discuss the accuracy of the model based on the answer to the previous question. How can the model be improved? A code sketch of the test is given below. Solution.
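
    A sketch of the posterior predictive test, reusing the Beta posterior from Exercise 2.1; the filename is hypothetical and the number of posterior draws is our choice.

        import numpy as np
        from scipy.stats import beta

        # Observed beef counts per 219-seat flight (hypothetical filename)
        n_b1 = np.loadtxt('flight_survey.txt')
        seats, n_flight = 219, len(n_b1)
        t_obs = n_b1.std()  # observed test quantity T(n_b1)

        # Draw p_beef from the Exercise 2.1 posterior, simulate a replicated
        # survey of n_flight flights for each draw, and compute T on each
        rng = np.random.default_rng(4)
        p_samples = beta(1807 + 1, 2573 + 1).rvs(size=10000, random_state=rng)
        t_rep = np.array([rng.binomial(seats, p, size=n_flight).std()
                          for p in p_samples])

        # Posterior predictive p-value: fraction of replicas at least as extreme
        print("P(T_rep >= T_obs) =", np.mean(t_rep >= t_obs))

    A p-value very close to 0 or 1 signals that a single shared $p_{\rm beef}$ cannot reproduce the flight-to-flight scatter, suggesting an improved model in which $p_{\rm beef}$ varies from flight to flight.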