# Lab 5, Exercise 1: Airline Meal Part 2

We now have access to the full details of the survey carried out by the airline: for each flight in the survey, we know $n_{b1}$, the number of customers on that 219-seat flight who chose a 'beef' type meal (for simplicity, the data were created so that the number of customers on each flight is constant). We will use this new data to assess the quality of the model inferred in Part 1.

Use the posterior probability distribution from Part 1 to carry out a posterior predictive test, using the standard deviation in $n_{b1}$ as the test quantity:

$T(n_{b1}) \equiv \sqrt{\dfrac{1}{n_{flight}}\sum_{i=1}^{n_{flight}} (n_{b1,i} - \bar{n_{b1}})^2}$

where $n_{b1,i}$ is the number of customers on the $i$-th flight of the survey who chose beef, and $\bar{n_{b1}}$ is the average $n_{b1}$ in the survey.

Under the binomial distribution model inferred in Part 1, what is the probability of obtaining a value of $T(n_{b1})$ more extreme than the observed one?

Discuss the accuracy of the model based on the answer to the question above.

## Solution

Let's first calculate the test quantity for the observed data:

In [None]:
import numpy as np


# load the survey data (np.loadtxt opens and closes the file itself)
nb1_obs, nc1_obs = np.loadtxt('meal_survey.txt', dtype=int, unpack=True)

nflight = len(nb1_obs)

nb = nb1_obs.sum()
nc = nc1_obs.sum()

# test quantity: standard deviation of nb1_obs
# (the default ddof=0 gives the 1/nflight normalization used in T)
Tobs = nb1_obs.std()
print('Tobs = %2.1f' % Tobs)
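As a quick sanity check that `std()` matches the definition of $T$ given in the exercise, here is a minimal sketch on made-up counts. The array `nb1_demo` is hypothetical and is not taken from `meal_survey.txt`.

```python
import numpy as np

# hypothetical per-flight beef counts, for illustration only
nb1_demo = np.array([100, 110, 95, 120, 105])

# explicit form of T: root mean squared deviation with a 1/n normalization
T_explicit = np.sqrt(np.mean((nb1_demo - nb1_demo.mean())**2))

# numpy's default (ddof=0) uses the same 1/n normalization
T_numpy = nb1_demo.std()

# the sample estimator (ddof=1) divides by n - 1 instead, so it is larger
T_sample = nb1_demo.std(ddof=1)

print('T (explicit formula) = %.3f' % T_explicit)
print('T (std, ddof=0)      = %.3f' % T_numpy)
print('std with ddof=1      = %.3f' % T_sample)
```

Either normalization would work as a test quantity, as long as the same one is applied to the observed and the simulated data.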

Let $p_{beef}$ be the parameter of our binomial model, i.e. the probability that a random customer chooses beef. The posterior probability distribution for $p_{beef}$ found in Part 1 is proportional to a Beta distribution:

$P(p_{beef}|n_b,n_c) \propto p_{beef}^{n_b}(1-p_{beef})^{n_c} \propto \rm{Beta}(n_b+1,n_c+1)$
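To get a feel for how narrow this posterior is, the sketch below draws samples from a Beta distribution and compares them with the analytic mean and standard deviation. The totals `nb_demo` and `nc_demo` are hypothetical placeholders, not the survey values.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical totals, NOT the values from the survey
nb_demo, nc_demo = 2400, 1980

# Beta(a, b) parameters of the posterior
a, b = nb_demo + 1, nc_demo + 1

# draw samples of p_beef from the posterior
samples = rng.beta(a, b, size=100000)

# analytic mean and standard deviation of a Beta(a, b) distribution
post_mean = a / (a + b)
post_std = np.sqrt(a * b / ((a + b)**2 * (a + b + 1)))

print('sampled  mean, std: %.4f, %.4f' % (samples.mean(), samples.std()))
print('analytic mean, std: %.4f, %.4f' % (post_mean, post_std))
```

With thousands of recorded choices the posterior is sharply peaked, so the uncertainty on $p_{beef}$ contributes little to the spread of the simulated data.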

Let's simulate new survey data from our model as follows:
- Draw a value of $p_{beef}$ from the posterior
- For $n_{flight}$ flights, generate 219 customer choices, each choosing beef with probability $p_{beef}$ (equivalently, draw $n_{b1}$ from a binomial distribution with 219 trials). Record the value of $n_{b1}$ for each simulated flight, then calculate $T(n_{b1})$.
- Repeat for a large number of iterations
- Count the fraction of times for which the simulated $T$ is more extreme than the observed one.

In [None]:
import pylab


nsim = 1000

nseat = 219

Tsim = np.zeros(nsim)
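for i in range(nsim):
    # draw a value of p_beef from the posterior
    p_beef_here = np.random.beta(nb+1, nc+1)

    nb1_mock = np.zeros(nflight, dtype=int)
    # loop over the flights in the survey
    for j in range(nflight):
        # each seat chooses beef with probability p_beef_here
        x = np.random.rand(nseat)
        nb1_mock[j] = (x < p_beef_here).sum()

    # test quantity for this simulated survey
    Tsim[i] = nb1_mock.std()

# fraction of simulated surveys with T larger than the observed one
print((Tsim > Tobs).sum()/float(nsim))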

# plot histogram of simulated Tsim vs. observed one
pylab.hist(Tsim)
pylab.axvline(Tobs, linestyle='--', color='k')
pylab.show()
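The same simulation can be written without explicit loops by drawing binomial counts directly. The following is only a sketch: the names with a `_demo` suffix are stand-ins chosen so that it runs without `meal_survey.txt`. The totals `nb_demo` and `nc_demo` are hypothetical, and `Tobs_demo` is set to the rounded value of about 12 quoted in the discussion.

```python
import numpy as np

rng = np.random.default_rng(42)

# stand-in values so the sketch runs on its own (assumptions, not survey data)
nflight_demo = 20
nseat_demo = 219
nb_demo, nc_demo = 2400, 1980   # hypothetical totals, summing to 20 * 219
Tobs_demo = 12.0                # rounded observed value

nsim_demo = 1000

# one posterior draw of p_beef per simulated survey
p_draws = rng.beta(nb_demo + 1, nc_demo + 1, size=nsim_demo)

# binomial counts with shape (nsim, nflight): each row is one simulated survey
nb1_sims = rng.binomial(nseat_demo, p_draws[:, None],
                        size=(nsim_demo, nflight_demo))

# test quantity for each simulated survey
Tsim_demo = nb1_sims.std(axis=1)

# fraction of simulated surveys with T larger than the observed one
p_value = (Tsim_demo > Tobs_demo).mean()
print('P(Tsim > Tobs) = %.3f' % p_value)
```

Drawing one binomial count per flight is statistically equivalent to summing 219 individual choices, and it makes a larger number of simulations cheap.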


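The standard deviation in $n_{b1}$ observed in the survey is around 12. For comparison, a binomial distribution with 219 trials has a standard deviation of $\sqrt{219\,p_{beef}(1-p_{beef})}$, which is at most about 7.4. Based on the posterior prediction from our model, it is very unlikely to obtain a value as large as the observed one. This suggests that the model is not entirely accurate.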

(The survey data were generated by assuming two different values of $p_{beef}$ depending on the flight. This could be the case if people's preferences vary with the time of day or with the place of departure.)
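To illustrate how a flight-dependent $p_{beef}$ inflates the test quantity, here is a minimal sketch comparing a single-$p$ model with a two-$p$ model. The values `p_a` and `p_b`, the even split between them, and the choice of 20 flights are assumptions made for illustration. They are not the values used to generate the survey.

```python
import numpy as np

rng = np.random.default_rng(1)

nflight_demo, nseat_demo = 20, 219
nsim_demo = 2000

# hypothetical preferences, NOT the ones used to build the survey data
p_a, p_b = 0.50, 0.60
p_single = 0.5 * (p_a + p_b)

# single-p model: every flight shares the same p_beef
T_single = rng.binomial(nseat_demo, p_single,
                        size=(nsim_demo, nflight_demo)).std(axis=1)

# two-p model: half of the flights use p_a, the other half p_b
p_per_flight = np.repeat([p_a, p_b], nflight_demo // 2)
T_mixed = rng.binomial(nseat_demo, p_per_flight,
                       size=(nsim_demo, nflight_demo)).std(axis=1)

print('mean T, single p : %.1f' % T_single.mean())
print('mean T, two p    : %.1f' % T_mixed.mean())
```

The flight-to-flight difference in the expected beef count adds to the binomial scatter, so the two-$p$ surveys show a larger standard deviation than a single-$p$ binomial model would predict.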