{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lab 4, Exercise 2: the stellar mass-size relation\n",
    "\n",
    "We have a sample of galaxies with measurements of their stellar mass $M_*$ and half-light radius $R_e$. Model the distribution of (log) stellar masses as a Gaussian and the distribution in $\\log{R_e}$ as a Gaussian with mean that scales with $M_*$:\n",
    "\n",
    "$P(\\log{M_*}|\\mu_*,\\sigma_*) \\sim \\mathcal{N}(\\mu_*, \\sigma_*^2)$\n",
    "\n",
    "$P(\\log{R_e}|\\log{M_*},\\mu_R, \\beta_R, \\sigma_R) \\sim \\mathcal{N}(\\mu_R + \\beta_R(\\log{M_*} - \\log{M_*^p}), \\sigma_R^2)$\n",
    "\n",
    "Assume Gaussian errors $\\epsilon_{*,i}$ on the log-stellar mass measurements and no errors on the sizes ($R_e$ are measured exactly). Infer the posterior probability distribution of the hyper-parameters $\\mu_*,\\sigma_*,\\mu_R,\\beta_R,\\sigma_R$ given the data. Draw samples from the posterior and find, for each hyper-parameter, the posterior median and 68% enclosed probability bounds.\n",
    "\n",
    "Make a scatter plot of the observed values of $\\log{R_e}$ as a function of $\\log{M_*}$. Overplot the function $\\log{R_e} = \\mu_R + \\beta_R (\\log{M_*} - \\log{M_*^p})$, with the values of $\\mu_R$ and $\\beta_R$ inferred from the MCMC (you can use the values that maximize the posterior probability, or the median $\\mu_R$ and $\\beta_R$ of the respective marginal posteriors).\n",
    "\n",
    "*(Optional):* make 2D contour plots for each pair of hyper-parameters."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Solution\n",
    "\n",
    "Let's simplify notations:\n",
    "\n",
    "$m_i \\equiv \\log{M_{*,i}}$ := true log-stellar mass of galaxy i\n",
    "\n",
    "$r_i \\equiv \\log{R_e}$ := true log-half-light radius of galaxy i (also \n",
    "\n",
    "$m_i^{(obs)} \\equiv \\log{M_{*,i}^{(obs)}}$ := observed log-stellar mass of galaxy i\n",
    "\n",
    "$m^p \\equiv \\log{M_*^p}$ := pivot stellar mass of the mass-size relation (to be assigned)\n",
    "\n",
    "As shown in class, the posterior probability distribution for this problem is\n",
    "\n",
    "$P(\\mu_*,\\sigma_*,\\mu_R,\\beta_R,\\sigma_R|d) \\propto \\dfrac{1}{|\\beta_R|^n} \\prod_i \\sqrt{\\dfrac{\\sigma_{eff*,i}^2}{\\sigma_*^2\\epsilon_{*,i}^2\\sigma_{R,i}^2/\\beta_R^2}} \\exp{\\left\\{-\\dfrac12\\left(\\dfrac{\\mu_*^2}{\\sigma_*^2} + \\dfrac{m_{(obs),i}^2}{\\epsilon_{*,i}^2} + \\dfrac{m_{R,i}^2}{\\sigma_{R,i}^2/\\beta_R^2} - \\dfrac{\\mu_{eff*,i}^2}{\\sigma_{eff*,i}^2}\\right)\\right\\}}$\n",
    "\n",
    "where\n",
    "\n",
    "$\\sigma_{eff*,i} \\equiv \\left(\\dfrac{1}{\\sigma_*^2} + \\dfrac{1}{\\epsilon_{*,i}^2} + \\dfrac{\\beta_R^2}{\\sigma_R^2}\\right)^{-1}$\n",
    "\n",
    "$\\mu_{eff*,i} \\equiv \\left(\\dfrac{\\mu_*}{\\sigma_*} + \\dfrac{m_{(obs),i}}{\\epsilon_{*,i}^2} + \\dfrac{m_{R,i}}{\\sigma_R^2/\\beta_R^2}\\right)\\sigma_{eff*,i}^2$\n",
    "\n",
    "$m_{R,i} \\equiv \\dfrac{r_i - \\mu_R}{\\beta_R} + m^p$\n",
    "\n",
    "Let us draw samples from this 5-dimensional distribution using the Metropolis algorithm.\n",
    "Let us use a 5-dimensional Gaussian jump distribution\n",
    "\n",
    "$$\n",
    "J_t(\\theta*,\\theta) = \\mathcal{N}\\left((\\theta), \\left(\\begin{array}{ccccc} \n",
    "s_1^2 & 0 & 0 & 0 & 0 \\\\ \n",
    "0 & s_2^2 & 0 & 0 & 0\\\\\n",
    "0 & 0 & s_3^2 & 0 & 0\\\\\n",
    "0 & 0 & 0 & s_4^2 & 0\\\\\n",
    "0 & 0 & 0 & 0 & s_5^2\n",
    "\\end{array}\\right)\\right)\n",
    "$$\n",
    "\n",
    "where $s_1,\\ldots,s_5$ are the step sizes in the direction of each parameter.\n",
    "\n",
    "Let's set $s_1 = s_2 = s_3 = s_4 = s_5 = 0.03$ for simplicity.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "ename": "SyntaxError",
     "evalue": "Missing parentheses in call to 'print'. Did you mean print('sampling...')? (<ipython-input-1-f64b3a9784b8>, line 37)",
     "output_type": "error",
     "traceback": [
      "\u001b[0;36m  File \u001b[0;32m\"<ipython-input-1-f64b3a9784b8>\"\u001b[0;36m, line \u001b[0;32m37\u001b[0m\n\u001b[0;31m    print 'sampling...'\u001b[0m\n\u001b[0m                      ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m Missing parentheses in call to 'print'. Did you mean print('sampling...')?\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "import pylab\n",
    "from scipy.stats import multivariate_normal\n",
    "\n",
    "\n",
    "# load the data\n",
    "f = open('SLACS_table.cat', 'r')\n",
    "mstar, mstar_err, size = np.loadtxt(f, usecols=(3, 4, 5), unpack=True)\n",
    "f.close()\n",
    "\n",
    "n = len(mstar) # number of objects\n",
    "\n",
    "mobs = mstar\n",
    "robs = np.log10(size)\n",
    "eps_star = mstar_err\n",
    "\n",
    "mpiv = np.median(mstar) # set the value of the pivot stellar mass to the median of the sample\n",
    "\n",
    "# define log-posterior probability\n",
    "def logp_func(theta):\n",
    "    \n",
    "    mu_star, sigma_star, mu_R, beta_R, sigma_R = theta\n",
    "    \n",
    "    sumlogp = -n*np.log(abs(beta_R))\n",
    "    for i in range(n): # loops over the galaxies in the sample\n",
    "        mR = (robs[i] - mu_R)/beta_R + mpiv\n",
    "        sigma_star_eff = (1./sigma_star**2 + 1./eps_star[i]**2 + beta_R**2/sigma_R**2)**(-0.5)\n",
    "        mu_star_eff = (mu_star/sigma_star**2 + mobs[i]/eps_star[i]**2 + mR*beta_R**2/sigma_R**2)*sigma_star_eff**2\n",
    "        \n",
    "        sumlogp += np.log(sigma_star_eff/(sigma_star*eps_star[i]*sigma_R/abs(beta_R)))\n",
    "        sumlogp += -0.5*(mu_star**2/sigma_star**2 + mobs[i]**2/eps_star[i]**2 + mR**2*beta_R**2/sigma_R**2 - mu_star_eff**2/sigma_star_eff**2)\n",
    "\n",
    "    return sumlogp\n",
    "\n",
    "# defines sampler: takes starting point, jump covariance matrix and number of steps, returns array of samples.\n",
    "def run_sampler(theta_0, jump_cov, nstep):\n",
    "    print 'sampling...'\n",
    "    theta_samp = np.zeros((len(theta_0), nstep))\n",
    "    logp_samp = np.zeros(nstep)\n",
    "    \n",
    "    theta_samp[:, 0] = theta_0\n",
    "    logp_samp[0] = logp_func(theta_0)\n",
    "    \n",
    "    naccept = 0\n",
    "    \n",
    "    for i in range(1, nstep):\n",
    "        # proposes step from the jump distribution\n",
    "        theta_old = theta_samp[:, i-1]\n",
    "        theta_star = multivariate_normal.rvs(mean=theta_old, cov=jump_cov)\n",
    "        \n",
    "        logp_new = logp_func(theta_star)\n",
    "        \n",
    "        accept = False\n",
    "        if logp_new > logp_samp[i-1]:\n",
    "            accept = True\n",
    "        else:\n",
    "            r = np.exp(logp_new - logp_samp[i-1])\n",
    "            x = np.random.rand(1)\n",
    "            if x < r:\n",
    "                accept = True\n",
    "            else:\n",
    "                accept = False\n",
    "            \n",
    "        if accept:\n",
    "            naccept += 1\n",
    "            theta_new = theta_star\n",
    "            logp_samp[i] = logp_new\n",
    "        else:\n",
    "            theta_new = theta_old\n",
    "            logp_samp[i] = logp_samp[i-1]\n",
    "    \n",
    "        theta_samp[:, i] = theta_new\n",
    "    \n",
    "    print 'acceptance rate: %3.2f'%(naccept/float(nstep))\n",
    "    return (theta_samp, logp_samp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set starting point\n",
    "theta_0 = np.zeros(5)\n",
    "theta_0[0] = np.median(mstar)\n",
    "theta_0[1] = mstar.std()\n",
    "theta_0[2] = np.median(robs)\n",
    "theta_0[3] = 0.5\n",
    "theta_0[4] = 0.2\n",
    "\n",
    "# set covariance matrix of jump distribution\n",
    "jump_cov = 0.03**2 * np.diag(np.ones(5)) \n",
    "\n",
    "# run the MCMC\n",
    "theta_samp, logp_samp = run_sampler(theta_0, jump_cov, nsamp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# visually inspect the chain\n",
    "\n",
    "pylab.plot(theta_samp[0, :])\n",
    "pylab.ylabel('$\\mu_*$')\n",
    "pylab.show()\n",
    "\n",
    "pylab.plot(theta_samp[1, :])\n",
    "pylab.ylabel('$\\sigma_*$')\n",
    "pylab.show()\n",
    "\n",
    "pylab.plot(theta_samp[2, :])\n",
    "pylab.ylabel('$\\mu_R$')\n",
    "pylab.show()\n",
    "\n",
    "pylab.plot(theta_samp[3, :])\n",
    "pylab.ylabel('$\\\\beta_R$')\n",
    "pylab.show()\n",
    "\n",
    "pylab.plot(theta_samp[4, :])\n",
    "pylab.ylabel('$\\sigma_R$')\n",
    "pylab.show()\n",
    "\n",
    "pylab.plot(logp_samp)\n",
    "pylab.ylabel('$\\log{(P)}$')\n",
    "pylab.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's run another chain, but update the covariance matrix of the jump distribution to set it equal to that of the sample. Let's also increase the number of steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_cov = np.cov(theta_samp)\n",
    "\n",
    "# let's initialize the new chain to the last value of the previous one\n",
    "theta_0 = theta_samp[:, -1]\n",
    "\n",
    "print 'starting point:\\n', theta_0\n",
    "\n",
    "nsamp = 50000\n",
    "\n",
    "# runs the MCMC\n",
    "theta_samp, logp_samp = run_sampler(theta_0, new_cov, nsamp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# visually inspect the chain\n",
    "\n",
    "pylab.plot(theta_samp[0, :])\n",
    "pylab.ylabel('$\\mu_*$')\n",
    "pylab.show()\n",
    "\n",
    "pylab.plot(theta_samp[1, :])\n",
    "pylab.ylabel('$\\sigma_*$')\n",
    "pylab.show()\n",
    "\n",
    "pylab.plot(theta_samp[2, :])\n",
    "pylab.ylabel('$\\mu_R$')\n",
    "pylab.show()\n",
    "\n",
    "pylab.plot(theta_samp[3, :])\n",
    "pylab.ylabel('$\\\\beta_R$')\n",
    "pylab.show()\n",
    "\n",
    "pylab.plot(theta_samp[4, :])\n",
    "pylab.ylabel('$\\sigma_R$')\n",
    "pylab.show()\n",
    "\n",
    "pylab.plot(logp_samp)\n",
    "pylab.ylabel('$\\log{(P)}$')\n",
    "pylab.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This chain looks healthy. Now let's plot the observed mass-size relation with our best-fit measurement. For simplicity, we're going to fix $\\mu_R$ and $\\beta_R$ to the maximum of the posterior distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# take the maximum of the posterior\n",
    "\n",
    "\n",
    "theta_mp = theta_samp[:, logp_samp.argmax()]\n",
    "\n",
    "mu_R_mp = theta_mp[2]\n",
    "beta_R_mp = theta_mp[3]\n",
    "\n",
    "pylab.scatter(mobs, robs)\n",
    "xlim = pylab.xlim()\n",
    "xs = np.linspace(xlim[0], xlim[1])\n",
    "pylab.plot(xs, mu_R_mp + beta_R_mp*(xs - mpiv), linestyle='--', color='k')\n",
    "pylab.xlabel('$\\log{M_*}$')\n",
    "pylab.ylabel('$\\log{R_e}$')\n",
    "pylab.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's plot the probability contours for each pair of hyper-parameters of the model.\n",
    "First let's write a function that creates a single 2D contour plot, then let's write a script that displays the plots in an organized way."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy import ndimage\n",
    "\n",
    "\n",
    "def probcontour(xarr, yarr, color='b', smooth=2, bins=100):\n",
    "    # create a 2d histogram\n",
    "    H, xbins, ybins = pylab.histogram2d(xarr, yarr, bins=bins)\n",
    "    \n",
    "    # smooths the histogram\n",
    "    H = ndimage.gaussian_filter(H, smooth)\n",
    "    \n",
    "    # create an array containing the number of points in each histogram bin, sorted.\n",
    "    sortH = np.sort(H.flatten())\n",
    "    \n",
    "    # compute the cumulative sum\n",
    "    cumH = sortH.cumsum()\n",
    "    \n",
    "    # find the level for which the integral (i.e. sum) over the bins with a higher number of samples is 68% of the total\n",
    "    lvl68 = sortH[cumH>0.32*cumH[-1]][0]\n",
    "    \n",
    "    # same for 95%-ile level\n",
    "    lvl95 = sortH[cumH>0.05*cumH[-1]][0]\n",
    "    \n",
    "    pylab.contourf(H.T, [lvl95, lvl68], colors=color, alpha=0.5, extent=(xbins[0], xbins[-1], ybins[0], ybins[-1]))\n",
    "    pylab.contourf(H.T, [lvl68, 1e8], colors=color, alpha=0.9, extent=(xbins[0], xbins[-1], ybins[0], ybins[-1]))\n",
    "\n",
    "fig = pylab.figure()\n",
    "pylab.subplots_adjust(left=0.1, right=0.99, bottom=0.1, top=0.99)\n",
    "\n",
    "npars = 5\n",
    "labels = ['$\\mu_*$', '$\\sigma_*$', '$\\mu_R$', '$\\\\beta_R$', '$\\sigma_R$']\n",
    "\n",
    "for i in range(npars):\n",
    "    ax = fig.add_subplot(npars, npars, (npars+1)*i + 1)\n",
    "    pylab.hist(theta_samp[i, :], histtype='step', color='b')\n",
    "    \n",
    "for j in range(1, npars):\n",
    "    for i in range(j):\n",
    "        ax = fig.add_subplot(npars, npars, npars*j+i+1)\n",
    "        probcontour(theta_samp[i, :], theta_samp[j, :])\n",
    "        \n",
    "        if i == 0:\n",
    "            ax.set_ylabel(labels[j])\n",
    "        if j == npars-1:\n",
    "            ax.set_xlabel(labels[i])\n",
    "            \n",
    "pylab.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}