This notebook presents a technique for finding the maximum likelihood point of a multi-modal function while sampling it. Our function is a probability density function (pdf), and the method used is simulated annealing, a Markov chain Monte Carlo (MCMC) method.
This method is inspired by the annealing process in solid state physics. Annealing is a physical process of heating a solid until thermal stresses are released, then cooling it very slowly until the crystals are perfectly arranged, achieving a minimum energy state. The quality of the result depends on how slowly the temperature is cooled: slow cooling yields a well-ordered, low-energy crystal, while fast cooling freezes in defects.
Photo: Gavin Smith - https://bit.ly/2IzAkcE
The simulation of this process is known as simulated annealing. It is a mathematical analogy that can be used to sample highly nonlinear, multidimensional functions. The analogue of the slow temperature cooling is a decrease in the probability of accepting worse solutions as the space is explored, while the probability of moving to a better solution is kept at 1. At high temperatures the system is allowed to explore the whole space; at low temperatures the exploration is restricted. The temperature has to be decreased carefully, because the aim is to find the global minimum energy state: if it is done too fast you may end up in a secondary minimum, and if it is done too slowly you waste many forward calculations.
This analogy between the physical process of annealing and the mathematical problem of finding the global minimum of a function allows us to locate the maximum likelihood point.
First we will define an energy function: \begin{equation} S(m) = -T_0 \log\frac{\sigma_M(m)}{\rho_M(m)} \end{equation} where $m$ denotes the coordinates in the model space, $T_0$ is a fixed positive number termed the ambient temperature (e.g. $T_0 = 1$), $\sigma_M(m)$ is the target probability density and $\rho_M(m)$ is a reference density. Inverting this relation, we obtain the probability density function as a function of the temperature of the system: \begin{equation} \sigma_M(m,T) = \rho_M(m) \exp\bigg(-\frac{S(m)}{T}\bigg) = \rho_M(m) \exp\bigg(-\frac{-T_0 \log\frac{\sigma_M(m)}{\rho_M(m)}}{T}\bigg) \end{equation}
# Import Libraries (PLEASE RUN THIS CODE FIRST!)
# ----------------------------------------------
import numpy as np
import matplotlib
## Show Plot in The Notebook
matplotlib.use('nbagg')
import matplotlib.pyplot as plt
from IPython import display
from mpl_toolkits.mplot3d import Axes3D
PARAMETER CONFIGURATION
The above expression helps us understand the relationship between our pdf and the temperature. In this notebook we take $\rho_M(m) = 1$ and $T_0 = 1$, so the expression becomes \begin{equation} \sigma_M(m,T) = \exp\bigg(-\frac{-\log\sigma_M(m)}{T}\bigg) = \sigma_M(m)^{1/T} \end{equation}
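It is worth seeing this simplification numerically: raising the pdf to the power $1/T$ flattens it at high temperature and sharpens its peaks at low temperature. A minimal sketch (the toy values below are illustrative only, not taken from the notebook's pdf):
# Effect of temperature on a pdf (illustrative sketch, assuming T0 = 1 and rho = 1)
import numpy as np
sigma_m = np.array([0.2, 0.5, 1.0])  # toy pdf values, normalized to a maximum of 1
for T in [10., 1., 0.1]:
    sigma_T = sigma_m ** (1. / T)  # sigma(m, T) = sigma(m)**(1/T)
    print('T = %5.1f ->' % T, np.round(sigma_T, 3))
# at T = 10 all values approach 1 (flat); at T = 0.1 only the peak survives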
# Parameter Configuration
# ----------------------
# Some global parameters
npp = 1000000   # number of tries to take (re-initialized below, where it serves as a proposal counter)
ns = 100        # number of samples to keep
T0 = 1.
T = T0
alpha = 0.99999 # cooling schedule
rho = 1         # kept constant
rho0 = 1        # kept constant
Tplots = 10     # initial temperature for the heat-bath plots
stepT = 4       # number of times the temperature is divided by 10 in the plots
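With $\alpha = 0.99999$ the temperature decays geometrically, $T_n = \alpha^n\,T_0$. A quick back-of-the-envelope check (a sketch, not part of the original code) shows why this counts as slow cooling:
# How many steps does the cooling schedule need to reach a target temperature?
import numpy as np
for T_target in [0.5, 0.1, 0.01]:
    n_steps = np.log(T_target / 1.) / np.log(0.99999)  # solve alpha**n * T0 = T_target for n
    print('T = %.2f reached after about %d steps' % (T_target, int(np.ceil(n_steps))))
# reaching T = 0.01 takes roughly 460,000 steps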
GENERATION OF THE PDF
Monte Carlo sampling of a probability density function: for the purpose of this notebook we use the absolute value of the Matlab function "peaks", evaluated at each grid point. This section can simply be replaced with the probability from a proper inverse problem (e.g. hypocenter location). The function has the following expression
\begin{equation} \sigma_M(m) = \sigma(x,y) = \bigg| 3(1-x)^2 e^{-x^2-(y+1)^2} - 10\Big(\frac{x}{5} - x^3 - y^5\Big) e^{-x^2-y^2} - \frac{1}{3} e^{-(x+1)^2-y^2} \bigg| \end{equation}
# Generate a pdf
# the following steps generate a pdf; this is equivalent to the function "peaks(n)" in Matlab
n = 100 # number of grid points per dimension
pdf = np.zeros([n,n])
sigma = np.zeros([n,n])
s = np.zeros([n,n])
x = -3.
for i in range(0, n):
    y = -3.
    for j in range(0, n):
        pdf[j, i] = 3.*(1 - x)**2*np.exp(-(x**2) - (y + 1)**2) - 10.*(x/5 - x**3 - y**5)*np.exp(-x**2 - y**2) - 1./3*np.exp(-(x + 1)**2 - y**2)
        if pdf[j, i] < 0:
            pdf[j, i] = pdf[j, i] * (-1)  # in contrast to the peaks function: all negative values are multiplied by (-1)
        y = y + 6./(n - 1)
    x = x + 6./(n - 1)
pdf = pdf / pdf.max()
s = -T0*np.log(pdf/rho0)
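The same grid can also be built without explicit loops. A vectorized sketch using numpy broadcasting (reusing n and np from the cells above), equivalent to the loop version up to floating-point rounding:
# Vectorized construction of |peaks| (alternative sketch, not part of the original loop)
xv, yv = np.meshgrid(np.linspace(-3., 3., n), np.linspace(-3., 3., n))
pdf_vec = np.abs(3.*(1 - xv)**2*np.exp(-xv**2 - (yv + 1)**2)
                 - 10.*(xv/5 - xv**3 - yv**5)*np.exp(-xv**2 - yv**2)
                 - 1./3*np.exp(-(xv + 1)**2 - yv**2))
pdf_vec = pdf_vec / pdf_vec.max()  # pdf_vec[j, i] matches pdf[j, i] from the loop version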
# Plot the 3D plot of pdf
# --------------------------
X = np.linspace(0, 100, n)
Y = np.linspace(0, 100, n)
fig0 = plt.figure()
ax = fig0.add_subplot(111, projection='3d')
X, Y = np.meshgrid(X, Y)
surf = ax.plot_surface(Y, X, pdf,rstride=2, cstride=2, cmap=plt.cm.coolwarm, linewidth=0.1)
plt.gca().invert_xaxis()
plt.show()
THE ENERGY FUNCTION
It shows the distribution of energy $S(m)$ derived from our pdf.
# Plot the 3D plot of Energy function
# --------------------------
X = np.linspace(0, 100, n)
Y = np.linspace(0, 100, n)
fig0 = plt.figure()
ax = fig0.add_subplot(111, projection='3d')
X, Y = np.meshgrid(X, Y)
surf = ax.plot_surface(Y, X, s/s.max(),rstride=2, cstride=2, cmap=plt.cm.coolwarm, linewidth=0.1)
plt.gca().invert_xaxis()
plt.show()
THE HEAT BATH
To understand how the temperature interacts with the probability density function, the notebook plots where $\sigma_M(m,T)$ is allowed to go at a given temperature. Each panel corresponds to a decrease of the temperature by a factor of 10, starting at a high temperature of $T=10$ and finishing at $T=0.01$. At high temperature the distribution is nearly flat and the whole space can be explored; at low temperature only the highest peaks survive. The intermediate panels are the "heat bath" graphs, whose values over the space vary with each temperature.
#The heat bath
a = 1
fig0 = plt.figure(figsize=(10, 8))
for i in range(0, stepT):
    sigma = rho * np.exp(-s/Tplots)
    sigma = sigma / sigma.max()
    if Tplots < 1:
        temp = '$T = $' + str(Tplots) + '\n'
    else:
        temp = '$T = $' + str(int(Tplots)) + '\n'
    Tplots = Tplots * 0.1
    X = np.linspace(0, 100, n)
    Y = np.linspace(0, 100, n)
    ax = fig0.add_subplot(2, 2, a, projection='3d')
    X, Y = np.meshgrid(X, Y)
    ax.set_title(temp)
    ax.plot_surface(Y, X, sigma, rstride=2, cstride=2, cmap=plt.cm.coolwarm, linewidth=0, antialiased=False)
    plt.gca().invert_xaxis()
    a += 1
plt.subplots_adjust(top=0.92, bottom=0.08, left=0.10, right=0.90, hspace=0.5, wspace=0.15)
plt.show()
SIMULATED ANNEALING: THE ALGORITHM
Now that we have seen how the temperature affects the pdf, we can develop the algorithm. The terminology will be the following:
– $X$ is the random vector, or the configuration.
– $S_m$ is the system energy, the function value.
– $T$ is the system temperature.
– $\Delta S$ is the difference in system energy between two steps ($\Delta S= S_m(X_{i+1}) - S_m (X_i)$).
At each temperature iteration the program draws a candidate configuration $X_{i+1}$ and evaluates the point using the simulation model (the pdf), obtaining the associated system energy. The move is accepted or rejected with respect to the previous one according to \begin{equation} P(\text{accept}) = \begin{cases} 1 & \text{if } \Delta S \leq 0 \\ e^{-\Delta S / T} & \text{if } \Delta S > 0 \end{cases} \end{equation} while the temperature is lowered at every step as $T_{i+1} = \alpha T_i$, $\alpha$ being the cooling schedule.
At the end, the system will have reduced the temperature, according to the chosen cooling schedule, towards 0.
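Distilled to a single helper, the acceptance test implemented in the loop below looks as follows; this is an illustrative sketch (the function name is ours, and as in the code the energy difference $\Delta S$ is replaced by the probability difference $P_{cur} - P_{new}$):
# Metropolis-style acceptance test (illustrative helper, not part of the original code)
import numpy as np
def accept_move(Pcur, Pnew, T):
    if Pnew >= Pcur:  # uphill moves are always accepted
        return True
    # downhill moves are accepted with probability exp(-(Pcur - Pnew)/T)
    return np.random.uniform(0, 1) <= np.exp(-(Pcur - Pnew) / T)
At high $T$ the exponent is close to zero and almost any move is accepted; as $T \to 0$ the probability of accepting a worse move vanishes.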
# Simulated annealing and ploting the steps
# find an initial vector x
xcur = np.array([np.floor(np.random.uniform(0, 100)), np.floor(np.random.uniform(0, 100))])
# Let's get moving
iis = 0
npp = 0
xa = np.array([])
xa = np.hstack((xa, np.array([xcur[0],xcur[1]])))
## Plotting the steps (initializing the graph)
fig = plt.figure()
f1 = fig.add_subplot(111)
f1.imshow(pdf.transpose(),aspect='auto',interpolation='none', animated=True)
f1.set_xlim(0,99)
f1.set_ylim(99,0)
f1.set_title('Simulated annealing steps')
plt.ion() #set interactive mode
plt.show()
xnew = np.array([0., 0.])
Pa = [0]  # list of visited probabilities, converted to an array after the loop
while iis < ns:
    npp = npp + 1
    # make a random choice for the next move
    xnew = np.array([np.floor(np.random.uniform(0, 100)), np.floor(np.random.uniform(0, 100))])
    # compare probabilities
    Pcur = pdf[int(xcur[0]), int(xcur[1])]
    Pnew = pdf[int(xnew[0]), int(xnew[1])]
    T = alpha * T
    if Pnew >= Pcur:
        # uphill moves are always accepted
        xcur = xnew
        # display move graphically
        f1.plot(xnew[0], xnew[1], 'w+')  # plot the steps with white +
        plt.gcf().canvas.draw()  # show the update
        iis = iis + 1
    if Pnew < Pcur:
        # downhill moves are accepted with probability exp(-deltaP/T)
        deltaP = Pcur - Pnew
        P = np.exp(-1. * deltaP / T)
        test = np.random.uniform(0, 1)
        if test <= P:
            xcur = xnew
            # display move graphically
            f1.plot(xnew[0], xnew[1], 'w+')  # plot the steps with white +
            plt.gcf().canvas.draw()  # show the update
            iis = iis + 1
    xa = np.vstack((xa, np.array([xcur[0], xcur[1]])))
    Pa.append(Pnew)
plt.gcf().canvas.draw()
npp = npp + 1  # account for the initial element of Pa
Pa = np.array(Pa)
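An optional diagnostic, not in the original notebook, is to report the acceptance ratio and the final temperature once the loop has finished:
# Optional diagnostic: proposals needed for ns accepted moves (npp - 1 excludes the initial Pa element)
print('accepted %d of %d proposals (%.1f%% acceptance), final T = %.5f'
      % (ns, npp - 1, 100. * ns / (npp - 1), T))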
FINAL SAMPLING
To show how the program has proceeded, all the places visited in the sampling space are plotted, together with the maximum probability point found. In addition, the best model and the current model are plotted as a function of the number of iterations.
# Final sampling
#get max of Pa and the index for the best model
y = Pa.max()
for i in range(0, npp):
    if Pa[i] == y:
        maxi = i
# Plotting the results
fig2 = plt.figure(figsize=(10, 5))
# First graph
f1 = fig2.add_subplot(121)
f1.imshow(pdf.transpose(), aspect='auto', extent=[0, 100, 100, 0], interpolation='none')
f1.plot(xa[:, 0], xa[:, 1], 'w+')  # plot the steps with white +
f1.plot(xa[maxi, 0], xa[maxi, 1], 'wo')  # maximum value achieved, plotted with a white circle
f1.set_ylabel('y')
f1.set_xlabel('x')
#Second graph
f2 = fig2.add_subplot(122)
# Show as a function of the iterations
# 1. the best model
# 2. the current model
best = np.zeros(npp)
for i in range(0, npp):
    a = Pa[0]
    for j in range(0, i + 1):
        if Pa[j] > a:
            a = Pa[j]
    best[i] = a
f2.plot(range(0,npp),best,'r-', label='Best model')
f2.plot(range(0,npp),Pa,'b-', label='Current model')
f2.set_xlabel('# of points visited')
f2.yaxis.set_label_position("right")
f2.legend(loc='upper center', bbox_to_anchor=(0.6, 1.1), ncol=2, handletextpad=0.1)
f2.set_ylabel('Probability')
plt.show()