As a machine learning engineer with an educational background in physics, I've always loved the intersection of physics and machine learning. Many of my colleagues are physicists, and many of the models and methods we use come from physics.
One neat method I've always liked is particle swarm optimization (PSO). It is conceptually related to MCMC methods such as ensemble MCMC and Hamiltonian Monte Carlo. In this post I will walk through how PSO works and code an example of PSO from scratch in Python. Here we explore the classic PSO algorithm, but there is a Bayesian version used in the field of Bayesian optimization that I encourage you to look up if you enjoy this post.
PSO was proposed in 1995 by Eberhart and Kennedy. It was originally intended for simulating social behavior, using patterns inspired by the way birds flock and schools of fish swim.
PSO is a metaheuristic technique that makes few to no assumptions about the function being optimized. It is very useful when the search space is high-dimensional, non-convex, and/or non-continuous. One great aspect of PSO is that, unlike gradient descent and quasi-Newton methods, it does not require the function being optimized to be differentiable.
The high-level idea behind PSO is that we release many different "particles" into the search space. These particles move in such a way that they tend to congregate around minima in the optimization topography. Heuristically, you can think of dumping marbles into a funnel: they all end up falling into the center no matter where they started from. We start by giving our particles the properties of position (a location on the optimization plane) and velocity:
$$ x^i_{k+1} = x^i_k + v^i_{k+1} $$
$$ v^i_{k+1} = w_k v^i_k + c_1 r_1 (p^i_k - x^i_k) + c_2 r_2 (p^g_k - x^i_k) $$
Here $x$ represents the position on the optimization plane, $p$ represents the best historical position, $v$ represents velocity, the superscript $i$ denotes an individual particle, the subscript $k$ denotes the time step, the superscript $g$ denotes a group term across particles, $w$ is an inertial weight constant, $c_1$ is a cognitive parameter, $c_2$ is a social parameter, and $r_1$ and $r_2$ are random numbers between 0 and 1.
We label the term $c_2 r_2 (p^g_k - x^i_k)$ the "social term" and $c_1 r_1 (p^i_k - x^i_k)$ the "cognitive term".
So what do these terms do? Since each particle is "learning" about a different section of the optimization plane, we want the particles to share information so they can alert one another when one of them finds a position that might be the global optimum. This is what the social term does. On the other hand, each particle should have some ability to pursue promising leads on its own; this is the cognitive term. These terms may reinforce or compete with each other.
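To make the update equations concrete, here is a minimal NumPy sketch of a single step for one particle. The positions and velocities are arbitrary illustrative values of my own; the constants $w$, $c_1$, and $c_2$ match the ones used in the implementation further down.
import numpy as np
# illustrative values for a 2D search space
w, c1, c2 = 0.5, 1.0, 2.0            # inertia, cognitive, and social weights
x_i = np.array([2.0, -1.0])          # current position of particle i
v_i = np.array([0.3, 0.1])           # current velocity of particle i
p_i = np.array([1.5, -0.5])          # best position particle i has found so far
p_g = np.array([0.0, 0.5])           # best position any particle has found so far
r1, r2 = np.random.uniform(0, 1, 2)  # fresh random draws each step
# (the implementation below draws a separate r1, r2 per dimension; scalars are shown here for clarity)
# velocity update: inertia + cognitive term + social term
v_next = w * v_i + c1 * r1 * (p_i - x_i) + c2 * r2 * (p_g - x_i)
# position update: take a step along the new velocity
x_next = x_i + v_next
print(v_next, x_next)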
Our optimization algorithm follows a fairly simple procedure built around the update equations above, roughly:
1. Initialize: (A) set the constants $w$, $c_1$, $c_2$, the swarm size, and the maximum number of iterations; (B) randomly initialize the particle positions; (C) randomly initialize the particle velocities; (D) set the iteration counter $k = 1$.
2. Evaluate the cost function at each particle's position and update each particle's best position and the swarm's best position.
3. Update each particle's velocity and then its position using the equations above.
4. Increment $k$ and repeat from step 2 until the maximum number of iterations (or another stopping criterion) is reached.
I will note, since it is a running joke with my friends, that engineers and mathematicians may disagree with step 1-D of setting $k=1$ instead of $k=0$. To each their own!
From the update equations and optimization algorithm, we can see that each particle is influenced by three forces:
- its own inertia, the $w_k v^i_k$ term;
- a pull toward the best position it has personally found, the cognitive term; and
- a pull toward the best position found by the swarm as a whole, the social term.
We weight these forces using $w_k, c_1, c_2$ and then perturb them stochastically using $r_1$ and $r_2$. Changing the weighting parameters can make the optimization algorithm more or less greedy, which impacts run/convergence times and the likelihood of ending up at the global rather than a local minimum.
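For intuition, here are two made-up parameter settings (not from the original implementation below) and the qualitative behavior I would expect from each:
# exploratory: high inertia, weak pull toward the best positions;
# the swarm wanders widely and converges slowly, but is less likely to get trapped early
exploratory = {"w": 0.9, "c1": 0.5, "c2": 0.5}
# greedy: low inertia, strong pull toward the swarm best;
# the swarm collapses quickly, which speeds convergence but raises the risk of local minima
greedy = {"w": 0.3, "c1": 1.0, "c2": 2.5}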
We can show how each of these forces impacts an individual particle using the vector diagram below:
Based on this diagram, we can see that the particle has a lot of inertia and so will most likely explore the search space rapidly; in this case it ends up moving away from the swarm's best position.
Below we show an example of a low-energy, low-inertia particle that is heavily influenced by the swarm.
When we put everything together, the particles will behave something like what is shown in the GIF below.
Okay, let's dive into the code!
# imports needed for the PSO class
import numpy as np
class Particle:
    """A single particle in the swarm."""

    def __init__(self, x_0, num_dimensions):
        self.num_dimensions = num_dimensions
        # particle position (cast to float so in-place updates keep fractional values)
        self.position_i = np.asarray(x_0, dtype=float)
        # particle velocity
        self.velocity_i = np.random.uniform(-1, 1, self.num_dimensions)
        # current individual cost/error
        self.cost_i = np.inf
        # best individual position
        self.pos_best_i = None
        # best individual cost/error
        self.cost_best_i = np.inf

    # evaluate current loss/cost/fitness
    def evaluate(self, cost_func):
        self.cost_i = cost_func(self.position_i)
        # check if the current position is the individual best position
        if self.cost_i < self.cost_best_i:
            self.pos_best_i = self.position_i.copy()
            self.cost_best_i = self.cost_i

    # update particle velocity
    def update_velocity(self, pos_best_g):
        # inertia weight, cognitive constant, social constant
        w, c1, c2 = 0.5, 1, 2
        for i in range(self.num_dimensions):
            r1 = np.random.uniform(0, 1)
            r2 = np.random.uniform(0, 1)
            vel_cognitive = c1 * r1 * (self.pos_best_i[i] - self.position_i[i])
            vel_social = c2 * r2 * (pos_best_g[i] - self.position_i[i])
            self.velocity_i[i] = w * self.velocity_i[i] + vel_cognitive + vel_social

    # update the particle position, clamping it to the search bounds
    def update_position(self, bounds):
        for i in range(self.num_dimensions):
            self.position_i[i] = self.position_i[i] + self.velocity_i[i]
            # clamp to the maximum bound
            if self.position_i[i] > bounds[i][1]:
                self.position_i[i] = bounds[i][1]
            # clamp to the minimum bound
            if self.position_i[i] < bounds[i][0]:
                self.position_i[i] = bounds[i][0]
class PSO:
    """Particle Swarm Optimizer"""

    def __init__(self, cost_func, x_0, bounds, num_particles, max_iter, rdm_init=True):
        self.cost_func = cost_func
        self.num_dimensions = len(x_0)
        self.x_0 = x_0
        self.bounds = bounds
        self.num_particles = num_particles
        self.max_iter = max_iter
        self.rdm_init = rdm_init

    def __call__(self):
        # best error for the group
        err_best_g = np.inf
        # best position for the group
        pos_best_g = []
        # establish the swarm
        swarm = []
        for i in range(self.num_particles):
            if not self.rdm_init:
                # start every particle from the supplied initial position
                swarm.append(Particle(self.x_0, self.num_dimensions))
            else:
                # randomly initialize each particle around the initial position
                # TODO: change to sample in range of bounds
                # instead of with standard deviation 1
                x_0 = np.random.multivariate_normal(self.x_0, np.eye(self.num_dimensions), 1)[0]
                swarm.append(Particle(x_0, self.num_dimensions))

        # begin optimization loop
        i = 0
        while i < self.max_iter:
            # print(i, err_best_g)  # uncomment to track progress
            # cycle through particles in the swarm and evaluate fitness
            for j in range(self.num_particles):
                swarm[j].evaluate(self.cost_func)
                # determine if the current particle is the best (globally)
                if swarm[j].cost_i < err_best_g:
                    pos_best_g = list(swarm[j].position_i)
                    err_best_g = float(swarm[j].cost_i)
            # cycle through the swarm and update velocities and positions
            for j in range(self.num_particles):
                swarm[j].update_velocity(pos_best_g)
                swarm[j].update_position(self.bounds)
            i += 1

        return {"Optimal Parameters": pos_best_g, "Optimal Loss": err_best_g}
To test out our code, we'll run the optimizer on a simple 2D function with a known global minimum.
# example cost function
# known global minimum at x = (3.0, 3.0)
def cost_func(x):
    return np.sum((np.asarray(x) - 3.0) ** 2)

# initial starting position
initial = np.random.uniform(0, 8, 2)
# parameter/position bounds
bounds = [(-10, 10), (-10, 10)]
# create optimizer object
optimizer = PSO(cost_func, initial, bounds, num_particles=15, max_iter=30)
# run optimizer
optimizer()
As expected, the optimizer finds the global optimum quite quickly!
Now that we have a good understanding of the basic PSO algorithm, you should try messing with the code above to get a feel for how to use PSO in more practical settings. Try the exercises below:
And remember, while this method is informed by biology and physics, that doesn't mean we need to stick closely to biological or physical systems. For example, you could try annealing the inertia constant (varying it over the course of the run, typically starting high and decreasing it) to reduce the chances of getting stuck in a local minimum!
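As a sketch of what that might look like with the code above (my own variation, not part of the implementation in this post): pull the inertia weight out of update_velocity and decay it linearly over the run.
# hypothetical helper: a linear annealing schedule for the inertia weight.
# the default w_start/w_end values follow the common choice in the PSO
# literature of decaying w from roughly 0.9 down to roughly 0.4
def inertia_schedule(k, max_iter, w_start=0.9, w_end=0.4):
    """Linearly decay the inertia weight from w_start to w_end over max_iter steps."""
    fraction = k / max(max_iter - 1, 1)
    return w_start - (w_start - w_end) * fraction

# to wire this in, Particle.update_velocity would need to accept w as an argument
# (replacing the hard-coded 0.5), and PSO.__call__ would pass
# w=inertia_schedule(i, self.max_iter) on each iteration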