As a machine learning engineer with an educational background in physics, I've always loved the intersection of physics and machine learning. Many of my colleagues are physicists, and many of the models and methods we use come from physics.
One neat method I've always liked is particle swarm optimization (PSO). It is conceptually related to MCMC methods such as ensemble MCMC and Hamiltonian Monte Carlo. In this post I will walk through how PSO works and code an example of PSO from scratch in Python. Here we explore the classic PSO algorithm, but there is a Bayesian version used in the field of Bayesian optimization that I encourage you to look up if you enjoy this post.
PSO was proposed in 1995 by Eberhart and Kennedy. It was originally intended for simulating social behavior, using patterns inspired by the way birds flock and schools of fish swim.
PSO is a metaheuristic technique that makes few to no assumptions about the function being optimized. It is very useful when the search space is high-dimensional, non-convex, and/or non-continuous. One great aspect of PSO is that, unlike gradient descent and quasi-Newton methods, it does not require the function being optimized to be differentiable.
The high-level idea behind PSO is that we release many different "particles" into the search space. These particles move in such a way that they tend to congregate around minima in the optimization topography. Heuristically, you can think of dumping marbles into a funnel: they all end up falling into the center no matter where they started from. We start by giving our particles the properties of position (a location on the optimization plane) and velocity:
$$ x^i_{k+1} = x^i_k + v^i_{k+1} $$
$$ v^i_{k+1} = w_k v^i_k + c_1 r_1 (p^i_k - x^i_k) + c_2 r_2 (p^g_k - x^i_k) $$
Here $x$ represents the position on the optimization plane, $p$ represents the best historical position, $v$ represents velocity, the superscript $i$ denotes an individual particle, the subscript $k$ denotes the time step, the superscript $g$ denotes a group term across particles, $w$ is an inertial weight constant, $c_1$ is a cognitive parameter, $c_2$ is a social parameter, and $r_1$ and $r_2$ are random numbers between 0 and 1.
We label the term $c_2 r_2 (p^g_k - x^i_k)$ the "social term" and $c_1 r_1 (p^i_k - x^i_k)$ the "cognitive term".
So what do these terms do? Since each particle is "learning" about a different section of the optimization plane, we want the particles to share information so they can alert one another when one of them finds a position that might be the global optimum. This is what the social term does. On the other hand, each particle should have some ability to pursue promising leads on its own; this is the cognitive term. These terms may reinforce or compete with each other.
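To make the update equations concrete, here is a minimal NumPy sketch of a single step for one particle. The positions and velocities are arbitrary illustrative values of my own; the constants $w$, $c_1$, and $c_2$ match the ones used in the implementation further down.
import numpy as np
# illustrative values for a 2D search space
w, c1, c2 = 0.5, 1.0, 2.0            # inertia, cognitive, and social weights
x_i = np.array([2.0, -1.0])          # current position of particle i
v_i = np.array([0.3, 0.1])           # current velocity of particle i
p_i = np.array([1.5, -0.5])          # best position particle i has found so far
p_g = np.array([0.0, 0.5])           # best position any particle has found so far
r1, r2 = np.random.uniform(0, 1, 2)  # fresh random draws each step
# (the implementation below draws a separate r1, r2 per dimension; scalars are shown here for clarity)
# velocity update: inertia + cognitive term + social term
v_next = w * v_i + c1 * r1 * (p_i - x_i) + c2 * r2 * (p_g - x_i)
# position update: take a step along the new velocity
x_next = x_i + v_next
print(v_next, x_next)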
Our optimization algorithm follows a fairly simple procedure built around the update equations above, roughly:
1. Initialize: (A) set the constants $w$, $c_1$, $c_2$, the swarm size, and the maximum number of iterations; (B) randomly initialize the particle positions; (C) randomly initialize the particle velocities; (D) set the iteration counter $k = 1$.
2. Evaluate the cost function at each particle's position and update each particle's best position and the swarm's best position.
3. Update each particle's velocity and then its position using the equations above.
4. Increment $k$ and repeat from step 2 until the maximum number of iterations (or another stopping criterion) is reached.
I will note, since it is a running joke with my friends, that engineers and mathematicians may disagree with step 1-D of setting $k=1$ instead of $k=0$. To each their own!
From the update equations and optimization algorithm, we can see that each particle is influenced by three forces:
- its own inertia, the $w_k v^i_k$ term;
- a pull toward the best position it has personally found, the cognitive term; and
- a pull toward the best position found by the swarm as a whole, the social term.
We weight these forces using $w_k, c_1, c_2$ and then perturb them stochastically using $r_1$ and $r_2$. Changing the weighting parameters can make the optimization algorithm more or less greedy, which impacts run/convergence times and the likelihood of ending up at the global rather than a local minimum.
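For intuition, here are two made-up parameter settings (not from the original implementation below) and the qualitative behavior I would expect from each:
# exploratory: high inertia, weak pull toward the best positions;
# the swarm wanders widely and converges slowly, but is less likely to get trapped early
exploratory = {"w": 0.9, "c1": 0.5, "c2": 0.5}
# greedy: low inertia, strong pull toward the swarm best;
# the swarm collapses quickly, which speeds convergence but raises the risk of local minima
greedy = {"w": 0.3, "c1": 1.0, "c2": 2.5}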
We can show how each of these forces impacts an individual particle using the vector diagram below:
Based on this diagram, we can see that the particle has a lot of inertia and so will most likely explore the search space rapidly; in this case it ends up moving away from the swarm's best position.
Below we show an example of a low-energy, low-inertia particle that is heavily influenced by the swarm.
When we put everything together, the particles will behave something like what is shown in the GIF below.
Okay, let's dive into the code!
# imports needed for the PSO class
import numpy as np
class Particle:
    """A single particle in the swarm."""

    def __init__(self, x_0, num_dimensions):
        self.num_dimensions = num_dimensions
        # particle position (cast to float so in-place updates keep fractional values)
        self.position_i = np.asarray(x_0, dtype=float)
        # particle velocity
        self.velocity_i = np.random.uniform(-1, 1, self.num_dimensions)
        # current individual cost/error
        self.cost_i = np.inf
        # best individual position
        self.pos_best_i = None
        # best individual cost/error
        self.cost_best_i = np.inf

    # evaluate current loss/cost/fitness
    def evaluate(self, cost_func):
        self.cost_i = cost_func(self.position_i)
        # check if the current position is the individual best position
        if self.cost_i < self.cost_best_i:
            self.pos_best_i = self.position_i.copy()
            self.cost_best_i = self.cost_i

    # update particle velocity
    def update_velocity(self, pos_best_g):
        # inertia weight, cognitive constant, social constant
        w, c1, c2 = 0.5, 1, 2
        for i in range(self.num_dimensions):
            r1 = np.random.uniform(0, 1)
            r2 = np.random.uniform(0, 1)
            vel_cognitive = c1 * r1 * (self.pos_best_i[i] - self.position_i[i])
            vel_social = c2 * r2 * (pos_best_g[i] - self.position_i[i])
            self.velocity_i[i] = w * self.velocity_i[i] + vel_cognitive + vel_social

    # update the particle position, clamping it to the search bounds
    def update_position(self, bounds):
        for i in range(self.num_dimensions):
            self.position_i[i] = self.position_i[i] + self.velocity_i[i]
            # clamp to the maximum bound
            if self.position_i[i] > bounds[i][1]:
                self.position_i[i] = bounds[i][1]
            # clamp to the minimum bound
            if self.position_i[i] < bounds[i][0]:
                self.position_i[i] = bounds[i][0]
class PSO:
    """Particle Swarm Optimizer"""

    def __init__(self, cost_func, x_0, bounds, num_particles, max_iter, rdm_init=True):
        self.cost_func = cost_func
        self.num_dimensions = len(x_0)
        self.x_0 = x_0
        self.bounds = bounds
        self.num_particles = num_particles
        self.max_iter = max_iter
        self.rdm_init = rdm_init

    def __call__(self):
        # best error for the group
        err_best_g = np.inf
        # best position for the group
        pos_best_g = []
        # establish the swarm
        swarm = []
        for i in range(self.num_particles):
            if not self.rdm_init:
                # start every particle from the supplied initial position
                swarm.append(Particle(self.x_0, self.num_dimensions))
            else:
                # randomly initialize each particle around the initial position
                # TODO: change to sample in range of bounds
                # instead of with standard deviation 1
                x_0 = np.random.multivariate_normal(self.x_0, np.eye(self.num_dimensions), 1)[0]
                swarm.append(Particle(x_0, self.num_dimensions))

        # begin optimization loop
        i = 0
        while i < self.max_iter:
            # print(i, err_best_g)  # uncomment to track progress
            # cycle through particles in the swarm and evaluate fitness
            for j in range(self.num_particles):
                swarm[j].evaluate(self.cost_func)
                # determine if the current particle is the best (globally)
                if swarm[j].cost_i < err_best_g:
                    pos_best_g = list(swarm[j].position_i)
                    err_best_g = float(swarm[j].cost_i)
            # cycle through the swarm and update velocities and positions
            for j in range(self.num_particles):
                swarm[j].update_velocity(pos_best_g)
                swarm[j].update_position(self.bounds)
            i += 1

        return {"Optimal Parameters": pos_best_g, "Optimal Loss": err_best_g}
To test out our code, we'll run the optimizer on a simple 2D function with a known global minimum.
# example cost function
# known global minimum at x = (3.0, 3.0)
def cost_func(x):
    return np.sum((np.asarray(x) - 3.0) ** 2)

# initial starting position
initial = np.random.uniform(0, 8, 2)
# parameter/position bounds
bounds = [(-10, 10), (-10, 10)]
# create optimizer object
optimizer = PSO(cost_func, initial, bounds, num_particles=15, max_iter=30)
# run optimizer
optimizer()
As expected, the optimizer finds the global optimum quite quickly!
Now that we have a good understanding of the basic PSO algorithm, you should try messing with the code above to get a feel for how to use PSO in more practical settings. Try the exercises below:
And remember, while this method is informed by biology and physics, that doesn't mean we need to stick closely to biological or physical systems. For example, you could try annealing the inertia constant (varying it over the course of the run, typically starting high and decreasing it) to reduce the chances of getting stuck in a local minimum!
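As a sketch of what that might look like with the code above (my own variation, not part of the implementation in this post): pull the inertia weight out of update_velocity and decay it linearly over the run.
# hypothetical helper: a linear annealing schedule for the inertia weight.
# the default w_start/w_end values follow the common choice in the PSO
# literature of decaying w from roughly 0.9 down to roughly 0.4
def inertia_schedule(k, max_iter, w_start=0.9, w_end=0.4):
    """Linearly decay the inertia weight from w_start to w_end over max_iter steps."""
    fraction = k / max(max_iter - 1, 1)
    return w_start - (w_start - w_end) * fraction

# to wire this in, Particle.update_velocity would need to accept w as an argument
# (replacing the hard-coded 0.5), and PSO.__call__ would pass
# w=inertia_schedule(i, self.max_iter) on each iteration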