Recently I've been working on a variety of python projects that need to run quickly. These projects need to be in python because my coworkers and the end users are all familiar with python (sorry, can't use Julia)....but python code isn't exactly the fastest in the land!
There are many ways to speed up python code. I usually start by looking at the math behind the model and trying to find nice approxiamtions or short cuts to reduce run time. For example, solving systems of equations rather than computing matix inversions for models like Gaussian processes and Ordinary Least Squares.
Next, I try to make sure that my code is as efficient as possible. For example, it is often faster to instantiate an array and write results of a function to the indicies of the array than to append to a list object. However, if everything is vectorized and running as quickly as possible I'm usually left with only two options (which aren't mutually exclusive).
Option One: multiprocessing! If I can distribute the computational load across workers, then I can potentially speed up the run time quite considerably. In a previous blog post I discussed how to do multiprocessing in python.
Option Two: use C extensions (or Java or C++ extensions)! The great thing about python is that, in many cases, you can use python as a nice wrapper around other languages. That way I can have C code that runs as an extension of python. My end users still get to interact with the code in python, but I get many of the advantages of C when designing the program.
In this post we will discuss the BASICS of creating C extensions in python. I will note, I am not an expert in C. C is a triacky language and if used improperly can result in big problems. For example, C is a low enough level language that you get nuanced control of the memory allocated to your program. This can let malicious suers take anvantage of your code to do bad things if you don't secure your code/server/application properly. So C is great and all but be careful.
If you get bored of wading through C code...you can skip to the very last section for a super easy method of compiling python code down to C. However, the easy method is by far the least flexible!
We'll start with the msot simple code I can think of: Hello World. I hate it when tutorials START with complex examples! It's exponentially harder to learn two things simulatenously than one thing at a time. We'll get to a more compelx example in a bit, so just bear with me. A big thank's to Adam Lamers whose 2015 blog inpired this post.
Author's Note: Why is it bear with me and not bare with me? Are bears especially patient creatures? I will need to look this up immediately after writing this post.
Let's start with our C code. I'll show the entire script and then break it down line by line. If you want to follow along with this blog post, save this code in a file named hello_module.c
.
#include <Python.h>
#include <stdio.h>
static PyObject* hello_module_print_hello_world(PyObject *self, PyObject *args)
{
printf("Hello World\n");
Py_RETURN_NONE;
}
static PyMethodDef hello_module_methods[] = {
{
"print_hello_world",
hello_module_print_hello_world,
METH_NOARGS,
"Print 'hello world' from a method defined in a C extension."
},
{NULL, NULL, 0, NULL}
};
static struct PyModuleDef hello_module_definition = {
PyModuleDef_HEAD_INIT,
"hello_module",
"A Python module that prints 'hello world' from C code.",
-1,
hello_module_methods
};
PyMODINIT_FUNC PyInit_hello_module(void)
{
Py_Initialize();
return PyModule_Create(&hello_module_definition);
}
Ok, let's actually walk through this code. Luckily it's pretty simple, even if you aren't very well acquainted with C.
We'll start with the hearers. Header files (with extension .h
) contain fucntion declarations and macro definitions that you want to share between different scripts. here we include two different header files. The first is provided by python to allow us to read in python datatypes and then output python datatypes. The second header contains a bunch of useful functions like printf()
for us to use in our C code.
#include <Python.h>
#include <stdio.h>
Nnext we have the actual module method definition. This is the code that will be called by our C extension. The code is reasonably self explanatory. Of note, you'll see that I'm returning Py_RETURN_NONE
. Python fucntions by default return python None. We need to have our C extension do the same thing. The naming convention for the method might look odd, but it is a convetion for C extensions in pyhon. THe name is the Module_Function name. THe module is hello_module
and the fucntion is print_hello_world
.
static PyObject* hello_module_print_hello_world(PyObject *self, PyObject *args)
{
printf("Hello World\n");
Py_RETURN_NONE;
}
We now move on to the method definition object for the extension. This will take three arguments:
ml_name: The method name
ml_meth: Fucntion pointer to method implementation
ml_flag: A special flag that indicates special features of the method. For example, accepting arguments, accepting keyword arguments, being a class method, or being a static method of a class.
ml_doc: The method's docstring.
If we had multiple fucntions in our method we would list them all out in this PyMethodDef object.
static PyMethodDef hello_module_methods[] = {
{
"print_hello_world",
hello_module_print_hello_world,
METH_NOARGS,
"Print 'hello world' from a method defined in a C extension."
},
{NULL, NULL, 0, NULL}
};
Getting close to the end! We now have the module definition. This will let python know what to call your module, where its methods are located, and where the method definitions are.
static struct PyModuleDef hello_module_definition = {
PyModuleDef_HEAD_INIT,
"hello_module",
"A Python module that prints 'hello world' from C code.",
-1,
hello_module_methods
};
Finally, we initialize the module. THis is the function called by ptyhon when importing your extension! We always name this function PyInit_[module_name_here]
. We'll reference this name again in the setup file of our module (coming up next!).
PyMODINIT_FUNC PyInit_hello_module(void)
{
Py_Initialize();
return PyModule_Create(&hello_module_definition);
}
We should all now understand the C code. To actually use this extension we need to set it up in python as a module. To do this we will use a setup file. If you are following along, pelase copy the code below into a file named setup.py
in the same directory as hello_module.c
.
from distutils.core import setup, Extension
hello_world_module = Extension('hello_module',
sources = ['hello_module.c'])
setup(name = 'simple_c_extension',
version = '0.1',
description = 'A useful description',
ext_modules = [hello_world_module],
url='your_website_or_github',
author='your_name',
author_email='your_email')
This code is pretty simple. We begin by importing the packages for setup and creating python extensions.
from distutils.core import setup, Extension
We then define an extension with a name (should match naming convetion of PyMODINIT_FUNC
in your C code) and the C file for the extension.
hello_world_module = Extension('hello_module',
sources = ['hello_module.c'])
Finally we define the set up operations. Most of this is just providing authorship and contact information for the module.
setup(name = 'simple_c_extension',
version = '0.1',
description = 'A useful description',
ext_modules = [hello_world_module],
url='your_website_or_github',
author='your_name',
author_email='your_email')
We can now pip install the module from the setup file and try running it! Run the code below in the same directory as the setup.py
file.
pip install .
You should see the following on the terminal output:
Successfully installed simple-c-extension-0.1
Let's run our code!
import hello_module
print( hello_module.print_hello_world() )
Wait...I don't see hellow world! I only see None. Well, remember, our C code prints "Hello World" using printf()
and returns a python None type. If you run this code in a jupyter notebook you will only see the None type obejct returned. If you run this code in the terminal you should see "Hello World" printed out as expected. This highlights one of many little problems that can creep up if you don't think hard about how your C code interacts with your python code.
Ok, so now you are wondering, "but what if I wnat to add arguments? and math? and USEFUL stuff?". Ok, well here we go. Let's make something that takes ina rguments, does some math, and returns soemthing useful. I'll be building off of Dan Foreman-Mackey's example of writing a simplke chi squared formula in C. Basically, our fucntion will evaluate this:
$$ \mathcal{X}^2(m,b)=\sum_{n=1}^N \frac{[y_n - (mx_n+b)]^2}{\sigma_n^2} $$We'll add this to our hello_module so that you can see what it looks like to ahve multiple fucntions in our C extension module.
Ok, so here is the entire C code. I've simply modified the original hello_module.c
file we were originally working with.
#include <Python.h>
#include <numpy/arrayobject.h>
#include <stdio.h>
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
double chi2(double m, double b, double *x, double *y, double *yerr, int N) {
int n;
double result = 0.0, diff;
for (n = 0; n < N; n++) {
diff = (y[n] - (m * x[n] + b)) / yerr[n];
result += diff * diff;
}
return result;
}
static PyObject* hello_module_print_hello_world(PyObject *self, PyObject *args)
{
printf("Hello World\n");
Py_RETURN_NONE;
}
static PyObject* hello_module_chi2(PyObject *self, PyObject *args)
{
double m, b;
PyObject *x_obj, *y_obj, *yerr_obj;
if (!PyArg_ParseTuple(args, "ddOOO", &m, &b, &x_obj, &y_obj,
&yerr_obj))
return NULL;
PyObject *x_array = PyArray_FROM_OTF(x_obj, NPY_DOUBLE, NPY_IN_ARRAY);
PyObject *y_array = PyArray_FROM_OTF(y_obj, NPY_DOUBLE, NPY_IN_ARRAY);
PyObject *yerr_array = PyArray_FROM_OTF(yerr_obj, NPY_DOUBLE,
NPY_IN_ARRAY);
if (x_array == NULL || y_array == NULL || yerr_array == NULL) {
Py_XDECREF(x_array);
Py_XDECREF(y_array);
Py_XDECREF(yerr_array);
return NULL;
}
int N = (int)PyArray_DIM(x_array, 0);
double *x = (double*)PyArray_DATA(x_array);
double *y = (double*)PyArray_DATA(y_array);
double *yerr = (double*)PyArray_DATA(yerr_array);
double value = chi2(m, b, x, y, yerr, N);
Py_DECREF(x_array);
Py_DECREF(y_array);
Py_DECREF(yerr_array);
if (value < 0.0) {
PyErr_SetString(PyExc_RuntimeError,
"Chi-squared returned an impossible value.");
return NULL;
}
PyObject *ret = Py_BuildValue("d", value);
return ret;
}
static PyMethodDef hello_module_methods[] = {
{
"print_hello_world",
hello_module_print_hello_world,
METH_NOARGS,
"Print 'hello world' from a method defined in a C extension."
},
{
"chi2",
hello_module_chi2,
METH_VARARGS,
"Calculate the chi-squared of some data given a model."
},
{NULL, NULL, 0, NULL}
};
static struct PyModuleDef hello_module_definition = {
PyModuleDef_HEAD_INIT,
"hello_module",
"A Python module that prints 'hello world' from C code.",
-1,
hello_module_methods
};
PyMODINIT_FUNC PyInit_hello_module(void)
{
Py_Initialize();
import_array();
return PyModule_Create(&hello_module_definition);
}
Ok, let's walk through this code and see what's changed (and why!).
First, you'll notice that we have more header files. Since we want to interact with numpy arrays, we need to call in the some fucntionality from numpy. We also add a definition to make sure we are using the latest numpy api.
#include <Python.h>
#include <numpy/arrayobject.h>
#include <stdio.h>
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
We then define the method that will actually calcualte the chi squared coefficient. This jsut exectutes the math from before:
$$ \mathcal{X}^2(m,b)=\sum_{n=1}^N \frac{[y_n - (mx_n+b)]^2}{\sigma_n^2} $$double chi2(double m, double b, double *x, double *y, double *yerr, int N) {
int n;
double result = 0.0, diff;
for (n = 0; n < N; n++) {
diff = (y[n] - (m * x[n] + b)) / yerr[n];
result += diff * diff;
}
return result;
}
Our original print_hello_world module stays the same!
static PyObject* hello_module_print_hello_world(PyObject *self, PyObject *args)
{
printf("Hello World\n");
Py_RETURN_NONE;
}
We then have the chi2 pymodule that wraps up our chi squared method in C. THis jsut handles reading in the numpy arrays, feeding them to chi2 (which we defiend above), and the . returning a result in soemthing python can understand.
static PyObject* hello_module_chi2(PyObject *self, PyObject *args)
{
// instantiate C objects with the object types we need
double m, b;
PyObject *x_obj, *y_obj, *yerr_obj;
// parse the inputs
// return an error if inputs don't fit expected types
// expected types are "ddOOO" which means
// Double, Double, Object, Obeject, Object
if (!PyArg_ParseTuple(args, "ddOOO", &m, &b, &x_obj, &y_obj, &yerr_obj))
return NULL;
// turn the pyobjects into arrays that we can work with
PyObject *x_array = PyArray_FROM_OTF(x_obj, NPY_DOUBLE, NPY_IN_ARRAY);
PyObject *y_array = PyArray_FROM_OTF(y_obj, NPY_DOUBLE, NPY_IN_ARRAY);
PyObject *yerr_array = PyArray_FROM_OTF(yerr_obj, NPY_DOUBLE, NPY_IN_ARRAY);
// if soemthing went wrong converting the objects to arrays
// we throw an exception here
if (x_array == NULL || y_array == NULL || yerr_array == NULL) {
Py_XDECREF(x_array);
Py_XDECREF(y_array);
Py_XDECREF(yerr_array);
return NULL;
}
// count the number of data points in x array
int N = (int)PyArray_DIM(x_array, 0);
// get C-type pointers to data
double *x = (double*)PyArray_DATA(x_array);
double *y = (double*)PyArray_DATA(y_array);
double *yerr = (double*)PyArray_DATA(yerr_array);
// call the C method we defined earlier for
// chi squared
double value = chi2(m, b, x, y, yerr, N);
// clean up the arrays
Py_DECREF(x_array);
Py_DECREF(y_array);
Py_DECREF(yerr_array);
if (value < 0.0) {
PyErr_SetString(PyExc_RuntimeError,
"Chi-squared returned an impossible value.");
return NULL;
}
// create an output that python can handle
PyObject *ret = Py_BuildValue("d", value);
return ret;
}
We now add the hello_module_chi2
method to our array of module methods. Our module can now do two things! Whooo!
static PyMethodDef hello_module_methods[] = {
{
"print_hello_world",
hello_module_print_hello_world,
METH_NOARGS,
"Print 'hello world' from a method defined in a C extension."
},
{
"chi2",
hello_module_chi2,
METH_VARARGS,
"Calculate the chi-squared of some data given a model."
},
{NULL, NULL, 0, NULL}
};
The module definition doesn't change. The reason is because we neatly compartmentalized the array hello_module_methods
and the module method hello_module_definition
. When we want to add methods to our module we just need to add them to our array hello_module_methods
.
static struct PyModuleDef hello_module_definition = {
PyModuleDef_HEAD_INIT,
"hello_module",
"A Python module that prints 'hello world' from C code.",
-1,
hello_module_methods
};
The module initialization is the same EXCEPT that we add import_array()
. This let's python know that we need arrays in our code. If you don't do this the setup.py
will still compile the C code without errors. You'll still be able to import the module jsut fine. But when you actually call the chi2 fucntion your python kernel will be killed because of segmentation fault 11
. It's horrible to debug because unless you have carefully read the python C extension documentation, there are no bread crumbs to follow to figure out what caused that error! I, to my chagrin, hit my head against this error for a long time before I actually read through the documentation and found a note discussing exactly this issue.
PyMODINIT_FUNC PyInit_hello_module(void)
{
Py_Initialize();
import_array();
return PyModule_Create(&hello_module_definition);
}
Ok, you'll need to make one last change to the setup.py
file before building. We add the include_dirs
argument to let python know where the numpy array header file is.
from distutils.core import setup, Extension
import numpy
hello_world_module = Extension('hello_module',
sources = ['hello_module.c'],
include_dirs=[numpy.get_include()])
setup(name = 'simple_c_extension',
version = '0.1',
description = 'A useful description',
ext_modules = [hello_world_module],
url='your_website_or_github',
author='your_name',
author_email='your_email')
Ok, let's run setup.py
like before and see if our code works!
import hello_module
print( hello_module.print_hello_world() )
hello_module.chi2(2.0, 1.0, [-1.0, 4.2, 30.6],
[-1.5, 8.0, 63.0],
[1.0, 1.5, 0.6])
Nice! It works. Awesome, so that should be enough to get you started writing C extensions for python!
Ok, maybe all of that was too much work for you and you want an easy way out. Well, you could use Numba. Numba is an awesome package for compiling down python code into C code (where possible). We can compile using Just in Time (JIT) compilation or Ahead of Time (AOT) comilation.
JIT will compile when you first use the function that you have added the numba jit decorator too. I've taken this nice example from the numba documentation. We'll start with a standard bubble sort with no bells or whistles.
import numpy as np
import time
# example fucntion
def bubblesort(X):
N = len(X)
for end in range(N, 1, -1):
for i in range(end - 1):
cur = X[i]
if cur > X[i + 1]:
tmp = X[i]
X[i] = X[i + 1]
X[i + 1] = tmp
# create arrays to be sorted
original = np.arange(0.0, 10.0, 0.01, dtype='f4')
shuffled = original.copy()
np.random.shuffle(shuffled)
Let's time the bubble sort with no optimization. Just plain old python code.
to_sort = shuffled.copy()
start = time.time()
bubblesort(to_sort)
end = time.time()
print("{} seconds".format(end-start))
np.array_equal(to_sort, original)
Ok, let's try this once again, but we'll let Numba compile the function into a binary using it's magic powers.
from numba import jit
# numba JIT optimized
@jit(target='cpu', nopython=True)
def bubblesort_jit(X):
N = len(X)
for end in range(N, 1, -1):
for i in range(end - 1):
cur = X[i]
if cur > X[i + 1]:
tmp = X[i]
X[i] = X[i + 1]
X[i + 1] = tmp
to_sort = shuffled.copy()
start = time.time()
bubblesort_jit(to_sort)
end = time.time()
print("{} seconds".format(end-start))
np.array_equal(to_sort, original)
WAIT! That's not that much fater. Who am I kidding? Well, remember that JIT compiles the code during the first run. That means that it compiled AND ran in 0.16 seconds. IF we run the funtion again (now that it is compiled), we'll get a good understanding of the real run time.
to_sort = shuffled.copy()
start = time.time()
bubblesort_jit(to_sort)
end = time.time()
print("{} seconds".format(end-start))
np.array_equal(to_sort, original)
Ok, that looks a lot better. Running this code after optimizing using numba helped a lot! Even better, we can easily incorporate this code with the rest of our python code. No need to have C scripts in our repository or extensions added to our setup.py
file.
But what if we don't want to compile on the first run? What it JIT isn't good enough? Well, Numba supports AOT.
Let's define a simple module to do multiplication and squares for us (again, I stole this example from the Numba documentation).
First, let's create a python file called custom_math.py
. Copy and apste the code below into the file.
from numba.pycc import CC
cc = CC('my_module')
# Uncomment the following line to print out the compilation steps
#cc.verbose = True
@cc.export('multf', 'f8(f8, f8)')
@cc.export('multi', 'i4(i4, i4)')
def mult(a, b):
return a * b
@cc.export('square', 'f8(f8)')
def square(a):
return a ** 2
We'll also define a setup.py
file to create our new module.
from distutils.core import setup
from custom_math import cc
setup(name = 'numba_example',
version = '0.2',
description = 'A useful description',
ext_modules=[cc.distutils_extension()],
url='your_website_or_github',
author='your_name',
author_email='your_email')
Run the setup.py
file as before using pip install .
. Let's see if our module works as expected.
import my_module
my_module.multi(3, 4)
my_module.square(1.414)
Cool! Ok, well, that was amazingly easy, right?!
Well now you know exactly how to create C extensions for python. When needed, you can have your code run at insanely fast speeds. A word of caution, however. Just like multiprocessing, using C extensions should only be used when needed. It adds compelxity that can be ahrd to debug or create unintended errors/complications in the end user's environment. Additionally, some fucntions are already optimized to an incredible amount in python already (like dot products in numpy and scipy). You might not be able to improve on what has already been built.