
If you’re in the field of science or machine learning, you’ve probably used Python before, and you’ve probably noticed how incredibly slow it is!

To solve that issue, tools like NumPy are very useful: they speed up calculations by running C in the background. However, NumPy is mostly limited to numerical operations.

At SAP Conversational AI, we use C to reduce calculation time and memory usage in the parts of our code that perform a lot of operations.

In this article, I’m going to show you how to implement a Python code example with a call to C functions via Ctypes, Cython and CFFI. Then we’ll compare them in terms of time and memory used. Let’s roll!

## Understanding the C functions you need

To benchmark the different approaches, let’s use a few basic C functions that simulate calculations so we can measure performance. For this test, I set up nested structures, numbers and a string, to cover different types of data and show how each implementation handles them.

```c
typedef struct s_point
{
    double x;
    double y;
} t_point;

typedef struct s_test
{
    char *sentence;
    int nb_points;
    t_point *points;
    double *distances;
} t_test;
```

This function increases the value of each character by n:

```c
char *increment_string(char *str, int n)
{
    for (int i = 0; str[i]; i++)
        str[i] = str[i] + n;
    return (str);
}
```
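
To see what this loop does without compiling anything, here is a minimal Python sketch of the same idea. It assumes a mutable `bytearray` stands in for the C `char` buffer and that values wrap modulo 256; `increment_bytes` is a hypothetical helper, not part of the article's library. Unlike this sketch, the C function also stops at the first zero byte.

```python
def increment_bytes(buf: bytearray, n: int) -> bytearray:
    """Shift every byte of buf by n, in place, like the C loop does on a char array."""
    for i, byte in enumerate(buf):
        # Assumption: wrap around like an unsigned 8-bit value would.
        buf[i] = (byte + n) % 256
    return buf


sentence = bytearray(b"abc")
increment_bytes(sentence, 1)
print(sentence)  # bytearray(b'bcd')
```

The buffer has to be mutable because the function writes into the memory it receives, which matters later when Python objects are passed to C.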

This function generates nb points with random coordinates:

```c
#include <stdlib.h> /* calloc, rand */

void generate_points(t_test *test, int nb)
{
    t_point *points = calloc(nb + 1, sizeof(t_point));

    for (int i = 0; i < nb; i++)
    {
        points[i].x = rand();
        points[i].y = rand();
    }
    test->points = points;
    test->nb_points = nb;
}
```

This function calculates, for each point, the distance to every point in the list:

```c
#include <math.h> /* sqrt */

void distance_between_points(t_test *test)
{
    int nb = test->nb_points;
    double *distances = calloc(nb * nb + 1, sizeof(double));

    /* The nb x nb matrix is stored flat: row i, column j is at i * nb + j */
    for (int i = 0; i < nb; i++)
        for (int j = 0; j < nb; j++)
        {
            double dx = test->points[j].x - test->points[i].x;
            double dy = test->points[j].y - test->points[i].y;
            distances[i * nb + j] = sqrt(dx * dx + dy * dy);
        }
    test->distances = distances;
}
```
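
The `i * nb + j` indexing stores a two-dimensional distance matrix in a single flat array, row after row. The following Python sketch illustrates that layout on three points; `pairwise_distances` is a hypothetical name used only for this illustration.

```python
import math


def pairwise_distances(points):
    """Return all pairwise distances as one flat, row-major list."""
    nb = len(points)
    distances = [0.0] * (nb * nb)
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            # Row i, column j of the matrix lives at index i * nb + j.
            distances[i * nb + j] = math.hypot(xj - xi, yj - yi)
    return distances


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
nb = len(points)
flat = pairwise_distances(points)
print(flat[0 * nb + 1])  # distance from point 0 to point 1
print(flat[0 * nb + 2])  # distance from point 0 to point 2
```

With 10,000 points this layout holds 100,000,000 doubles, which is why the distance step dominates both time and memory in the benchmark.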

## Example code in Python without C

Let’s now implement the test in pure Python, to get a baseline for computation speed and memory usage without the C functions.

```python
import random

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Test:
    def __init__(self, string, nb):
        self.string = string
        self.points = []
        for i in range(nb):
            self.points.append(Point(random.random(), random.random()))
        self.distances = []

    def increment_string(self, n):
        tmp = ""
        for c in self.string:
            tmp += chr(ord(c) + n)
        self.string = tmp

    def distance_between_points(self):
        for a in self.points:
            for b in self.points:
                # Euclidean distance between a and b
                self.distances.append(((b.x - a.x) ** 2 + (b.y - a.y) ** 2) ** 0.5)

if __name__ == '__main__':
    test = Test("A nice sentence to test.", 10000)
    test.increment_string(-5)
    test.distance_between_points()
```

## Ctypes implementation

Now that we have a reference, let’s understand how we can do the implementation in Ctypes. Ctypes is relatively simple to handle. The CDLL feature for importing libraries is very handy and being able to transform a Dict into a C structure helps a lot. On the other hand, the variables coming from Python require a conversion to go to C, which can take time.

```python
import ctypes
from ctypes.util import find_library

# ctypes mirrors of the C structures
class Point(ctypes.Structure):
    _fields_ = [('x', ctypes.c_double), ('y', ctypes.c_double)]

class Test(ctypes.Structure):
    _fields_ = [
        ('sentence', ctypes.c_char_p),
        ('nb_points', ctypes.c_int),
        ('points', ctypes.POINTER(Point)),
        ('distances', ctypes.POINTER(ctypes.c_double)),
    ]

# Standard C library, needed for free()
_libc = ctypes.CDLL(find_library('c'))
_libc.free.argtypes = [ctypes.c_void_p]
_libc.free.restype = None  # free() returns void

# Shared library containing the C functions above
_libblog = ctypes.CDLL("./libblog.so")
_libblog.increment_string.argtypes = [ctypes.c_char_p, ctypes.c_int]
_libblog.increment_string.restype = ctypes.c_char_p
_libblog.generate_points.argtypes = [ctypes.POINTER(Test), ctypes.c_int]
_libblog.generate_points.restype = None
_libblog.distance_between_points.argtypes = [ctypes.POINTER(Test)]
_libblog.distance_between_points.restype = None

if __name__ == '__main__':
    # increment_string() writes in place, so use a mutable buffer
    # rather than an immutable Python bytes object
    sentence = ctypes.create_string_buffer("A nice sentence to test.".encode('utf-8'))

    # Create the dict used to generate the ctypes structure
    test = {}
    test['sentence'] = ctypes.cast(sentence, ctypes.c_char_p)
    test['nb_points'] = 0
    test['points'] = None
    test['distances'] = None
    c_test = Test(**test)
    ptr_test = ctypes.pointer(c_test)

    # Call C functions
    _libblog.generate_points(ptr_test, 10000)
    _libblog.increment_string(sentence, -5)  # c_test.sentence points to this buffer
    _libblog.distance_between_points(ptr_test)

    # Release the memory allocated on the C side
    _libc.free(ptr_test.contents.points)
    _libc.free(ptr_test.contents.distances)
```
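
Once the C functions have run, the results still live in C memory and have to be read back through the pointers. The sketch below shows how that works with ctypes, using a small array created in Python as a stand-in for memory that a C function would have allocated, so it runs without `libblog.so`. The 2 x 2 values are made up for the illustration.

```python
import ctypes

# Stand-in for C-allocated results: a 2 x 2 distance matrix stored flat.
nb = 2
backing = (ctypes.c_double * (nb * nb))(0.0, 5.0, 5.0, 0.0)
distances = ctypes.cast(backing, ctypes.POINTER(ctypes.c_double))

# A ctypes pointer carries no length, so the caller supplies the bounds.
first = distances[0 * nb + 1]      # one element, converted to a Python float
as_list = distances[:nb * nb]      # slicing copies the values into a Python list

print(first)    # 5.0
print(as_list)  # [0.0, 5.0, 5.0, 0.0]
```

In the program above, the same indexing would apply to `c_test.distances`, with `c_test.nb_points` as the bound. Each access converts a C double into a Python float, so copying all 100,000,000 values into a list would cost time and memory; read them before calling `free`, and only the ones you need.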

## Cython implementation

Let’s talk Cython now. Cython lets you mix C and Python within a single file, so you can use C while still writing Python-like code. This makes C accessible to people who don’t know it, because the code in a Cython file uses Python syntax; only the variable types are C. However, Cython has a few downsides, mainly that it requires more files: a setup.py file to cythonize the .pyx files into C and compile them, at least one .py file to call the compiled module, and optionally .pxd files for the declarations. The extension is typically built with `python setup.py build_ext --inplace`. Here is how you should go about it:

File setup.py:

```python
# setuptools replaces distutils, which was removed in Python 3.12
from setuptools import setup
from Cython.Build import cythonize

setup(
    name='Test Cython',
    ext_modules=cythonize("test_cython.pyx"),
)
```

File test_cython.py:

```python
from test_cython import test

if __name__ == '__main__':
    test()
```

File test_cython.pyx:

```cython
import random

from libc.stdlib cimport calloc, free
from libc.math cimport sqrt

# Import the C structures from the C header
cdef extern from "libblog.h":
    ctypedef struct t_point:
        double x
        double y

    ctypedef struct t_test:
        char *sentence
        int nb_points
        t_point *points
        double *distances

# C functions written in Python syntax
cdef char *increment_string(char *str, int n):
    cdef int i = 0

    while str[i]:
        str[i] = str[i] + n
        i += 1
    return str

cdef void generate_points(t_test *test, int nb):
    cdef t_point *points = <t_point*>calloc(nb + 1, sizeof(t_point))
    cdef int i

    for i in range(nb):
        points[i].x = random.random()
        points[i].y = random.random()
    test.points = points
    test.nb_points = nb

cdef void distance_between_points(t_test *test):
    cdef int nb = test.nb_points
    cdef double *distances = <double*>calloc(nb * nb + 1, sizeof(double))
    cdef int i, j
    cdef double dx, dy

    # Typed loop variables let Cython compile these to plain C loops
    for i in range(nb):
        for j in range(nb):
            dx = test.points[j].x - test.points[i].x
            dy = test.points[j].y - test.points[i].y
            distances[i * nb + j] = sqrt(dx * dx + dy * dy)
    test.distances = distances

def test():
    # Declare the structure and set the values
    cdef t_test test

    # Keep py_sentence referenced: test.sentence points into its buffer
    py_sentence = "A nice sentence to test.".encode('utf-8')
    test.sentence = py_sentence
    test.nb_points = 0
    test.points = NULL
    test.distances = NULL

    # Call the C functions written in Python syntax
    generate_points(&test, 10000)
    test.sentence = increment_string(test.sentence, 1)
    distance_between_points(&test)

    # Release the memory allocated with calloc
    free(test.points)
    free(test.distances)
```

## CFFI implementation

Let’s now try the same with CFFI. The CFFI module is easy to handle because it doesn’t require a pseudo-language to bridge C and Python: you just declare the prototypes of the functions and structures. On the other hand, CFFI can be used in four ways: ABI mode and API mode, each of which can be in-line or out-of-line. It’s worth knowing that ABI mode works well on Windows but is less reliable on other platforms, and it can be quite slow. So, see what’s best for you! The example below uses in-line ABI mode, loading the library with `ffi.dlopen`. Here is how you should go about your CFFI implementation:

```python
from cffi import FFI

ffi = FFI()

# Declare the structures and function prototypes exposed by the library
ffi.cdef("""
    typedef struct t_point t_point;
    struct t_point
    {
        double x;
        double y;
    };

    typedef struct t_test t_test;
    struct t_test
    {
        char    *sentence;
        int     nb_points;
        t_point *points;
        double  *distances;
    };

    char *increment_string(char *str, int n);
    void generate_points(t_test *test, int nb);
    void distance_between_points(t_test *test);
""")

if __name__ == '__main__':
    # Load the C shared library
    lib = ffi.dlopen("./libblog.so")

    # Keep a reference to the buffer: memory from ffi.new() is released
    # as soon as the returned object is garbage-collected
    sentence = ffi.new("char[]", "A nice sentence to test.".encode('utf-8'))

    # Declare the C structure
    test = ffi.new("struct t_test *")
    test.sentence = sentence
    test.nb_points = 0
    test.points = ffi.NULL
    test.distances = ffi.NULL

    # Call C functions
    lib.generate_points(test, 10000)
    test.sentence = lib.increment_string(test.sentence, 1)
    lib.distance_between_points(test)
```

## Benchmark

Time now to compare the different methods. I ran the test 10 times for each implementation. Each run consists of one operation on a string, the generation of 10,000 points and the calculation of the 100,000,000 distances between every pair of points. The comparison criteria are time and memory used.
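
The measurement code itself is not shown in this article. One way to collect comparable numbers is sketched below, assuming a Unix-like system (the standard `resource` module is not available on Windows). `measure` and `workload` are hypothetical names, and `workload` is only a stand-in for one of the implementations above.

```python
import resource
import time


def measure(func, *args, runs=10):
    """Return (mean seconds per run, peak resident memory of this process)."""
    start = time.perf_counter()
    for _ in range(runs):
        func(*args)
    elapsed = (time.perf_counter() - start) / runs
    # ru_maxrss is the peak resident set size:
    # reported in kilobytes on Linux and in bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return elapsed, peak


def workload(n):
    # Hypothetical stand-in for one of the implementations above.
    return sum(i * i for i in range(n))


seconds, peak = measure(workload, 100_000, runs=3)
print(f"{seconds:.4f} s per run, peak memory {peak}")
```

Peak resident memory is measured at the process level, so it also accounts for memory allocated by `calloc` on the C side, which Python-only tools such as `tracemalloc` would not see. For a fair comparison, each implementation should run in its own process.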

```
          Time       Memory used
Python   47.30 s    3,177,024 KB
Ctypes    0.45 s      795,084 KB
Cython    0.42 s      792,976 KB
CFFI      0.47 s      795,292 KB
```

The difference between pure Python and the 3 implementations that use C is obvious! Calculations are at least 100 times faster and use about 4 times less memory.

Cython comes out slightly ahead of the other 2, but the 3 implementations are very close in terms of performance. The choice of implementation is yours!
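
These ratios can be checked directly from the figures in the table above. The short sketch below does the arithmetic; the numbers are copied from the table and nothing is measured here.

```python
# (time in seconds, memory used) as listed in the benchmark table
results = {
    "Python": (47.30, 3_177_024),
    "Ctypes": (0.45, 795_084),
    "Cython": (0.42, 792_976),
    "CFFI": (0.47, 795_292),
}

base_time, base_memory = results["Python"]
ratios = {}
for name, (seconds, memory) in results.items():
    if name == "Python":
        continue
    # How many times faster and how many times smaller than pure Python
    ratios[name] = (base_time / seconds, base_memory / memory)
    speedup, saving = ratios[name]
    print(f"{name}: {speedup:.1f}x faster, {saving:.2f}x less memory")
```

The speedups range from roughly 100x to 113x, while the memory ratios all sit within one percent of 4.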

Now that you know how to speed up your Python with C, follow this great tutorial on how to deploy your SAP Conversational AI Python chatbot in production on AWS.

Happy coding!