
If you’re in the field of science or machine learning, you’ve probably used Python before, and you’ve probably noticed how incredibly slow it is!

To solve that issue, tools like NumPy are very useful: they speed up calculations by running C in the background. However, with NumPy you’re limited to numeric operations.

At SAP Conversational AI, we use C to reduce calculation times and memory usage for the parts of our code that perform a lot of operations.

In this article, I’m going to show you how to implement a Python code example with a call to C functions via Ctypes, Cython and CFFI. Then we’ll compare them in terms of time and memory used. Let’s roll!

Understanding the C functions you need

To benchmark the different approaches, let’s use a few basic C functions that simulate calculations, so we can evaluate performance. In this test, I chose to set up nested structures, numbers and a string, to cover different types of data and see how each implementation manages them.

#include <math.h>   /* sqrt */
#include <stdlib.h> /* calloc, rand */

typedef struct s_point
{
  double x;
  double y;
} t_point;

typedef struct s_test
{
  char *sentence;
  int nb_points;
  t_point *points;
  double *distances;
} t_test;

This function increases the value of each character by n:

char *increment_string(char *str, int n)
{
  for (int i = 0; str[i]; i++)
    str[i] = str[i] + n;
  return (str);
}

This function generates nb points with random coordinates:

void generate_points(t_test *test, int nb)
{
  t_point *points = calloc(nb + 1, sizeof(t_point));

  for (int i = 0; i < nb; i++)
  {
    points[i].x = rand();
    points[i].y = rand();
  }
  test->points = points;
  test->nb_points = nb;
}

This function calculates, for each point, the distance to every point in the list:

void distance_between_points(t_test *test)
{
  int nb = test->nb_points;
  double *distances = calloc(nb * nb + 1, sizeof(double));

  for (int i = 0; i < nb; i++)
    for (int j = 0; j < nb; j++)
      distances[i * nb + j] = sqrt((test->points[j].x - test->points[i].x) * (test->points[j].x - test->points[i].x) + (test->points[j].y - test->points[i].y) * (test->points[j].y - test->points[i].y));
  test->distances = distances;
}
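
The flat indexing used by distance_between_points (one array of nb * nb values, where the entry at i * nb + j is the distance from point i to point j) can be checked on a tiny input. The snippet below is only an illustrative sketch in Python: the three points are made-up values chosen so that the distances are easy to verify by hand, and math.hypot stands in for the explicit square root.

```python
import math

# Made-up points: (0,0) -> (3,4) is a 3-4-5 triangle, (0,0) -> (6,8) is twice that.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
nb = len(points)

# One flat list of nb * nb values, laid out like the C array:
# row i holds the distances from point i to every point j.
distances = [0.0] * (nb * nb)
for i, (xi, yi) in enumerate(points):
    for j, (xj, yj) in enumerate(points):
        distances[i * nb + j] = math.hypot(xj - xi, yj - yi)

print(distances[0 * nb + 1])  # distance from point 0 to point 1
print(distances[0 * nb + 2])  # distance from point 0 to point 2
```

The table is symmetric and its diagonal is zero, so roughly half of the work is redundant; the benchmark keeps the full nb * nb loop on purpose to generate load.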

Example code in Python without C

Let’s now implement the test in pure Python, to get a baseline for computation speed and memory usage without the C functions.

import random

class Point():
  def __init__(self, x, y):
    self.x = x
    self.y = y

class Test():
  def __init__(self, string, nb):
    self.string = string
    self.points = []
    for i in range(nb):
      self.points.append(Point(random.random(), random.random()))
    self.distances = []

  def increment_string(self, n):
    tmp = ""
    for c in self.string:
      tmp += chr(ord(c) + n)
    self.string = tmp

  def distance_between_points(self):
    # For each point a, store the distance to every point b
    for a in self.points:
      for b in self.points:
        self.distances.append(((b.x - a.x) ** 2 + (b.y - a.y) ** 2) ** 0.5)

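if __name__ == '__main__':
  test = Test("A nice sentence to test.", 10000)
  # Run the same steps as the C versions (the increment value is arbitrary)
  test.increment_string(1)
  test.distance_between_points()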

Ctypes implementation 

Now that we have a reference, let’s see how to do the implementation with Ctypes. Ctypes is relatively simple to handle. The CDLL class for loading libraries is very handy, and being able to turn a dict into a C structure helps a lot. On the other hand, values coming from Python have to be converted before they reach C, which can take time.

import ctypes
from ctypes import *
from ctypes.util import find_library

# Ctypes structures
class Point(ctypes.Structure):
  _fields_ = [('x', ctypes.c_double), ('y', ctypes.c_double)]

class Test(ctypes.Structure):
  _fields_ = [
    ('sentence', ctypes.c_char_p),
    ('nb_points', ctypes.c_int),
    ('points', ctypes.POINTER(Point)),
    ('distances', ctypes.POINTER(ctypes.c_double)),
  ]

# Lib C functions
_libc = ctypes.CDLL(find_library('c'))
_libc.free.argtypes = [ctypes.c_void_p]
_libc.free.restype = None  # free() returns void

# Lib shared functions
_libblog = ctypes.CDLL("./")
_libblog.increment_string.argtypes = [ctypes.c_char_p, ctypes.c_int]
_libblog.increment_string.restype = ctypes.c_char_p
_libblog.generate_points.argtypes = [ctypes.POINTER(Test), ctypes.c_int]
_libblog.distance_between_points.argtypes = [ctypes.POINTER(Test)]

if __name__ == '__main__':
  # Create the dict used to build the ctypes structure
  test = {}
  test['sentence'] = "A nice sentence to test.".encode('utf-8')
  test['nb_points'] = 0
  test['points'] = None
  test['distances'] = None
  c_test = Test(**test)
  ptr_test = ctypes.pointer(c_test)

  # Call C functions
  _libblog.generate_points(ptr_test, 10000)
  ptr_test.contents.sentence = _libblog.increment_string(ptr_test.contents.sentence, -5)
  _libblog.distance_between_points(ptr_test)

  # Free the arrays allocated with calloc on the C side
  _libc.free(ptr_test.contents.points)
  _libc.free(ptr_test.contents.distances)
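
Once the C functions have filled the structure, the results still have to be read back from Python. The following is a minimal sketch of that step, not part of the benchmark: it needs no shared library, because a small ctypes array (with made-up coordinates) stands in for the memory that generate_points would normally allocate with calloc.

```python
import ctypes


class Point(ctypes.Structure):
    _fields_ = [('x', ctypes.c_double), ('y', ctypes.c_double)]


class Test(ctypes.Structure):
    _fields_ = [
        ('sentence', ctypes.c_char_p),
        ('nb_points', ctypes.c_int),
        ('points', ctypes.POINTER(Point)),
        ('distances', ctypes.POINTER(ctypes.c_double)),
    ]


# Stand-in for memory that C would allocate: an array of 3 points.
nb = 3
point_array = (Point * nb)(Point(0.0, 0.0), Point(3.0, 4.0), Point(6.0, 8.0))

fields = {
    'sentence': b"A nice sentence to test.",
    'nb_points': nb,
    'points': ctypes.cast(point_array, ctypes.POINTER(Point)),
    'distances': None,  # None becomes a NULL pointer
}
c_test = Test(**fields)

# Pointer fields can be indexed like arrays, but ctypes does no bounds
# checking, so nb_points has to bound the loop.
coords = [(c_test.points[i].x, c_test.points[i].y) for i in range(c_test.nb_points)]
print(coords)
print(c_test.sentence)          # c_char_p fields come back as Python bytes
print(bool(c_test.distances))   # a NULL pointer is falsy
```

One caveat worth knowing: Python bytes objects are immutable, and the ctypes documentation recommends ctypes.create_string_buffer for memory that a C function modifies in place, as increment_string does.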

Cython implementation

Let’s talk Cython now. Cython lets you mix C and Python within a single file, so you can use C while still being in a Python file. This makes the use of C possible for people who don’t know it, because the C written in a Cython file uses Python syntax. Only the variables are in C. However, Cython has a few downsides, mainly that it requires more files. You need a file to cythonize the .pyx files and generate them in C, you need at least one .py file to call the Cython .pyx files in C after cythonization, and the declarations can be in .pxd files. Nevertheless, this is how you should go about it:


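# Build script (conventionally named setup.py), run with:
#   python setup.py build_ext --inplace
# setuptools replaces distutils, which was removed in Python 3.12
from setuptools import setup
from Cython.Build import cythonize

setup(
  name = 'Test Cython',
  ext_modules = cythonize("test_cython.pyx"),
)

# Python entry point that calls the compiled Cython module
from test_cython import test

if __name__ == '__main__':
  test()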

File test_cython.pyx:

import cython
import random

from libc.stdlib cimport calloc, free
from libc.math cimport sqrt

# Import C structures and functions from the C header
cdef extern from "libblog.h":
  ctypedef struct t_point:
    double  x
    double  y

  ctypedef struct t_test:
    char    *sentence
    int    nb_points
    t_point *points
    double  *distances

# C functions written in python syntax
cdef char *increment_string(char *str, int n):
  cdef int i = 0

  while str[i]:
    str[i] = str[i] + n
    i += 1
  return str

cdef void generate_points(t_test *test, int nb):
  cdef t_point *points = <t_point*>calloc(nb + 1, sizeof(t_point))

  for i in range(nb):
    points[i].x = random.random()
    points[i].y = random.random()
  test.points = points
  test.nb_points = nb

cdef void distance_between_points(t_test *test):
  cdef int nb = test.nb_points
  cdef double *distances = <double*>calloc(nb * nb + 1, sizeof(double))
  cdef int i
  cdef int j

  for i from 0 <= i < nb:
    for j from 0 <= j < nb:
      distances[i * nb + j] = sqrt((test.points[j].x - test.points[i].x) * (test.points[j].x - test.points[i].x) + (test.points[j].y - test.points[i].y) * (test.points[j].y - test.points[i].y))
  test.distances = distances

def test():
  # Declare the structure and set the values
  cdef t_test test

  py_sentence = "A nice sentence to test.".encode('utf-8')
  test.sentence = py_sentence
  test.nb_points = 0
  test.points = NULL
  test.distances = NULL

  # Call C functions written in python
  generate_points(&test, 10000)
  test.sentence = increment_string(test.sentence, 1)
  distance_between_points(&test)

  # Call C function free
  free(test.points)
  free(test.distances)

CFFI implementation

Let’s now try to do the same with CFFI. The CFFI module is easy to handle, because it doesn’t require a pseudo-language to bridge C and Python. Just declare the prototypes of the functions and structures to make it work! On the other hand, CFFI can be used in four ways: there are two modes, ABI and API, and each of them can be done in-line or out-of-line. It’s worth knowing that the ABI mode works well on Windows, but tends to be more fragile on other platforms and can be quite slow. So, see what’s best for you! Here is how you can go about your CFFI implementation:

from cffi import FFI
ffi = FFI()

# Declare the C structures and function prototypes
ffi.cdef("""
typedef struct t_point t_point;
struct t_point
{
  double x;
  double y;
};

typedef struct t_test t_test;
struct t_test
{
  char    *sentence;
  int     nb_points;
  t_point *points;
  double  *distances;
};

char *increment_string(char *str, int n);
void generate_points(t_test *test, int nb);
void distance_between_points(t_test *test);
""")

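if __name__ == '__main__':
  # Load C shared library
  lib = ffi.dlopen("./")

  # Declare the C structure
  test = ffi.new("struct t_test *")
  # Keep a Python reference to the buffer: memory from ffi.new() is
  # released as soon as the object that owns it is garbage-collected
  sentence = ffi.new("char[]", "A nice sentence to test.".encode('utf-8'))
  test.sentence = sentence
  test.nb_points = 0
  test.points = ffi.NULL
  test.distances = ffi.NULL

  # Call C functions
  lib.generate_points(test, 10000)
  test.sentence = lib.increment_string(test.sentence, 1)
  lib.distance_between_points(test)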
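
The memory ownership rules of ffi.new() are easy to get wrong, so here is a small sketch of them that runs without any shared library. It assumes the cffi package is installed; t_demo is a hypothetical, trimmed-down structure used only for this illustration, and the coordinates are made-up values.

```python
from cffi import FFI

ffi = FFI()
ffi.cdef("""
typedef struct {
    double x;
    double y;
} t_point;

typedef struct {
    char *sentence;
    int nb_points;
    t_point *points;
} t_demo;  /* hypothetical structure for this example only */
""")

# ffi.new() returns memory owned by the Python object it creates. Storing
# the pointer in a struct field does not keep that memory alive, so each
# buffer is held in its own variable for as long as the struct is used.
sentence = ffi.new("char[]", b"A nice sentence to test.")
points = ffi.new("t_point[]", 3)  # zero-initialised array of 3 structs
points[1].x, points[1].y = 3.0, 4.0

demo = ffi.new("t_demo *")
demo.sentence = sentence
demo.points = points
demo.nb_points = 3

print(ffi.string(demo.sentence))  # copies the C string into Python bytes
print(demo.points[1].x, demo.points[1].y)
```

Arrays allocated on the C side with calloc follow the opposite rule: CFFI never frees them automatically, so the C library has to release them.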


Time now to compare the different methods. I ran the test 10 times for each implementation. Each run consists of an operation on a string, the generation of 10,000 points and the calculation of the 100,000,000 distances between all of the points. To compare the implementations, we used time and memory as criteria.

          Time       Memory used
Python   47.30 s    3,177,024 KB
Ctypes    0.45 s      795,084 KB
Cython    0.42 s      792,976 KB
CFFI      0.47 s      795,292 KB

The difference between pure Python and the three implementations that use C is very obvious! Calculations are at least 100 times faster and use about 4 times less memory.

Cython comes out slightly ahead of the other two, but the three approaches are very similar in terms of performance. The choice of implementation is yours!
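
If you want to reproduce this kind of comparison, the sketch below shows one possible way to take the two measurements from inside Python. It is not the harness used for the numbers above. It assumes a Unix-like system (the resource module is not available on Windows), and measure and pairwise_distances are hypothetical helper names, with a deliberately small workload so that it runs quickly.

```python
import resource
import time


def measure(func, *args):
    """Run func once; return (result, elapsed seconds, peak RSS of the process)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    # ru_maxrss is the peak resident set size of the whole process so far,
    # reported in kilobytes on Linux and in bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return result, elapsed, peak


def pairwise_distances(n):
    """Hypothetical stand-in workload: n * n distances between points on a line."""
    xs = [float(i) for i in range(n)]
    return [abs(b - a) for a in xs for b in xs]


distances, elapsed, peak = measure(pairwise_distances, 300)
print(f"{len(distances)} distances in {elapsed:.4f} s, peak RSS {peak}")
```

Because the peak is measured for the whole process, each implementation should be run in a fresh interpreter. Peak RSS also includes memory allocated by C code with calloc, which Python-level tools such as tracemalloc do not see.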

Now that you know how to speed up your Python with C, follow this great tutorial on how to deploy your SAP Conversational AI Python chatbot in production on AWS.

Happy coding!
