# Python – Slow division in Cython


In order to get fast division in Cython, I can use the compiler directive

``````@cython.cdivision(True)
``````

This works, in that the resulting C code has no zero-division check. However, for some reason it actually makes my code slower. Here is an example:

``````
cimport cython
import numpy as np
import time

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
@cython.cdivision(True)
def example1(double[:] xi, double[:] a, double[:] b, int D):

    cdef int k
    cdef double[:] x = np.zeros(D)

    for k in range(D):
        x[k] = (xi[k] - a[k]) / (b[k] - a[k])

    return x

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def example2(double[:] xi, double[:] a, double[:] b, int D):

    cdef int k
    cdef double[:] x = np.zeros(D)

    for k in range(D):
        x[k] = (xi[k] - a[k]) / (b[k] - a[k])

    return x

def test_division():

    D = 10000
    x = np.random.rand(D)
    a = np.zeros(D)
    b = np.random.rand(D) + 1

    tic = time.time()
    example1(x, a, b, D)
    toc = time.time()

    print 'With c division: ' + str(toc - tic)

    tic = time.time()
    example2(x, a, b, D)
    toc = time.time()

    print 'Without c division: ' + str(toc - tic)
``````

This results in output:

``````With c division: 0.000194787979126
Without c division: 0.000176906585693
``````

Is there any reason why turning off the zero-division check could slow things down? (I know there are no zero divisors.)

#### Best Solution

Firstly, you need to call the functions many (>1000) times and average the time spent in each to get an accurate idea of how they differ. Timing a single call of each function is far too noisy.

Secondly, the time spent in the function is affected by things other than the loop with the divisions. Calling a `def` (i.e. Python) function like this involves some overhead in passing and returning the arguments. Creating a NumPy array inside the function also takes time, so any difference between the loops of the two functions is diluted.
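One convenient way to do that averaging is the standard `timeit` module. The sketch below uses a pure-Python stand-in for the division loop, since the compiled `example1`/`example2` live in the `.pyx` module; in practice you would pass those compiled functions to `timeit` instead:

```python
import timeit

def div_loop(xi, a, b):
    # Pure-Python stand-in for the Cython loop, here only to show the
    # benchmarking pattern; substitute your compiled function.
    return [(x - lo) / (hi - lo) for x, lo, hi in zip(xi, a, b)]

D = 1000
xi, a, b = [0.5] * D, [0.0] * D, [1.0] * D

n = 1000  # number of calls to average over
total = timeit.timeit(lambda: div_loop(xi, a, b), number=n)
print("mean per call: %.3g s" % (total / n))
```

Averaging over a thousand calls smooths out timer resolution and one-off cache effects that dominate a single `time.time()` measurement.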

Finally, according to [the Cython compiler-directives wiki](https://github.com/cython/cython/wiki/enhancements-compilerdirectives), leaving the cdivision directive set to False carries a roughly 35% speed penalty. I think this is not enough to show up in your example, given the other overheads. I checked the C code output by Cython: the code for example2 is clearly different and contains an additional zero-division check, but when I profile it, the difference in run-time is negligible.
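Schematically, the extra code in example2 amounts to a guard branch around each division, so that Cython can raise `ZeroDivisionError` the way Python does. This is a pure-Python sketch of the idea, not the actual generated C:

```python
def divide_checked(num, den):
    # Sketch of what cdivision(False) adds: an explicit branch before
    # each division, so the error can be raised Python-style.
    # With cdivision(True), the division compiles to a bare C '/'
    # with no branch at all.
    if den == 0:
        raise ZeroDivisionError("float division")
    return num / den
```

On modern CPUs this predictable, almost-never-taken branch is cheap, which is consistent with the penalty being hard to see in a noisy single-call benchmark.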

To illustrate this, I ran the code below, where I've taken your code and made the `def` functions into `cdef` functions, i.e. Cython functions rather than Python functions. This massively reduces the overhead of passing and returning arguments. I have also changed `example1` and `example2` to just calculate a sum over the values in the numpy arrays, rather than creating a new array and populating it. This means that almost all the time spent in each function is now in the loop, so it should be easier to see any differences. I have also run each function many times, and made `D` bigger.

``````
cimport cython
import numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
@cython.cdivision(True)
@cython.profile(True)
cdef double example1(double[:] xi, double[:] a, double[:] b, int D):

    cdef int k
    cdef double theSum = 0.0

    for k in range(D):
        theSum += (xi[k] - a[k]) / (b[k] - a[k])

    return theSum

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
@cython.profile(True)
@cython.cdivision(False)
cdef double example2(double[:] xi, double[:] a, double[:] b, int D):

    cdef int k
    cdef double theSum = 0.0

    for k in range(D):
        theSum += (xi[k] - a[k]) / (b[k] - a[k])

    return theSum

def testExamples():
    D = 100000
    x = np.random.rand(D)
    a = np.zeros(D)
    b = np.random.rand(D) + 1

    for i in xrange(10000):
        example1(x, a, b, D)
        example2(x, a, b, D)
``````

I ran this code through the profiler (`python -m cProfile -s cumulative`), and the relevant output is below:

``````ncalls  tottime  percall  cumtime  percall filename:lineno(function)
10000    1.546    0.000    1.546    0.000 test.pyx:26(example2)
10000    0.002    0.000    0.002    0.000 test.pyx:11(example1)
``````

which shows that `example2` is much slower. If I turn c-division on in `example2` as well, the time spent becomes identical for the two functions, so the directive clearly has a significant effect once the surrounding overheads are stripped away.
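One side note when flipping this directive (not the cause of the slowdown here, but easy to miss): `cdivision(True)` also switches typed integer `//` and `%` to C semantics, which differ from Python's for negative operands. Plain Python illustrates the contrast:

```python
# Python's % takes the sign of the divisor; C's % takes the sign of
# the dividend. Likewise Python's // floors, while C integer division
# truncates toward zero. cdivision(True) gives the C behaviour for
# typed integer operands in Cython.
print(-7 % 3)    # Python semantics: 2  (C '%' would give -1)
print(-7 // 3)   # Python semantics: -3 (C division truncates to -2)
```

So when enabling the directive for speed, it is worth checking that no integer division in the loop relies on Python's floor/modulo conventions.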