I was testing the efficiency of my simple shared C library and comparing it with the numpy implementation.
Library creation: The following function is defined in sum_function.c:
float sum_vector(float* data, int num_row){
    float value = 0.0;
    for (int i = 0; i < num_row; i++){
        value += data[i];
    }
    return value;
}
Library compilation: the shared library sum.so is created by
clang -c sum_function.c
clang -shared -o sum.so sum_function.o
Measurement: a simple numpy array is created and the sum of its elements is calculated using the above function.
from ctypes import *
import numpy as np
N = int(1e7)
data = np.arange(N, dtype=np.float32)
libc = cdll.LoadLibrary("sum.so")
libc.sum_vector.restype = c_float
libc.sum_vector(data.ctypes.data_as(POINTER(c_float)),
                c_int(N))
The above function takes 30 ms. However, if I use numpy.sum, the execution time is only 4 ms.
So my question is: what makes numpy so much faster than my C implementation? I cannot think of any algorithmic improvement for calculating the sum of a vector.
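For reference, the numpy side of the comparison can be timed on its own with the standard timeit module; a minimal sketch (the absolute numbers will of course vary by machine):

```python
import timeit

import numpy as np

N = int(1e7)
data = np.arange(N, dtype=np.float32)

# Run np.sum in batches of 10 calls, 5 times, and keep the best
# per-call time; the minimum is the least noisy estimate.
best = min(timeit.repeat(lambda: np.sum(data), number=10, repeat=5)) / 10
print(f"np.sum over {N} float32 elements: {best * 1e3:.2f} ms per call")
```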
Comments:

clang -c sum_function.c -- Do those command-line parameters enable optimizations? If not, then your timings are meaningless.

#include <numeric>
float sum_vector(float* data, int num_row) { return std::accumulate(data, data + num_row, 0.0f); }
-- That is a one-liner that you should measure with the proper optimization settings when compiling your code.

The numpy library is optimized Fortran code. The clang-generated code is likely not optimized, and may even be debug code depending on your settings.
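The std::accumulate one-liner from the comment, expanded into a complete translation unit. The extern "C" wrapper is an assumption about how it would be wired up: it prevents C++ name mangling so the ctypes code above can still look up sum_vector in the shared library.

```cpp
#include <numeric>

// extern "C" keeps the unmangled symbol name "sum_vector" so
// ctypes can find it after building with a C++ compiler
// (assumption about the intended setup, not from the original post).
extern "C" float sum_vector(float* data, int num_row) {
    // Sum the half-open range [data, data + num_row) starting from 0.0f
    return std::accumulate(data, data + num_row, 0.0f);
}
```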