I am trying to compare the performance of Mathematica vs Python for vectorized operations involving polynomials. The data is floatMatrix, which has dimensions (750000, 4). The function testFunction[x,y,w,z] is a polynomial function of 4 variables that returns a 4D vector and is meant to be applied to all of the 750000 vectors. Both codes below contain the same polynomial functions written in explicit form (I have not included the full polynomials because they are long; they are exactly the same in both).
For Mathematica, I am using a listable Compile with parallelization.
testFunction = Compile[{{f, _Real, 1}},
  {
   {0.011904761904761973` f[[2]] f[[1]]^3 +
     0.002976190476190474` f[[1]] f[[2]]^3 - 0.020833333333333325` f[[3]] +
     0.002976190476190474` f[[3]]^3 +
     f[[2]]^2 (0.0029761904761904778` f[[3]] + ...
   {0.002976190476190483` f[[1]]^3 + 0.011904761904761906` f[[2]]^3 -
     0.0875` f[[3]] + 0.0029761904761904765` f[[3]]^3 +
     f[[1]]^2 (0.005952380952380952` f[[2]] + ...
  }, CompilationTarget -> "C", RuntimeAttributes -> {Listable},
  Parallelization -> True];
time = RepeatedTiming[testFunction[floatMatrix]];
Print["In Mathematica-C it takes an average of ", time[[1]], " secs."]
For Python, I am using NumPy:
import numpy as np

def testFunction(data):
    f1, f2, f3, f4 = data.T
    results = np.zeros((data.shape[0], 4))  # Preallocate the results array
    results[:, 0] = (0.011904761904761973*f2*f1**3 + 0.002976190476190474*f1*f2**3 -
                     0.020833333333333325*f3 + 0.002976190476190474*f3**3 + f2**2*
                     (0.0029761904761904778*f3 + ...
    results[:, 1] = (0.002976190476190483*f1**3 + 0.011904761904761906*f2**3 - 0.0875*f3
                     + 0.0029761904761904765*f3**3 + f1**2*(0.005952380952380952*f2 +
                     0.002976190476190469*f3 + 0.0029761904761904726*f4) + ...
    return results
import time

duration = 0
for i in range(10):
    start_time = time.time()
    testFunction(floatMatrix)
    end_time = time.time()
    duration += end_time - start_time
duration = duration * 0.1
print(f"With NumPy it takes an average of {duration} seconds")
As you can see, it is a simple, straightforward comparison. On my machine, Mathematica gets 0.1 secs and Python 0.4 secs (also on Google Colab). People often talk about NumPy as being incredibly fast, so this makes me think that either I am doing something wrong or those people don't know how to exploit Mathematica's parallelization and packed arrays.
Which one is it? Am I using the tools incorrectly?
EDIT: After a few suggestions by azerbajdzan, I tried running the codes on a multi-core CPU, this time a 10-core one. These are the results:
- Mathematica with Parallelization -> True: 0.0319974 seconds
- Mathematica with Parallelization -> False: 0.119938 seconds
- NumPy: 0.206754 seconds
So, the question still stands.
UPDATE: I have applied @azerbajdzan's suggestions and removed the transposing from inside of the function. I have created a continuation question here. The Python code without transposing is still slower than Mathematica's. Both are run on a single core and produce the same output.
With Parallelization -> True you tell the compiler that you want the code run on all processors. If you have a 4-core processor, then the Mathematica code runs on all 4 of them. Your Python code runs on only one core. But there are methods to parallelize in Python as well; just search for "python parallelize". Remove the parameter Parallelization -> True and then compare the times, or parallelize the Python code too.