I'm trying to do a MapReduce-like operation using Python Spark (PySpark). Here is what I have, followed by my problem.
object_list = list(objects)  # precomputed earlier in my script

# Wrap each mapped value in a list so the reduce step can concatenate.
def my_map(obj):
    return [f(obj)]

def my_reduce(obj_list1, obj_list2):
    return obj_list1 + obj_list2
What I am trying to do is something like the following:
myrdd = rdd(object_list)  # pseudocode: objects are now spread out
myrdd.map(my_map)
myrdd.reduce(my_reduce)
my_result = myrdd.result()
where my_result should now be [f(obj1), f(obj2), ..., f(objn)]. I want to use Spark purely for the speed; my script has been taking too long doing this in a for loop. Does anyone know how to do the above in Spark?
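To frame the question, here is my best guess at a minimal PySpark sketch, reusing my_map and my_reduce from above. I'm assuming a SparkContext is created as below (the pyspark shell already provides one as sc), and I haven't verified this is the right approach:

from pyspark import SparkContext

# Assumed setup; skip this line in the pyspark shell, which provides sc.
sc = SparkContext(appName="my_map_reduce")

# Distribute the precomputed list across the cluster as an RDD.
myrdd = sc.parallelize(object_list)

# RDDs are immutable: map() returns a new RDD rather than modifying myrdd,
# and reduce() returns a plain Python value back to the driver.
my_result = myrdd.map(my_map).reduce(my_reduce)
# my_result should be [f(obj1), f(obj2), ..., f(objn)]

If that is roughly right, I suspect the list-wrapping in my_map exists only so the reduce can concatenate, and myrdd.map(f).collect() would return the same list more directly, but I'd welcome confirmation.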