
I would like to use our Spark cluster to run programs in parallel. My idea is to do something like the following:

def simulate():
    # some magic happening in here
    return 0

spark = (
    SparkSession.builder
    .appName('my_simulation')
    .enableHiveSupport()
    .getOrCreate())

sc = spark.sparkContext

no_parallel_instances = sc.parallelize(range(500))
res = no_parallel_instances.map(lambda row: simulate())
print(res.collect())

The question I have is whether there's a way to execute simulate() with different parameters. The only way I can currently imagine is to have a dataframe specifying the parameters, so something like this:

parameter_list = [[5,2.3,3], [3,0.2,4]]
no_parallel_instances = sc.parallelize(parameter_list)
res = no_parallel_instances.map(lambda row: simulate(row))
print(res.collect())

Is there another, more elegant way to run parallel functions with spark?

1 Answer

If the data you are looking to parameterize your call with differs between rows, then yes, you will need to include that data with each row.

However, if you are looking to set global parameters that affect every row, then you can use a broadcast variable.

http://spark.apache.org/docs/latest/rdd-programming-guide.html#broadcast-variables

Broadcast variables are created once in your script and cannot be modified after that. Spark efficiently distributes those values to every executor so they are available to your transformations. To create one, you provide the data to Spark and it gives you back a handle you can use to access it on the executors. For example:

settings_bc = sc.broadcast({
    'num_of_donkeys': 3,
    'donkey_color': 'brown'
})

def simulate(settings, n):
    # do magic
    return n

no_parallel_instances = sc.parallelize(range(500))
res = no_parallel_instances.map(lambda row: simulate(settings_bc.value, row))
print(res.collect())
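The two approaches also combine naturally: broadcast the settings that are shared by every simulation, and parallelize only the per-row parameters. A minimal sketch (the simulation body and the `num_of_donkeys` setting are placeholder assumptions, not anything from your code):

```python
def simulate(settings, params):
    # Toy simulation: scale the per-row parameters by a shared setting.
    a, b, c = params
    return a * b * c * settings['num_of_donkeys']

if __name__ == '__main__':
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName('my_simulation')
        .getOrCreate())
    sc = spark.sparkContext

    # Shared, read-only settings shipped once to every executor.
    settings_bc = sc.broadcast({'num_of_donkeys': 3})

    # Per-row parameters, one simulation per element.
    parameter_list = [[5, 2.3, 3], [3, 0.2, 4]]
    res = (sc.parallelize(parameter_list)
             .map(lambda row: simulate(settings_bc.value, row)))
    print(res.collect())
    spark.stop()
```

Keeping `simulate` a plain function of (settings, params) also makes it easy to unit-test on the driver without any cluster involved.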
