I am learning Apache Spark as well as its interface with AWS. I have already created a master node on AWS with 6 slave nodes. I have also written the following Python code with Spark:
from pyspark import SparkConf, SparkContext
conf = SparkConf().setAppName("print_num").setMaster("AWS_master_url")
sc = SparkContext(conf = conf)
# Distribute the list across the cluster
rdd = sc.parallelize([1,2,3,4,5])
# I want each of the 5 slave nodes to do the mapping work.
temp = rdd.map(lambda x: x + 1)
# I also want the remaining slave node to do the reducing work.
for x in temp.sample(False, 1).collect():
    print(x)
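For reference, here is roughly how I am building the SparkConf against the cluster at the moment. The spark://... master URL below is only a placeholder for my EC2 master's address, and the spark.executor.instances / spark.executor.cores values are guesses on my part rather than settings I know to be correct:

from pyspark import SparkConf, SparkContext

# Placeholder: the standalone master URL of my EC2 master node.
# I am not sure these executor settings are the right way to split
# the work between mapping and reducing nodes -- that is my question.
conf = (SparkConf()
        .setAppName("print_num")
        .setMaster("spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077")
        .set("spark.executor.instances", "5")
        .set("spark.executor.cores", "1"))

sc = SparkContext(conf=conf)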
My question is: how can I set up the 6 slave nodes in AWS so that 5 slave nodes do the mapping work, as indicated in the code, and the other slave node does the reducing work? I would really appreciate any help.