I am learning Spark, and I just ran into a problem when using Spark to process a list of Python objects. The following is my code:
import numpy as np
from pyspark import SparkConf, SparkContext

### Definition of Class A
class A:
    def __init__(self, n):
        self.num = n

### Function "display"
def display(s):
    print s.num
    return s

def main():
    ### Initialize the Spark
    conf = SparkConf().setAppName("ruofan").setMaster("local")
    sc = SparkContext(conf = conf)

    ### Create a list of instances of Class A
    data = []
    for i in np.arange(5):
        x = A(i)
        data.append(x)

    ### Use Spark to parallelize the list of instances
    lines = sc.parallelize(data)

    ### Spark mapping
    lineLengths1 = lines.map(display)

if __name__ == "__main__":
    main()
When I run my code, it does not print the number of each instance (but it should have printed 0, 1, 2, 3, 4). I have tried to find the reason, but I have no idea what is wrong. I would really appreciate it if anyone could help me.
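One thing I am wondering about: maybe `map` is a lazy transformation, so `display` never actually runs because I never call an action (such as `collect()`) on `lineLengths1`. I cannot easily test that theory inside Spark here, but the sketch below (plain Python 3, not Spark, and using a made-up `record` helper just for illustration) shows the kind of laziness I suspect, since Python 3's built-in `map` behaves similarly:

```python
# Plain-Python analogy (NOT Spark): built-in map in Python 3 is lazy,
# so the mapped function does not run until the result is consumed.
calls = []

def record(x):
    # Hypothetical helper standing in for display(); records each call.
    calls.append(x)
    return x

mapped = map(record, range(5))  # builds a lazy iterator; record() has not run
assert calls == []              # nothing evaluated yet

result = list(mapped)           # consuming the iterator forces evaluation
assert calls == [0, 1, 2, 3, 4]
assert result == [0, 1, 2, 3, 4]
```

If Spark's `map` works the same way, then perhaps adding something like `lineLengths1.collect()` at the end of `main()` would force the evaluation, though I am not sure that is the actual cause.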