
I am trying to solve the word count problem in Spark using Python, but I run into a problem when I try to save the output RDD to a text file with the .saveAsTextFile command. Here is my code. Please help me, I am stuck. Thanks for your time.

import re

from pyspark import SparkConf , SparkContext

def normalizewords(text):
    return re.compile(r'\W+',re.UNICODE).split(text.lower())

conf=SparkConf().setMaster("local[2]").setAppName("sorted result")
sc=SparkContext(conf=conf)

input=sc.textFile("file:///home/cloudera/PythonTask/sample.txt")

words=input.flatMap(normalizewords)

wordsCount=words.map(lambda x: (x,1)).reduceByKey(lambda x,y: x+y)

sortedwordsCount=wordsCount.map(lambda (x,y):(y,x)).sortByKey()

results=sortedwordsCount.collect()

for result in results:
    count=str(result[0])
    word=result[1].encode('ascii','ignore')

    if(word):
        print word +"\t\t"+ count

results.saveAsTextFile("/var/www/myoutput")
  • What is the problem? Can you show the error, please? Commented Dec 4, 2015 at 11:22
  • Please format your question properly and highlight the code. Commented Dec 4, 2015 at 11:25
  • Traceback (most recent call last):
      File "/home/cloudera/PythonTask/sorteddata.py", line 24, in <module>
        results.saveAsTextFile("var/www/myoutput")
    AttributeError: 'list' object has no attribute 'saveAsTextFile'
    Commented Dec 4, 2015 at 11:29
  • Try saving sortedwordsCount instead Commented Dec 4, 2015 at 11:33
  • Thank you all for all your help. Commented Dec 4, 2015 at 12:27

2 Answers


Since you called results=sortedwordsCount.collect(), results is no longer an RDD. It is a plain Python list (of tuples).

As you know, a list is a Python object/data structure, and append is a method that adds an element to it.

>>> x = []
>>> x.append(5)
>>> x
[5]

Similarly, an RDD is Spark's object/data structure, and saveAsTextFile is a method that writes it out to files. The important thing is that it is a distributed data structure.

So we cannot use append on an RDD, or saveAsTextFile on a list. collect is an RDD method that brings the contents of the RDD into the driver's memory as a Python list.
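The error in the traceback can be reproduced without Spark at all, because it comes purely from calling the method on a plain list. In this small sketch the list contents and the output path are made up; they only stand in for what collect() returns:

```python
# Stand-in for the (count, word) tuples that collect() returns.
results = [(1, "cat"), (3, "the")]

try:
    # A Python list has no saveAsTextFile method; only RDDs do.
    results.saveAsTextFile("/tmp/myoutput")
except AttributeError as err:
    message = str(err)
    print(message)  # 'list' object has no attribute 'saveAsTextFile'
```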

As mentioned in the comments, either call saveAsTextFile on sortedwordsCount (which is still an RDD), or open a file in Python and write results to it.
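The second option can be sketched in plain Python, assuming the collected result is small enough to fit in driver memory. Here results is assumed to be the list of (count, word) tuples returned by sortedwordsCount.collect(), and the function name write_results is hypothetical:

```python
def write_results(results, path):
    """Write (count, word) pairs, as returned by collect(), to one local text file."""
    with open(path, "w", encoding="utf-8") as out:
        for count, word in results:
            if word:  # skip the empty strings the regex split can produce
                out.write("%s\t%d\n" % (word, count))
```

For example, write_results(sortedwordsCount.collect(), "/home/cloudera/PythonTask/output.txt"), where the output filename is only an illustration. The file is written on the driver machine's local filesystem, as a single file rather than a directory of part files.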


1 Comment

Thank you for your suggestion. Could you please tell me how I should proceed to store the result in a text file? I am new to Spark programming with Python, so I don't know much about it yet.

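Change results=sortedwordsCount.collect() to results=sortedwordsCount, because .collect() turns the result into a Python list, while sortedwordsCount is still an RDD and therefore has saveAsTextFile. The print loop would then have to iterate over sortedwordsCount.collect() instead, since an RDD cannot be iterated over directly in a Python for loop.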
