
I have a DataFrame df with the columns ['name', 'age']. I saved it as an RDD using df.rdd.saveAsTextFile(".."). When I load the saved file back, collect() gives me the following result:

a = sc.textFile("\mee\sample")
a.collect()
Output:
    [u"Row(name=u'Alice', age=1)",
     u"Row(name=u'Alice', age=2)",
     u"Row(name=u'Joe', age=3)"]

This is not an RDD of Rows, so accessing a column by attribute fails:

a.map(lambda g:g.age).collect()
AttributeError: 'unicode' object has no attribute 'age'

Is there any way to save the DataFrame as a plain RDD, without the column names and the Row keyword? I want to save the DataFrame so that loading the file and calling collect() gives me the following:

a.collect()   
[(Alice,1),(Alice,2),(Joe,3)]
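The root of the problem is that saveAsTextFile writes the string representation of each Row. The same effect can be reproduced in plain Python with a namedtuple standing in for pyspark.sql.Row (which is essentially a tuple subclass):

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row; the field names come from the question.
Row = namedtuple("Row", ["name", "age"])

row = Row(name="Alice", age=1)
text = str(row)  # roughly what saveAsTextFile writes out

print(text)                  # Row(name='Alice', age=1)
print(hasattr(text, "age"))  # False: once it is a string, the attribute is gone
```

This is exactly why `g.age` raises AttributeError after the round trip through a text file.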

2 Answers


It is a normal RDD[Row]. The problem is that when you save with saveAsTextFile and load with textFile, what you get back is a bunch of strings. If you want to save objects, you should use some form of serialization. For example, pickleFile:

from pyspark.sql import Row

df = sqlContext.createDataFrame(
   [('Alice', 1), ('Alice', 2), ('Joe', 3)],
   ("name", "age")
)

df.rdd.map(tuple).saveAsPickleFile("foo")
sc.pickleFile("foo").collect()

## [('Joe', 3), ('Alice', 1), ('Alice', 2)]
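The same principle can be illustrated outside Spark with the standard pickle module, which PySpark's default serializer is built on: pickled tuples come back as real tuples, not their string representations. (The file name foo.pkl is just an example; a single local file stands in for Spark's per-partition output.)

```python
import os
import pickle
import tempfile

data = [("Alice", 1), ("Alice", 2), ("Joe", 3)]

# Write the tuples with pickle, then read them back.
path = os.path.join(tempfile.mkdtemp(), "foo.pkl")
with open(path, "wb") as f:
    pickle.dump(data, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored)  # [('Alice', 1), ('Alice', 2), ('Joe', 3)]
```

Because real objects survive the round trip, `restored[0][1]` is still an int, unlike the text-file case.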

2 Comments

Yeah, but how would you load that pickle file back into a Spark df?
@bluerubez OP doesn't want a DataFrame back. There are better formats if you want to serialize a DataFrame, although tuples can work as well.
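That said, if you do want the DataFrame back, one possible sketch (untested here; it assumes "foo" was written from tuples as in the answer above, and that you resupply the column names by hand) is toDF:

    # Rebuild a DataFrame from the pickled tuples.
    # Assumes "foo" was written by df.rdd.map(tuple).saveAsPickleFile("foo").
    df2 = sc.pickleFile("foo").toDF(["name", "age"])
    df2.show()

This requires a live SparkContext/SQLContext, so treat it as a sketch rather than a drop-in snippet.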

I think you can do it like this:

a.map(lambda x:(x[0],x[1])).collect()
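Note that this positional indexing works on the original df.rdd of Row objects (Row supports tuple-style access), not on the strings loaded back from the text file. In plain Python, with a namedtuple standing in for Row and the sample data from the question:

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row; sample rows from the question.
Row = namedtuple("Row", ["name", "age"])
rows = [Row("Alice", 1), Row("Alice", 2), Row("Joe", 3)]

# Positional access works because Row is tuple-like.
pairs = [(x[0], x[1]) for x in rows]
print(pairs)  # [('Alice', 1), ('Alice', 2), ('Joe', 3)]
```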

Comments
