How to save data frame in ".txt" file using pyspark

Question

I have a dataframe with 1000+ columns. I need to save this dataframe as a .txt file (not as .csv) with no header, and the write mode should be "append".

I used the command below, which does not work:

df.coalesce(1).write.format("text").option("header", "false").mode("append").save("<path>")

This is the error I got:

pyspark.sql.utils.AnalysisException: 'Text data source supports only a single column,

Note: I should not use an RDD to save, because I need to save files multiple times to the same path.

In addition to what you tried, you could mention what error you get. – sujit, Commented Mar 23, 2018 at 11:07
What is your desired output? Do you want spaces instead of commas? – pault, Commented Mar 23, 2018 at 14:46

Alex · Accepted Answer · 2018-03-23 12:09:20Z

4

If you want to write out a text file for a multi-column dataframe, you will have to concatenate the columns yourself, because the text data source accepts only a single column. In the example below I separate the column values with a space and replace null values with a *:

import pyspark.sql.functions as F

df = sqlContext.createDataFrame([("foo", "bar"), ("baz", None)],
                                ('a', 'b'))

def myConcat(*cols):
    """Join the columns with spaces, writing "*" in place of nulls."""
    concat_columns = []
    for c in cols[:-1]:
        # coalesce substitutes "*" when the column value is null
        concat_columns.append(F.coalesce(c, F.lit("*")))
        concat_columns.append(F.lit(" "))
    # the last column gets no trailing separator
    concat_columns.append(F.coalesce(cols[-1], F.lit("*")))
    return F.concat(*concat_columns)

df_text = df.withColumn("combined", myConcat(*df.columns)).select("combined")

df_text.show()

# Spark creates a directory named "output.txt" that holds the part file(s)
df_text.coalesce(1).write.format("text").option("header", "false").mode("append").save("output.txt")

This gives the following output:

+--------+
|combined|
+--------+
| foo bar|
|   baz *|
+--------+

And your output file should look like this:

foo bar
baz *
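With 1000+ columns it may be easier to build the concatenation as a single SQL expression and pass it to selectExpr. The following is only a sketch, not part of the original answer: build_line_expr and append_as_text are hypothetical helper names, and the separator and null placeholder are assumed to contain no quotes or backslashes. It uses concat_ws, which skips nulls, so each column is wrapped in coalesce first to keep the positions aligned.

```python
def build_line_expr(columns, sep=" ", null_token="*", alias="combined"):
    """Return a Spark SQL expression that joins all columns into one string."""
    for text in (sep, null_token):
        # Keep the sketch simple: refuse characters that would need SQL escaping.
        if "'" in text or "\\" in text:
            raise ValueError("sep and null_token must not contain quotes or backslashes")
    parts = []
    for name in columns:
        # Backtick-quote the column name so unusual names still resolve.
        quoted = "`" + name.replace("`", "``") + "`"
        # concat_ws skips NULLs, so substitute the placeholder to keep columns aligned.
        parts.append(f"coalesce(cast({quoted} as string), '{null_token}')")
    return f"concat_ws('{sep}', {', '.join(parts)}) AS {alias}"


def append_as_text(df, path):
    """Append df (a Spark DataFrame) to the directory at `path` as text files."""
    df.selectExpr(build_line_expr(df.columns)).write.mode("append").text(path)


# Show the generated expression for a two-column frame.
print(build_line_expr(["a", "b"]))
```

Calling append_as_text(df, "<path>") repeatedly should add new part files to the same directory, which is what the "append" mode in the question is after.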


Thank you for this! What about concatenating column names, though? – bia, Commented over a year ago

Allan Abreu · Accepted Answer · 2018-08-10 20:15:43Z

3

You can concatenate the columns easily using the following line. This assumes you want a positional file and not a delimited one; using this method for a delimited file would require delimiter columns between each pair of data columns:

from pyspark.sql.functions import concat

dataFrameWithOnlyOneColumn = dataFrame.select(concat(*dataFrame.columns).alias('data'))

After concatenating the columns, your previous line should work just fine:

dataFrameWithOnlyOneColumn.coalesce(1).write.format("text").option("header", "false").mode("append").save("<path>")
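A positional file usually needs every field padded to a fixed width, and concat returns null for the whole line if any column is null. The sketch below is one possible way to handle both, under assumptions of my own: build_fixed_width_expr and format_row are hypothetical helpers, and the widths are made-up values you would replace with your real layout. format_row is a plain-Python stand-in for string values, useful only for checking the layout by eye.

```python
def build_fixed_width_expr(widths, alias="data"):
    """widths: list of (column_name, width) pairs, in output order."""
    parts = []
    for name, width in widths:
        quoted = "`" + name.replace("`", "``") + "`"
        # NULL becomes an empty string so concat does not return NULL for the
        # whole line; rpad then pads (or truncates) the value to `width`.
        parts.append(f"rpad(coalesce(cast({quoted} as string), ''), {int(width)}, ' ')")
    return f"concat({', '.join(parts)}) AS {alias}"


def format_row(values, widths):
    """Plain-Python equivalent for one row of string values."""
    return "".join(
        ("" if v is None else str(v)).ljust(w)[:w]
        for v, (_, w) in zip(values, widths)
    )


# Hypothetical layout: column "a" takes 5 characters, column "b" takes 3.
layout = [("a", 5), ("b", 3)]
print(build_fixed_width_expr(layout))
print(repr(format_row(("foo", None), layout)))
```

The expression can then be used as dataFrame.selectExpr(build_fixed_width_expr(layout)) before the write call shown above.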


TojaQl · Accepted Answer · 2023-08-25 11:31:59Z

0

You could also convert the PySpark dataframe to pandas and then save it to a file. Something like this:

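# `data` and `columns` stand for your own rows and column names
df_pyspark = spark.createDataFrame(data, schema=columns)

# toPandas() collects every row onto the driver
head_rows = df_pyspark.toPandas()

string_representation = head_rows.to_string(index=False)

# "w" overwrites the file; open it with "a" instead to append
with open("file_name.txt", "w") as file:
    file.write(string_representation)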
