I am reading data from a text file with PySpark using the following code,

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.read.option("sep", "|").option("header", "false").csv('D:\\DATA-2021-12-03.txt')

My data text file looks like,

col1|cpl2|col3|col4
112 |4344|fn1 | home_a| extras| applied | <null>| <empty>

But the output I got was,

col1|cpl2|col3|col4
112 |4344|fn1 | home_a

Is there a way to add those missing columns for the dataframe?

Expecting,

col1|cpl2|col3|col4|col5|col6|col7|col8
112 |4344|fn1 | home_a| extras| applied | <null>| <empty>
  • I think that you have to either modify the CSV file header or insert it before you read the file using a csv reader. Commented Dec 24, 2021 at 18:25
  • @codebot can you update the expected dataframe Commented Dec 24, 2021 at 18:46
  • @Nitish added the expected dataframe Commented Dec 24, 2021 at 18:48
  • If you don't care about the names of the existing columns, option("header", "true") and passing a custom schema should work, no? (There might be some warnings about the header not matching the schema.) Commented Dec 24, 2021 at 18:50
  • yeah. tried that. but it gives me an error. dataframe conversion is failing Commented Dec 24, 2021 at 18:51

1 Answer

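You can explicitly specify the schema instead of inferring it. Without a schema, Spark takes the column count from the first line of the file, so in the default permissive mode the extra fields on longer rows are dropped.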


from pyspark.sql.types import StructType, StringType

# One nullable string column for each field in the widest row.
schema = StructType() \
      .add("col1", StringType(), True) \
      .add("col2", StringType(), True) \
      .add("col3", StringType(), True) \
      .add("col4", StringType(), True) \
      .add("col5", StringType(), True) \
      .add("col6", StringType(), True) \
      .add("col7", StringType(), True) \
      .add("col8", StringType(), True)

df = spark.read.option("sep", "|").option("header", "true").schema(schema).csv('70475571_data.txt')

Output

+----+----+----+-------+-------+---------+-------+--------+
|col1|col2|col3|   col4|   col5|     col6|   col7|    col8|
+----+----+----+-------+-------+---------+-------+--------+
|112 |4344|fn1 | home_a| extras| applied | <null>| <empty>|
+----+----+----+-------+-------+---------+-------+--------+
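
If the number of columns is not known up front, the schema can be generated instead of typed out by hand. The following is only a sketch under some assumptions: the file is small enough and local enough to scan once with plain Python on the driver, fields never contain a quoted separator, and the helper names (max_field_count, ddl_schema, read_all_columns) are made up for this illustration. It relies on the schema() method of the reader also accepting a DDL-formatted string.

```python
def max_field_count(path, sep="|", encoding="utf-8"):
    """Return the largest number of sep-delimited fields on any non-blank line.

    Assumption: a naive split is good enough, i.e. no quoted separators.
    """
    with open(path, encoding=encoding) as handle:
        counts = (
            len(line.rstrip("\r\n").split(sep))
            for line in handle
            if line.strip()
        )
        return max(counts, default=0)


def ddl_schema(n_columns, prefix="col"):
    """Build a DDL schema string such as 'col1 STRING, col2 STRING'."""
    return ", ".join(f"{prefix}{i} STRING" for i in range(1, n_columns + 1))


def read_all_columns(spark, path, sep="|"):
    """Read the file with one string column per field of the widest row.

    `spark` is an existing SparkSession; the first line is skipped as a header,
    as in the answer above.
    """
    schema = ddl_schema(max_field_count(path, sep))
    return (
        spark.read
        .option("sep", sep)
        .option("header", "true")
        .schema(schema)
        .csv(path)
    )
```

With the sample file from the question, read_all_columns(spark, '70475571_data.txt') should build an eight-column schema (col1 to col8). For files on HDFS or other remote storage the counting step would have to be done differently, for example by reading the file as text with Spark first.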