When reading data from a text file with PySpark using the following code,
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.read.option("sep", "|").option("header", "false").csv('D:\\DATA-2021-12-03.txt')
My data text file looks like this:
col1|cpl2|col3|col4
112 |4344|fn1 | home_a| extras| applied | <null>| <empty>
But the output I got was,
col1|cpl2|col3|col4
112 |4344|fn1 | home_a
Is there a way to add those missing columns to the DataFrame?
Expected output:
col1|cpl2|col3|col4|col5|col6|col7|col8
112 |4344|fn1 | home_a| extras| applied | <null>| <empty>
option("header", "true")and passing a custom schema should work, no? (there might some warnings about the header not matching the schema)