
I have a CSV in which some column headers and their corresponding values are null. How can I drop the columns whose header is "null"? A sample CSV is as follows:

"name"|"age"|"city"|"null"|"null"|"null"
"abcd"|"21" |"7yhj"|"null"|"null"|"null"
"qazx"|"31" |"iuhy"|"null"|"null"|"null"
"foob"|"51" |"barx"|"null"|"null"|"null"

I want to drop all the columns whose header is "null", so that the output DataFrame looks like this:

"name"|"age"|"city"
"abcd"|"21" |"7yhj"
"qazx"|"31" |"iuhy"
"foob"|"51" |"barx"

When I load this CSV in Spark, Spark appends a number to each duplicate "null" column name, as shown below:

"name"|"age"|"city"|"null4"|"null5"|"null6"
"abcd"|"21" |"7yhj"|"null"|"null"|"null"
"qazx"|"31" |"iuhy"|"null"|"null"|"null"
"foob"|"51" |"barx"|"null"|"null"|"null"

Solution found

Thanks @MaxU for the answer. My final solution is:

val filePath = "C:\\Users\\shekhar\\spark-trials\\null_column_header_test.csv"

val df = spark.read.format("csv")
  .option("inferSchema", "false")
  .option("header", "true")
  .option("delimiter", "|")
  .load(filePath)

// filterNot removes the column names that start with "null" and returns an
// Array[String] of the names to keep; map(a => df(a)) then turns each name
// into a Column object so it can be passed to select.
val q = df.columns.filterNot(c => c.startsWith("null")).map(a => df(a))

df.select(q: _*).show()
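An equivalent form skips the mapping to Column objects and passes the surviving names straight to select, which also has a (head, tail: _*) overload for plain column-name strings. A minimal sketch of the same idea:

val keep = df.columns.filterNot(_.startsWith("null"))
// select(col: String, cols: String*) takes the first name plus the rest as varargs
df.select(keep.head, keep.tail: _*).show()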

1 Answer


IIUC you can do it this way:

val dfClean = df.drop(df.columns.filter(_.startsWith("null")): _*)

1 Comment

You have forgotten to close it, so you are missing one bracket.
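Written out in full, the drop-based version looks like this. A minimal sketch assuming Spark 2.0+, where Dataset.drop accepts a varargs list of column names; the nullCols name is just illustrative:

// Collect the headers that start with "null", then drop them all in one call.
val nullCols = df.columns.filter(_.startsWith("null"))
val cleaned = df.drop(nullCols: _*)
cleaned.show()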
