
How can I replace empty values in a column Field1 of DataFrame df?

Field1 Field2
       AA
12     BB

This command does not produce the expected result:

df.na.fill("Field1",Seq("Anonymous"))

The expected result:

Field1          Field2
Anonymous       AA
12              BB
    Please add some details: what result are you expecting, and what are you getting instead? Commented May 9, 2018 at 19:35

3 Answers

3

You can also try this, which handles both empty strings and nulls: first replace empty strings with null, then fill every null with the default value.

df.show()
+------+------+
|Field1|Field2|
+------+------+
|      |    AA|
|    12|    BB|
|    12|  null|
+------+------+

df.na.replace(Seq("Field1","Field2"),Map(""-> null)).na.fill("Anonymous", Seq("Field2","Field1")).show(false)   

+---------+---------+
|Field1   |Field2   |
+---------+---------+
|Anonymous|AA       |
|12       |BB       |
|12       |Anonymous|
+---------+---------+   

2

From the documentation for fill: "Returns a new DataFrame that replaces null or NaN values in numeric columns with value."

Two things:

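  1. An empty string is neither null nor NaN, so fill does not touch it; you need a conditional expression (when/otherwise, the equivalent of a SQL CASE statement) for that.
  2. fill with a string value only applies to string columns, so passing text for a numeric column leaves that column unchanged.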
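Both points can be checked directly. The following is a minimal sketch, assuming Spark is on the classpath and a local SparkSession is acceptable (in spark-shell, getOrCreate returns the existing session). The DataFrames a and b mirror the examples in this answer; the names unchanged, filled, and stillEmpty are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session is enough for this demonstration.
val spark = SparkSession.builder().master("local[1]").appName("fill-demo").getOrCreate()
import spark.implicits._

// a: f1 is a nullable integer column.
val a = Seq[(Option[Int], String)]((None, "AA"), (Some(12), "BB")).toDF("f1", "f2")
// b: f1 is a string column that holds an empty string, not a null.
val b = Seq(("", "AA"), ("12", "BB")).toDF("f1", "f2")

// String value against an integer column: the column is skipped, the null stays.
val unchanged = a.na.fill("Anonymous", Seq("f1"))

// Numeric value against an integer column: the null is replaced.
val filled = a.na.fill(1, Seq("f1"))

// fill against an empty string: "" is not null, so nothing is replaced.
val stillEmpty = b.na.fill("Anonymous", Seq("f1"))

println(unchanged.filter(col("f1").isNull).count())   // rows where f1 is still null
println(filled.filter(col("f1").isNull).count())      // rows where f1 is still null
println(stillEmpty.filter(col("f1") === "").count())  // rows where f1 is still empty
```

Counting rows rather than reading show() output makes the difference between null and an empty string explicit.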

Failing Example (Filling a Null in a Numeric Column with Text):

scala> a.show
+----+---+
|  f1| f2|
+----+---+
|null| AA|
|  12| BB|
+----+---+

scala> a.na.fill("Anonymous", Seq("f1")).show
+----+---+
|  f1| f2|
+----+---+
|null| AA|
|  12| BB|
+----+---+

Working Example (Filling a Null in a Numeric Column with a Number):

scala> a.show
+----+---+
|  f1| f2|
+----+---+
|null| AA|
|  12| BB|
+----+---+


scala> a.na.fill(1, Seq("f1")).show
+---+---+
| f1| f2|
+---+---+
|  1| AA|
| 12| BB|
+---+---+

Failing Example (Empty String instead of Null):

scala> b.show
+---+---+
| f1| f2|
+---+---+
|   | AA|
| 12| BB|
+---+---+


scala> b.na.fill(1, Seq("f1")).show
+---+---+
| f1| f2|
+---+---+
|   | AA|
| 12| BB|
+---+---+

Case Statement Fix Example:

scala> b.show
+---+---+
| f1| f2|
+---+---+
|   | AA|
| 12| BB|
+---+---+


scala> b.select(when(col("f1") === "", "Anonymous").otherwise(col("f1")).as("f1"), col("f2")).show
+---------+---+
|       f1| f2|
+---------+---+
|Anonymous| AA|
|       12| BB|
+---------+---+
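The select above has to list every column by hand. A possible variation, sketched here under the assumption that whitespace-only values and nulls should also count as empty (the question does not say so), uses withColumn so that all other columns pass through untouched. The name fixed and the extra "CC" row are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, trim, when}

// Assumption: a local session; in spark-shell the existing one is reused.
val spark = SparkSession.builder().master("local[1]").appName("when-demo").getOrCreate()
import spark.implicits._

// Third row is hypothetical: a value made only of spaces.
val b = Seq(("", "AA"), ("12", "BB"), ("  ", "CC")).toDF("f1", "f2")

// Treat null, "" and whitespace-only strings as missing; keep everything else.
val fixed = b.withColumn(
  "f1",
  when(col("f1").isNull || trim(col("f1")) === "", "Anonymous")
    .otherwise(col("f1"))
)

fixed.show()
```

withColumn replaces f1 in place, so the column order and any further columns stay as they were.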


1

You can use the code below when the DataFrame has an arbitrary number of columns.

Note: formats such as Parquet do not support columns of null type, so a null literal column has to be cast to a concrete type before writing.

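import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType
import spark.implicits._ // needed for toDF; already in scope in spark-shell

val df = Seq(
  (1, ""),
  (2, "Ram"),
  (3, "Sam"),
  (4, "")
).toDF("ID", "Name")

// Add an all-null column; the cast gives it StringType instead of NullType
val inputDf = df.withColumn("NulType", lit(null).cast(StringType))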

//Output

+---+----+-------+
| ID|Name|NulType|
+---+----+-------+
|  1|    |   null|
|  2| Ram|   null|
|  3| Sam|   null|
|  4|    |   null|
+---+----+-------+
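Whether the cast took effect is not visible in show() output, but it is visible in the schema. A small sketch, assuming a local SparkSession; untyped and typed are illustrative names, and base is a shortened copy of the data above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{NullType, StringType}

// Assumption: a local session; in spark-shell the existing one is reused.
val spark = SparkSession.builder().master("local[1]").appName("nulltype-demo").getOrCreate()
import spark.implicits._

val base = Seq((1, ""), (2, "Ram")).toDF("ID", "Name")

// Without a cast, the literal null column has NullType.
val untyped = base.withColumn("NulType", lit(null))
// With the cast, the same column is an ordinary nullable string column.
val typed = base.withColumn("NulType", lit(null).cast(StringType))

println(untyped.schema("NulType").dataType)
println(typed.schema("NulType").dataType)
```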

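// Replace empty strings in every string column.
// Note: Map("" -> "null") writes the literal text "null", not a SQL NULL;
// in the output below the two look identical. Map("" -> null) gives a real
// NULL where the Spark version accepts it.

val colName = inputDf.columns // Array[String] of all column names

val data = inputDf.na.replace(colName, Map("" -> "null"))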

data.show()
+---+----+-------+
| ID|Name|NulType|
+---+----+-------+
|  1|null|   null|
|  2| Ram|   null|
|  3| Sam|   null|
|  4|null|   null|
+---+----+-------+
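One caveat worth checking: a replacement value of "null" in quotes is a four-character string, not a missing value, even though it can look the same in printed output. The sketch below, assuming a local SparkSession, compares it with a real null produced by when/otherwise; asText and asNull are illustrative names.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, when}
import org.apache.spark.sql.types.StringType

// Assumption: a local session; in spark-shell the existing one is reused.
val spark = SparkSession.builder().master("local[1]").appName("null-vs-text").getOrCreate()
import spark.implicits._

val df = Seq((1, ""), (2, "Ram")).toDF("ID", "Name")

// Empty string replaced by the text "null": isNull will not match it.
val asText = df.na.replace(Seq("Name"), Map("" -> "null"))

// Empty string replaced by a real (typed) null.
val asNull = df.withColumn(
  "Name",
  when(col("Name") === "", lit(null).cast(StringType)).otherwise(col("Name"))
)

println(asText.filter(col("Name").isNull).count())
println(asNull.filter(col("Name").isNull).count())
```

Only the second DataFrame responds to isNull, na.fill, or na.drop, which is usually what replacing blanks with null is meant to achieve.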
