1

I am pretty new to scala I have a situation where I have a dataframe with multiple columns, some of these columns having random null values at random places. I need to find any such column having even a single null value and drop it from the dataframe.

#### Input

| Column 1      | Column 2      | Column 3      | Column 4      | Column 5      |
| --------------| --------------| --------------| --------------| --------------|
|(123)-456-7890 | 123-456-7890  |(123)-456-789  |               |(123)-456-7890 |
|(123)-456-7890 | 123-4567890   |(123)-456-7890 |(123)-456-7890 | null          |
|(123)-456-7890 | 1234567890    |(123)-456-7890 |(123)-456-7890 | null          |

#### Output

| Column 1      | Column 2      |
| --------------| --------------| 
|(123)-456-7890 | 123-456-7890  |
|(123)-456-7890 | 123-4567890   |
|(123)-456-7890 | 1234567890    |

Please advise. Thank you.

1

1 Answer 1

2

I would recommend a 2-step approach:

  1. Exclude columns that are not nullable from the dataframe
  2. Assemble a list of columns that contain at least a null and drop them altogether

Creating a sample dataframe with a mix of nullable/non-nullable columns:

import org.apache.spark.sql.Column
import org.apache.spark.sql.types._

val df0 = Seq(
  (Some(1), Some("x"), Some("a"), None),
  (Some(2), Some("y"), None,      Some(20.0)),
  (Some(3), Some("z"), None,      Some(30.0))
).toDF("c1", "c2", "c3", "c4")

val newSchema = StructType(df0.schema.map{ field =>
    if (field.name == "c1") field.copy(name = "c1_notnull", nullable = false) else field
  })

// Revised dataframe with non-nullable `c1`
val df = spark.createDataFrame(df0.rdd, newSchema)

Carrying out step 1 & 2:

val nullableCols = df.schema.collect{ case StructField(name, _, true, _) => name }
// nullableCols: Seq[String] = List(c2, c3, c4)

val colsWithNulls = nullableCols.filter(c => df.where(col(c).isNull).count > 0)
// colsWithNulls: Seq[String] = List(c3, c4)

df.drop(colsWithNulls: _*).show
// +----------+---+
// |c1_notnull| c2|
// +----------+---+
// |         1|  x|
// |         2|  y|
// |         3|  z|
// +----------+---+
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.