
This may be an odd question, but how can I replace certain numeric values in a DataFrame column with null using Scala?

Suppose I have a nullable DoubleType column named col. I want to replace every number outside the range 1.0 to 10.0 with null.

I tried the following code, without success.

val xf = df.na.replace("col", Map(0.0 -> null.asInstanceOf[Double]).toMap)

But, as you know, casting null to Double in Scala yields 0.0, which is not what I want. Besides, I can't see a way to do this for a range of values. Is there any way to achieve this?
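For reference, the 0.0 behaviour can be reproduced in plain Scala, without Spark. The snippet below is a minimal sketch meant to be pasted into the Scala REPL; the names unboxed, boxed and missing are illustrative only and do not come from the question.

```scala
// Casting null to the primitive Double unboxes it to the default value 0.0.
val unboxed: Double = null.asInstanceOf[Double]
println(unboxed) // 0.0

// A boxed java.lang.Double is a reference type, so it can actually hold null.
val boxed: java.lang.Double = null
println(boxed == null) // true

// Option[Double] is the usual Scala way to model a missing number.
val missing: Option[Double] = None
println(missing.isEmpty) // true
```

This is why the Map passed to na.replace ends up mapping 0.0 to 0.0 rather than to a null.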

  • Is na.replace a hard requirement here? Commented Feb 23, 2016 at 20:25
  • @zero323 No, there is no need; I am just pulling my hair out. Commented Feb 23, 2016 at 20:28

1 Answer

How about a when clause instead?

import org.apache.spark.sql.functions.when

val df = sc.parallelize(
  (1L, 0.0) :: (2L, 3.6) :: (3L, 12.0) :: (4L, 5.0) ::  Nil
).toDF("id", "val")

df.withColumn("val", when($"val".between(1.0, 10.0), $"val")).show

// +---+----+
// | id| val|
// +---+----+
// |  1|null|
// |  2| 3.6|
// |  3|null|
// |  4| 5.0|
// +---+----+

Any value which doesn't satisfy the predicate (here val BETWEEN 1.0 AND 10.0) will be replaced with NULL.
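The same pattern extends to other predicates. The following is a sketch rather than a definitive implementation: it assumes Spark 2.x or later with a local SparkSession (the snippet above uses the older sc entry point), and the names inRange, outOfRange and withoutListed are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, when}

// Assumption: a local session; in spark-shell, `spark` is already defined.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("null-replace-sketch")
  .getOrCreate()
import spark.implicits._

val df = Seq((1L, 0.0), (2L, 3.6), (3L, 12.0), (4L, 5.0)).toDF("id", "val")

// Explicit form of the same idea: rows that fail the predicate get a null literal.
val inRange = df.withColumn(
  "val", when($"val".between(1.0, 10.0), $"val").otherwise(lit(null)))

// Negated predicate: keep values outside the range, null out the ones inside.
val outOfRange = df.withColumn(
  "val", when(!$"val".between(1.0, 10.0), $"val"))

// Set-based predicate: null out the listed values, keep everything else.
val withoutListed = df.withColumn(
  "val", when(!$"val".isin(0.0, 12.0), $"val"))

inRange.show()
outOfRange.show()
withoutListed.show()
```

Omitting otherwise and writing otherwise(lit(null)) should behave the same way; the explicit form only makes the intent visible.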

See also Create new Dataframe with empty/null field values


4 Comments

Just to leave an informative comment for the future: how would you do it if you wanted to replace all values except...? :)
You can replace $"val".between(1.0, 10.0) with some other logical expression (isin, not(isin) and so on).
@AlbertoBonsanto Can I take a moment of your time?
If so, let's switch to chat: chat.stackoverflow.com/rooms/103319/…
