
I would like to iterate through the content of a column in a Spark DataFrame and correct the data within a cell if it meets a certain condition.

+-------------+
|column_title |
+-------------+
|null         |
|0            |
|1            |
+-------------+

Let's say I want to display something else when the value of the column is null. I tried with

Column.when() and Dataset.withColumn()

but I can't find the right method. I don't think it should be necessary to convert to an RDD and iterate through it.


2 Answers


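You can use when with equalTo, or when with isNull, followed by otherwise so that rows that do not match keep their original value. Both when and col are static methods of org.apache.spark.sql.functions.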

Dataset<Row> df1 = df.withColumn("value", when(col("value").equalTo("bbb"), "ccc").otherwise(col("value")));

Dataset<Row> df2 = df.withColumn("value", when(col("value").isNull(), "ccc").otherwise(col("value")));
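Applied to the column from the question, the same pattern might look like the sketch below. It assumes column_title is a nullable integer column (the question does not state its type), so the non-null values are cast to string to keep the result column a single type. The class name NullLabelExample, the helpers labelNulls and sample, and the label "N/A" are made up for illustration.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;
import static org.apache.spark.sql.functions.when;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class NullLabelExample {

    // Hypothetical helper: replaces nulls in one column with a text label.
    // Non-null values are cast to string so both branches have the same type.
    static Dataset<Row> labelNulls(Dataset<Row> df, String column, String label) {
        return df.withColumn(column,
            when(col(column).isNull(), lit(label))
                .otherwise(col(column).cast("string")));
    }

    // Builds the three rows shown in the question (null, 0, 1).
    // Assumption: the column is a nullable integer.
    static Dataset<Row> sample(SparkSession spark) {
        StructType schema = new StructType()
            .add("column_title", DataTypes.IntegerType, true);
        List<Row> rows = Arrays.asList(
            RowFactory.create((Object) null),
            RowFactory.create(0),
            RowFactory.create(1));
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[1]")
            .appName("null-label-example")
            .getOrCreate();
        labelNulls(sample(spark), "column_title", "N/A").show();
        spark.stop();
    }
}
```

If the column should stay numeric, the replacement has to be a number as well, and the cast can be dropped.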

If you only want to replace null values, you can also use na and fill.

Dataset<Row> df3 = df.na().fill("ccc");
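One thing to watch for: fill with a String value only replaces nulls in string columns, so it would leave a numeric column like the one in the question unchanged. A minimal sketch of filling a single numeric column follows. It assumes column_title is a nullable integer, and the class name FillNullsExample and the replacement value -1 are arbitrary choices for illustration.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class FillNullsExample {

    // Builds the rows from the question (null, 0, 1) and fills nulls
    // in column_title only, using a numeric replacement value.
    static Dataset<Row> fillColumn(SparkSession spark) {
        StructType schema = new StructType()
            .add("column_title", DataTypes.IntegerType, true);
        List<Row> rows = Arrays.asList(
            RowFactory.create((Object) null),
            RowFactory.create(0),
            RowFactory.create(1));
        Dataset<Row> df = spark.createDataFrame(rows, schema);

        // fill(long, String[]) restricts the replacement to the listed columns.
        return df.na().fill(-1L, new String[] {"column_title"});
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[1]")
            .appName("fill-nulls-example")
            .getOrCreate();
        fillColumn(spark).show();
        spark.stop();
    }
}
```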

1 Comment

Can we do df.withColumn("*",..) if we intend to run the replacement in all the columns?
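As far as I know, withColumn treats its first argument as a literal column name, so "*" would not expand to every column. One possible approach is to loop over the schema and apply the same when/otherwise expression to each column. The sketch below only touches string columns so that the replacement type matches. The class name ReplaceAllColumns and the helper replaceNullStrings are hypothetical, and column names containing dots or other special characters may need backtick quoting, which is not handled here.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;
import static org.apache.spark.sql.functions.when;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class ReplaceAllColumns {

    // Hypothetical helper: replaces nulls in every string column of df.
    static Dataset<Row> replaceNullStrings(Dataset<Row> df, String replacement) {
        Dataset<Row> result = df;
        for (StructField field : df.schema().fields()) {
            // Skip non-string columns so column types are left unchanged.
            if (!field.dataType().equals(DataTypes.StringType)) {
                continue;
            }
            String name = field.name();
            result = result.withColumn(name,
                when(col(name).isNull(), lit(replacement)).otherwise(col(name)));
        }
        return result;
    }

    // Small made-up dataset with nulls in two string columns.
    static Dataset<Row> sample(SparkSession spark) {
        StructType schema = new StructType()
            .add("city", DataTypes.StringType, true)
            .add("country", DataTypes.StringType, true);
        List<Row> rows = Arrays.asList(
            RowFactory.create("Paris", null),
            RowFactory.create(null, "Spain"));
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[1]")
            .appName("replace-all-columns")
            .getOrCreate();
        replaceNullStrings(sample(spark), "ccc").show();
        spark.stop();
    }
}
```

For plain null replacement, df.na().fill("ccc") already covers all string columns, so the loop is mainly useful when the condition is something other than a null check.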

Another way of doing this is to use a UDF.

Create a UDF:

private static final UDF1<String, String> myUdf = new UDF1<String, String>() {
  @Override
  public String call(final String str) throws Exception {
    // any condition or custom function can be used;
    // this one right-pads the value with 'A' up to 25 characters
    return StringUtils.rightPad(str, 25, 'A');
  }
};

Register UDF in SparkSession:

sparkSession.udf().register("myUdf", myUdf, DataTypes.StringType);

Apply the UDF to the dataset:

Dataset<Row> result = dataset.withColumn("city", functions.callUDF("myUdf", col("city")));
