I have a DataFrame with two columns (text, useful). I would like to replace any value in the "useful" column that is greater than 20 with 1, and set everything else to 0. I need some help with this. I'm using Scala in Databricks Community Edition.

+--------------------+------+
|                text|useful|
+--------------------+------+
|Located in a base...|     0|
|I am not a vegeta...|     1|
|There is so many ...|    12|
|Disclaimer: this ...|     0|
|House Special Chi...|     0|
|The food at Chez ...|     2|
|Overall not bad. ...|     3|
+--------------------+------+

  • Just can't figure this out. import org.apache.spark.sql.functions._ val uDf = Df.withColumn("useful", regexp_replace(col("useful") > 20) => 1) Commented Jul 27, 2018 at 11:51
  • I had tried that, but I keep getting the error "error: too many arguments for method withColumn: (colName: String, col: org.apache.spark.sql.Column)org.apache.spark.sql.DataFrame" with val uDf = DF.withColumn("label", when(col("label") > 20), 1).otherwise(0) ^ Commented Jul 27, 2018 at 13:34
  • I misplaced a parenthesis, so the correct version is val uDf = Df.withColumn("useful", when((col("useful") > 20), 1).otherwise(0)) Commented Jul 27, 2018 at 13:40
  • That's what I was missing. Thanks a lot Commented Jul 27, 2018 at 13:45
  • Possible duplicate of Scala: How can I replace value in Dataframes using scala Commented Jul 27, 2018 at 13:49
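For reference, here is a minimal, self-contained sketch of the corrected when(...).otherwise(...) call from the comment thread. It assumes spark-sql is on the classpath; the object name UsefulFlag, the helper flagUseful, and the sample rows (including the made-up value 25, added so that one row takes the "> 20" branch) are hypothetical. In a Databricks notebook you would skip the SparkSession builder and use the predefined spark session.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object UsefulFlag {
  // Hypothetical helper: overwrite "useful" with 1 when it is strictly
  // greater than 20, otherwise 0. Null values also fall through to 0,
  // because a null comparison is not true.
  def flagUseful(df: DataFrame): DataFrame =
    df.withColumn("useful", when(col("useful") > 20, 1).otherwise(0))

  def main(args: Array[String]): Unit = {
    // Local session for running outside Databricks.
    val spark = SparkSession.builder()
      .appName("useful-flag")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows; the 25 is invented to exercise the "> 20" branch.
    val df = Seq(
      ("Located in a base...", 0),
      ("There is so many ...", 12),
      ("Hypothetical review", 25)
    ).toDF("text", "useful")

    flagUseful(df).show()
    spark.stop()
  }
}
```

Note that > is strict, so a value of exactly 20 becomes 0; use >= instead if 20 should count as 1.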
