
The code below fails with an AnalysisException on Spark 1.6.0 (sc.version returns String = 1.6.0):

case class Person(name: String, age: Long)
val caseClassDF = Seq(Person("Andy", 32)).toDF()
caseClassDF.count()

val seq = Seq(1)
val rdd = sqlContext.sparkContext.parallelize(seq)
val df2 = rdd.toDF("Counts")
df2.count()

val withCounts = caseClassDF.withColumn("duration", df2("Counts")) // fails: AnalysisException
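The failure is expected: withColumn can only reference a Column of the DataFrame it is called on, and df2("Counts") belongs to a different DataFrame. One workaround (a sketch, assuming the same Spark 1.6 shell context as above, with sqlContext and caseClassDF/df2 in scope) is to attach a synthetic row index to both DataFrames and join on it:

```scala
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Append a "row_idx" column derived from the RDD's zipWithIndex so the two
// DataFrames can be aligned row by row.
def withRowIndex(df: DataFrame): DataFrame = {
  val schema = StructType(df.schema.fields :+ StructField("row_idx", LongType, nullable = false))
  val indexed = df.rdd.zipWithIndex.map { case (row, idx) => Row.fromSeq(row.toSeq :+ idx) }
  sqlContext.createDataFrame(indexed, schema)
}

val withCounts = withRowIndex(caseClassDF)
  .join(withRowIndex(df2), "row_idx")
  .drop("row_idx")
  .withColumnRenamed("Counts", "duration")
```

This aligns rows purely by position, so it only makes sense when the two DataFrames are known to have the same number of rows in a meaningful order; a join on a real key column is preferable when one exists.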

2 Answers


For some reason, it works with a UDF:

import org.apache.spark.sql.functions.udf
case class Person(name: String, age: Long, day: Int)
val caseClassDF = Seq(Person("Andy", 32, 1), Person("Raman", 22, 1), Person("Rajan", 40, 1), Person("Andy", 42, 2), Person("Raman", 42, 2), Person("Rajan", 50, 2)).toDF()

val calculateCounts = udf((x: Long, y: Int) => x + y)

val df1 = caseClassDF.withColumn("Counts", calculateCounts($"age", $"day"))
df1.show

+-----+---+---+------+
| name|age|day|Counts|
+-----+---+---+------+
| Andy| 32|  1|    33|
|Raman| 22|  1|    23|
|Rajan| 40|  1|    41|
| Andy| 42|  2|    44|
|Raman| 42|  2|    44|
|Rajan| 50|  2|    52|
+-----+---+---+------+
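Incidentally, the UDF works here because both inputs ($"age" and $"day") are columns of the same DataFrame. For a simple sum, no UDF is needed at all; built-in Column arithmetic gives the same result (a sketch against the caseClassDF defined in this answer):

```scala
// Equivalent without a UDF: Column arithmetic on columns of the same DataFrame.
val df1Builtin = caseClassDF.withColumn("Counts", $"age" + $"day")
df1Builtin.show
```

Built-in Column expressions are generally preferable to UDFs, since Catalyst can see into and optimize them.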



In caseClassDF.withColumn("duration", df2("Counts")), the Column must come from the DataFrame it is being added to (in your case caseClassDF). AFAIK, Spark does not allow a Column from a different DataFrame to be used in withColumn.

PS: I am a user of Spark 1.6.x, not sure whether this has come up in Spark 2.x
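Since df2 holds a single row in the question, another option (a sketch, same Spark 1.6 shell context as the question) is to collect that one value on the driver and add it as a literal column:

```scala
import org.apache.spark.sql.functions.lit

// df2 has exactly one row, so pull its Counts value out and attach it
// to caseClassDF as a constant column.
val countsValue = df2.first().getInt(0)
val withCounts = caseClassDF.withColumn("duration", lit(countsValue))
```

This only applies when the other DataFrame reduces to a single value; for row-by-row data a join is required.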


Thanks Rishabh. Updated the Spark version. I am not convinced that the Column must come from the same DataFrame, though.
