Looking for suggestions on how to unit test a Spark transformation with ScalaTest. The test class builds a DataFrame from static data, passes it to the transformation, and then makes assertions on the resulting DataFrame. (The transform adds a second column b, defined as col("a").plus(5).)
I got this to work (full test below), but I wonder whether there is a better way to create the DataFrame. asJava feels awkward, as does defining the row data and the schema separately; the one alternative I've been considering is sketched after the test.
import scala.collection.JavaConverters._

import com.holdenkarau.spark.testing.SharedSparkContext
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.functions.sum
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
import org.scalatest.{FlatSpec, Matchers}

class TransformTest extends FlatSpec with Matchers with SharedSparkContext {

  "Transformer" should "add column to dataframe" in {
    val sqlContext = new SQLContext(sc)

    // createDataFrame wants a java.util.List[Row] plus a separate schema,
    // hence the asJava conversion and the standalone StructType.
    val rows = Seq[Row](
      Row(1),
      Row(2),
      Row(3)
    ).asJava
    val schema = StructType(Seq(StructField("a", IntegerType)))
    val df = sqlContext.createDataFrame(rows, schema)

    val df2 = new Transform().addCol(df)

    assert(df2.count() > 0)
    assert(df2.agg(sum("a")).first.getLong(0) == 6)   // 1 + 2 + 3
    assert(df2.agg(sum("b")).first.getLong(0) == 21)  // 6 + 7 + 8
  }
}
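
For reference, the transform under test looks roughly like this. It's a minimal sketch reconstructed from the description above (a column b defined as col("a").plus(5)), not the exact production code:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Minimal sketch of the transform described above:
// appends a column "b" computed as a + 5.
class Transform {
  def addCol(df: DataFrame): DataFrame =
    df.withColumn("b", col("a").plus(5))
}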
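
The alternative I've been considering is toDF from the SQLContext implicits, which infers the schema from the element type, so neither Row nor a hand-built StructType is needed. A minimal sketch against the same sqlContext (the Tuple1 wrapper is only there because the local-Seq implicit requires Product elements):

import sqlContext.implicits._

// toDF infers a single IntegerType column from the Tuple1[Int] elements,
// avoiding both asJava and the separate schema definition.
val df = Seq(1, 2, 3)
  .map(Tuple1.apply) // wrap each Int so the Seq elements are Products
  .toDF("a")

Is that considered the idiomatic approach for building small test DataFrames, or is there something cleaner?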