val schema = df.schema
val x = df.flatMap(r =>
  (0 until schema.length).map { idx =>
    ((idx, r.get(idx)), 1L)
  }
)

This produces the error

java.lang.ClassNotFoundException: scala.Any

I am not sure why. Any help?

  • Can you please try to rebuild your project? This seems to be more of an indexing issue in your editor Commented Dec 13, 2018 at 17:14
  • This is executed on databricks spark engine, there is no "rebuild" @ChaitanyaWaikar Commented Dec 13, 2018 at 17:16
  • 1
    The Row.get method returns a value of type Any since it doesn't know the type, but Any is not serializable and not a valid Spark structured type. You could use r.getString(idx) If you are expecting each record to be a String Commented Dec 13, 2018 at 17:23
  • I need each type to come as expected in the schema, is there no way to do that @TomLous? Commented Dec 13, 2018 at 17:39
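As the comments note, Row.get returns Any, for which Spark has no Encoder, hence the failure. One alternative to casting every column up front (a sketch, untested against a cluster; assumes spark.implicits._ is in scope) is to stringify each value at extraction time:

```scala
import spark.implicits._

// Convert each cell to a String as it is extracted, so the element type
// becomes ((Int, String), Long), which Spark can encode.
// String.valueOf also tolerates nulls, unlike calling .toString directly.
val x = df.flatMap { r =>
  (0 until r.length).map { idx =>
    ((idx, String.valueOf(r.get(idx))), 1L)
  }
}
// x: Dataset[((Int, String), Long)]
```

This keeps the original DataFrame's schema intact; only the extracted pairs are strings.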

1 Answer


One way is to cast all columns to String. Note that I'm changing r.get(idx) to r.getString(idx) in your code. The following works:

scala> val df = Seq(("ServiceCent4","AP-1-IOO-PPP","241.206.155.172","06-12-18:17:42:34",162,53,1544098354885L)).toDF("COL1","COL2","COL3","EventTime","COL4","COL5","COL6")
df: org.apache.spark.sql.DataFrame = [COL1: string, COL2: string ... 5 more fields]

scala> df.show(1,false)
+------------+------------+---------------+-----------------+----+----+-------------+
|COL1        |COL2        |COL3           |EventTime        |COL4|COL5|COL6         |
+------------+------------+---------------+-----------------+----+----+-------------+
|ServiceCent4|AP-1-IOO-PPP|241.206.155.172|06-12-18:17:42:34|162 |53  |1544098354885|
+------------+------------+---------------+-----------------+----+----+-------------+
only showing top 1 row

scala> df.printSchema
root
 |-- COL1: string (nullable = true)
 |-- COL2: string (nullable = true)
 |-- COL3: string (nullable = true)
 |-- EventTime: string (nullable = true)
 |-- COL4: integer (nullable = false)
 |-- COL5: integer (nullable = false)
 |-- COL6: long (nullable = false)


scala> val schema = df.schema
schema: org.apache.spark.sql.types.StructType = StructType(StructField(COL1,StringType,true), StructField(COL2,StringType,true), StructField(COL3,StringType,true), StructField(EventTime,StringType,true), StructField(COL4,IntegerType,false), StructField(COL5,IntegerType,false), StructField(COL6,LongType,false))

scala> val df2 = df.columns.foldLeft(df){ (acc,r) => acc.withColumn(r,col(r).cast("string")) }
df2: org.apache.spark.sql.DataFrame = [COL1: string, COL2: string ... 5 more fields]

scala> df2.printSchema
root
 |-- COL1: string (nullable = true)
 |-- COL2: string (nullable = true)
 |-- COL3: string (nullable = true)
 |-- EventTime: string (nullable = true)
 |-- COL4: string (nullable = false)
 |-- COL5: string (nullable = false)
 |-- COL6: string (nullable = false)


scala> val x = df2.flatMap(r => (0 until schema.length).map { idx => ((idx, r.getString(idx)), 1L) } )
x: org.apache.spark.sql.Dataset[((Int, String), Long)] = [_1: struct<_1: int, _2: string>, _2: bigint]

scala> x.show(5,false)
+---------------------+---+
|_1                   |_2 |
+---------------------+---+
|[0,ServiceCent4]     |1  |
|[1,AP-1-IOO-PPP]     |1  |
|[2,241.206.155.172]  |1  |
|[3,06-12-18:17:42:34]|1  |
|[4,162]              |1  |
+---------------------+---+
only showing top 5 rows


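If the ((idx, value), 1L) pairs are meant to feed a per-column value count (an assumption based on the word-count shape of the data; not stated in the question), the resulting Dataset can be aggregated directly, for example:

```scala
// Sketch: count occurrences of each (columnIndex, value) pair,
// assuming `x` from above and spark.implicits._ in scope.
val counts = x.groupByKey { case (key, _) => key }.count()
// counts: Dataset[((Int, String), Long)]
```

Since every element carries a weight of 1L, counting group members is equivalent to summing the weights.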