
I am planning to use a Scala object in PySpark. This is the code in Scala:

package za.co.absa.cobrix.spark.cobol.utils

import org.apache.spark.sql.{Column, DataFrame}
import scala.annotation.tailrec
import scala.collection.mutable

object SparkUtils {

  def flattenSchema(df: DataFrame, useShortFieldNames: Boolean = false): DataFrame = {
    val fields = new mutable.ListBuffer[Column]()
    val stringFields = new mutable.ListBuffer[String]()
    val usedNames = new mutable.HashSet[String]()
    // ... (rest of the method omitted; see the GitHub link below)
  }
}

Github link : https://github.com/AbsaOSS/cobrix/blob/f95efdcd5f802b903404162313f5663bf5731a83/spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/SparkUtils.scala

I only copied a few lines of the flattenSchema() method above.

Spark code in Scala:

import za.co.absa.cobrix.spark.cobol.utils.SparkUtils
val dfFlattened = SparkUtils.flattenSchema(df)

I tried to call the same flattenSchema() method in PySpark after adding the jar to spark-submit:

    dfflatten = DataFrame(sparkContext._jvm.za.co.absa.cobrix.spark.cobol.utils.SparkUtils.flattenSchema(df._jdf), sqlContext)

But I am getting this error message:

df = sparkCont._jvm.za.co.absa.cobrix.spark.cobol.utils.SparkUtils.flattenSchema(df._jdf)
File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",
line 1257, in __call__   File
"/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/pyspark.zip/pyspark/sql/utils.py",
line 63, in deco   File
"/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 332, in get_return_value py4j.protocol.Py4JError: An error
 occurred while calling
z:za.co.absa.cobrix.spark.cobol.utils.SparkUtils.flattenSchema. Trace:
py4j.Py4JException: Method flattenSchema([]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:339)
    at py4j.Gateway.invoke(Gateway.java:276)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

Please help!

1 Answer

You forgot to declare the object in Scala so the Python side can find it. Something like this:

package za.co.absa.cobrix.spark.cobol.utils

import org.apache.spark.sql.{Column, DataFrame}
import scala.annotation.tailrec
import scala.collection.mutable

object SparkUtils {
  def flattenSchema(df: DataFrame, useShortFieldNames: Boolean): DataFrame = {
    val fields = new mutable.ListBuffer[Column]()
    val stringFields = new mutable.ListBuffer[String]()
    val usedNames = new mutable.HashSet[String]()
    // ...
  }
}

IMPORTANT: Also try to avoid method overloading (and default parameters, which effectively compile down to overloading, or similar tricks underneath) ... these are hard to translate and to use from the Python side.

NOTE: To overcome the lack of default values, just pass the value explicitly from the Python side and you're done; in this case it is just one extra boolean. You can also define the default on the Python side, which is safer and convenient (especially if you have many call sites).
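For example, a minimal sketch of that Python-side default (the wrapper name flatten_schema is my own, and it assumes the sparkContext and sqlContext handles from the question are in scope):

from pyspark.sql import DataFrame

def flatten_schema(df, use_short_field_names=False):
    # The default value lives here on the Python side; Scala default
    # parameters are not visible through py4j, so both arguments are
    # always passed explicitly to the JVM method.
    jdf = sparkContext._jvm.za.co.absa.cobrix.spark.cobol.utils.SparkUtils \
        .flattenSchema(df._jdf, use_short_field_names)
    return DataFrame(jdf, sqlContext)

dfFlattened = flatten_schema(df)        # uses the Python-side default (False)
dfFlattened = flatten_schema(df, True)  # asks for short field names

This keeps the Python call sites one argument short while the JVM always receives the full two-argument signature it expects.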


3 Comments

Hi, actually the Scala code does contain the object declaration, but I forgot to copy it into the question. I have updated the question and linked the Scala source code on GitHub. Thanks!
@Arvinth did you try passing 2 args to flattenSchema (the df and the boolean), not only the DataFrame?
@Carlos Before, I tried with only 1 argument (df). Now that I ran it with 2 args (df and bool), it works fine. Thanks a lot!
