
It looks like a regular Hive statement should work. In my myscript.sql, which I run with spark-sql --jars mylib.jar -f myscript.sql:

CREATE TEMPORARY FUNCTION rank AS 'com.mycompany.udf.Custom.rankFunc';

...
CREATE TEMPORARY VIEW MyTable AS (
    SELECT
       rank(id)  AS rank,
       ...            

In Scala code (mylib.jar):

package com.mycompany.udf

...

object Custom {
    def rankFunc(id: Long): Double = { Rank(id).rank }
    ...
}

However, Hive cannot find the function:

18/01/23 17:38:25 ERROR SparkSQLDriver: Failed in [
CREATE TEMPORARY FUNCTION rank AS 'com.mycompany.udf.Custom.rankFunc']
java.lang.ClassNotFoundException: com.mycompany.udf.Custom.rankFunc
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

How should I change the code in my Scala library?

2 Answers


You're getting this error because Hive expects a UDF to be a class (one with an evaluate method), not a method name.

Change your Scala code (UDF) to:

package com.mycompany.udf

class RankFunc extends org.apache.hadoop.hive.ql.exec.UDF {
  def evaluate(id: Long): Double = { Rank(id).rank }
}

... and SQL script to:

CREATE TEMPORARY FUNCTION rankFunc AS 'com.mycompany.udf.RankFunc'
...

Here are examples of how to create a custom UDF with Java and Scala.
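If you want to keep the ranking logic in object Custom, the UDF class can simply delegate to it. This is a sketch under the question's assumptions (Rank and Custom.rankFunc come from the asker's code and require the Hive jars on the classpath):

```scala
package com.mycompany.udf

// Hive instantiates the UDF by class name, so it must be a concrete
// class with an evaluate method -- not a reference to an object's method.
class RankFunc extends org.apache.hadoop.hive.ql.exec.UDF {
  // Delegate to the existing logic in object Custom
  def evaluate(id: Long): Double = Custom.rankFunc(id)
}
```

This keeps a single source of truth for the ranking logic while giving Hive the class-with-evaluate shape it expects.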




Because there is a lot of confusion, I am updating my answer.

Here is the Java code for the md5 UDF:

package org.apache.hadoop.hive.ql.udf;
import org.apache.commons.codec.digest.DigestUtils;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;

public class UDFMd5 extends UDF {
  private final Text result = new Text();
  /**
   * Convert String to md5
   */
  public Text evaluate(Text n) {
    if (n == null) {
      return null;
    }
    String str = n.toString();
    String md5Hex = DigestUtils.md5Hex(str);
    result.set(md5Hex);
    return result;
  }  
}
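To sanity-check the hashing logic outside Hive, the same md5Hex behavior can be sketched with only the JDK's MessageDigest. This is a standalone sketch (Md5Demo is a hypothetical name, not part of the UDF above):

```scala
import java.security.MessageDigest

object Md5Demo {
  // Same logic as DigestUtils.md5Hex, using only the JDK:
  // hash the UTF-8 bytes and render each byte as two lowercase hex digits.
  def md5Hex(s: String): String =
    MessageDigest.getInstance("MD5")
      .digest(s.getBytes("UTF-8"))
      .map("%02x".format(_))
      .mkString

  def main(args: Array[String]): Unit =
    println(md5Hex("abc")) // 900150983cd24fb0d6963f7d28e17f72
}
```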

I took the same jar used in Hive and was able to make it work; the following worked for me (screenshots omitted).

In Hive I used:

create temporary function md5 AS 'org.apache.hadoop.hive.ql.udf.UDFMd5' USING JAR '/test/balaram/hive-MD5.jar';

In Spark I used:

create temporary function md5 AS 'org.apache.hadoop.hive.ql.udf.UDFMd5'
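Once registered, the function can be called like any built-in. A minimal sanity check (the hex value is the well-known MD5 of 'abc'):

```sql
SELECT md5('abc');
-- 900150983cd24fb0d6963f7d28e17f72
```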

If this doesn't help, I'm sorry.

5 Comments

yeah, I know how to do that with Spark. My question is about spark-sql.
You need to include the jar when starting spark-sql. Then you can use something like: CREATE TEMPORARY FUNCTION myFunnyUpper AS 'org.hue.udf.MyUpper'
Right, that's what I did in the code above. I also used a jar (updated in the question), but it does not work.
I have edited the post above to show how I was able to make it work.
Could you please clarify where you run sqlContext.udf.register("customudf", ..)? I don't have a main() in my jar.
