
It looks like a regular Hive statement should work. In my myscript.sql, which I run with spark-sql --jars mylib.jar -f myscript.sql:

CREATE TEMPORARY FUNCTION rank AS 'com.mycompany.udf.Custom.rankFunc';

...
CREATE TEMPORARY VIEW MyTable AS (
    SELECT
       rank(id)  AS rank,
       ...            

In Scala code (mylib.jar):

package com.mycompany.udf

...

object Custom {
    def rankFunc(id: Long): Double = { Rank(id).rank }
    ...
}

However, Hive cannot find the function:

18/01/23 17:38:25 ERROR SparkSQLDriver: Failed in [
CREATE TEMPORARY FUNCTION rank AS 'com.mycompany.udf.Custom.rankFunc']
java.lang.ClassNotFoundException: com.mycompany.udf.Custom.rankFunc
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

How should I change the code in my Scala library?

2 Answers


You're getting this error because Hive expects a UDF to be a class (one with an evaluate method), not a method name.

Change your Scala code (UDF) to:

package com.mycompany.udf

class RankFunc extends org.apache.hadoop.hive.ql.exec.UDF {
  def evaluate(id: Long): Double = { Rank(id).rank }
}

... and SQL script to:

CREATE TEMPORARY FUNCTION rankFunc AS 'com.mycompany.udf.RankFunc'
...

Here are examples of how to create a custom UDF with Java and Scala.
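If you want to keep the ranking logic in object Custom, the UDF class can simply delegate to it. This is a sketch under the question's assumptions (Rank and Custom.rankFunc come from the asker's code and require the Hive jars on the classpath):

```scala
package com.mycompany.udf

// Hive instantiates the UDF by class name, so it must be a concrete
// class with an evaluate method -- not a reference to an object's method.
class RankFunc extends org.apache.hadoop.hive.ql.exec.UDF {
  // Delegate to the existing logic in object Custom
  def evaluate(id: Long): Double = Custom.rankFunc(id)
}
```

This keeps a single source of truth for the ranking logic while giving Hive the class-with-evaluate shape it expects.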




Because there is a lot of confusion, I am updating my answer.

Here is the Java code for the md5 UDF:

package org.apache.hadoop.hive.ql.udf;
import org.apache.commons.codec.digest.DigestUtils;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;

public class UDFMd5 extends UDF {
  private final Text result = new Text();
  /**
   * Convert String to md5
   */
  public Text evaluate(Text n) {
    if (n == null) {
      return null;
    }
    String str = n.toString();
    String md5Hex = DigestUtils.md5Hex(str);
    result.set(md5Hex);
    return result;
  }  
}
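To sanity-check the hashing logic outside Hive, the same md5Hex behavior can be sketched with only the JDK's MessageDigest. This is a standalone sketch (Md5Demo is a hypothetical name, not part of the UDF above):

```scala
import java.security.MessageDigest

object Md5Demo {
  // Same logic as DigestUtils.md5Hex, using only the JDK:
  // hash the UTF-8 bytes and render each byte as two lowercase hex digits.
  def md5Hex(s: String): String =
    MessageDigest.getInstance("MD5")
      .digest(s.getBytes("UTF-8"))
      .map("%02x".format(_))
      .mkString

  def main(args: Array[String]): Unit =
    println(md5Hex("abc")) // 900150983cd24fb0d6963f7d28e17f72
}
```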

I took the same jar used in Hive and was able to make it work; the following worked for me (screenshots omitted).

In Hive I used:

create temporary function md5 AS 'org.apache.hadoop.hive.ql.udf.UDFMd5' USING JAR '/test/balaram/hive-MD5.jar';

In Spark I used:

create temporary function md5 AS 'org.apache.hadoop.hive.ql.udf.UDFMd5'
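Once registered, the function can be called like any built-in. A minimal sanity check (the hex value is the well-known MD5 of 'abc'):

```sql
SELECT md5('abc');
-- 900150983cd24fb0d6963f7d28e17f72
```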

If this doesn't help, I'm sorry.

5 Comments

yeah, I know how to do that with Spark. My question is about spark-sql.
You need to include the jar when starting spark-sql. Then you can use something like: CREATE TEMPORARY FUNCTION myFunnyUpper AS 'org.hue.udf.MyUpper'
Right, that's what I did in the code above. I also used a jar (updated in the question), but it does not work.
I have edited the post above to show how I was able to make it work.
Could you please clarify where you run sqlContext.udf.register("customudf", ..)? I don't have a main() in my jar.
