
Here is my Hive table:

CREATE TABLE `dum`(`val` map<string,array<string>>);
insert into dum select map('A',array('1','2','3'),'B',array('4','5','6'));

and here is how it looks

select * from dum;
{"A":["1","2","3"],"B":["4","5","6"]}

I am trying to create a simple UDF that combines all the items in the values of the above map into a single list. Here is what I want to see:

select modudf(val) from dum;
["1","2","3","4","5","6"]

So I created:

package some.package;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;
import java.util.ArrayList;

import java.util.List;
import java.util.Map;

@UDFType(deterministic = true)
public class CustomUDF extends UDF {

    public List<String> evaluate(Map<String, String[]> inMap) {

        ArrayList<String> res = new ArrayList<String>();
        for (Map.Entry<String, String[]> ent : inMap.entrySet()) {
            for (String item : ent.getValue())
                res.add(item);
        }
        return res;
    }
}

But when I try to invoke it as

add jar /path/to/my/jar;
CREATE TEMPORARY FUNCTION modudf AS 'some.package.CustomUDF';
select modudf(val) from dum;

I get

FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'val': No matching method for class some.package.CustomUDF with (map<string,array<string>>). Possible choices: _FUNC_(map<struct<>,struct<>>)

Why does Hive think that my UDF requires map<struct<>,struct<>> instead of map<string,array<string>>? I even tried replacing String with CharSequence, but I got the same error.

Note that as per the documentation

https://hive.apache.org/javadocs/r1.2.2/api/org/apache/hadoop/hive/ql/exec/UDF.html

I should be able to use collections as input to the evaluate method.

What am I doing wrong?

Update

I also tried the following definition

    public List<CharSequence> evaluate(Map<CharSequence, List<CharSequence>> inMap) {

        modLogger.info(inMap);
        ArrayList<CharSequence> res = new ArrayList<CharSequence>();
        for (Map.Entry<CharSequence, List<CharSequence>> ent : inMap.entrySet()) {
            for (CharSequence item : ent.getValue())
                res.add(item);
        }
        return res;
    }
}

but I still get

hive> add jar /path/to/my/jar;
Added [/path/to/my/jar] to class path
Added resources: [/path/to/my/jar]
hive> CREATE TEMPORARY FUNCTION modudf AS 'some.package.CustomUDF';
hive> desc dum;
OK
val                     map<string,array<string>>
Time taken: 0.094 seconds, Fetched: 1 row(s)
hive> select val from dum;
Query ID = root_20200629170147_80b5248f-4519-4dae-a070-3c5185f742ea
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1593449512239_0001)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 6.12 s
----------------------------------------------------------------------------------------------
OK
{"A":["1","2","3"],"B":["4","5","6"]}
Time taken: 10.631 seconds, Fetched: 1 row(s)
hive> select modudf(val) from dum;
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'val': No matching method for class com.walmart.labs.search.sib.gcp.ModularTransformUDF with (map<string,array<string>>). Possible choices: _FUNC_(map<struct<>,array<struct<>>>)

1 Answer

See this quote from the documentation you linked:

Note that Hive Arrays are represented as Lists in Hive. So an ARRAY column would be passed in as a List.

So the signature should be evaluate(Map<String, List<String>> inMap) rather than evaluate(Map<String, String[]> inMap).
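A minimal sketch of the evaluate method with that signature follows. The class name CustomUDFSketch and the null handling are assumptions added for illustration, and the Hive superclass is left out so the snippet compiles without hive-exec on the classpath. As a likely explanation for the struct<> in the error message (not confirmed in the thread), Hive's reflection-based type mapping seems to fall back to a struct for Java types it does not recognize, such as String[] or CharSequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: in the real UDF this class would be declared as
// `public class CustomUDF extends org.apache.hadoop.hive.ql.exec.UDF`.
// The superclass is omitted here so the snippet compiles on its own.
class CustomUDFSketch {

    // Hive passes ARRAY columns as java.util.List, so the map value type
    // is List<String>, not String[].
    public List<String> evaluate(Map<String, List<String>> inMap) {
        if (inMap == null) {
            return null; // assumption: a NULL column should yield NULL
        }
        List<String> res = new ArrayList<>();
        for (List<String> values : inMap.values()) {
            if (values != null) { // assumption: skip NULL arrays inside the map
                res.addAll(values);
            }
        }
        return res;
    }
}
```

The order of the combined list follows the iteration order of the map that Hive passes in, so it should not be relied on unless that order is known.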


3 Comments

Thanks for the suggestion. Please see the update; I still get the same error.
Did you try exactly this signature: public List<String> evaluate(Map<String, List<String>> inMap)? Changing String to CharSequence makes no sense to me.
Hey, that worked! I used CharSequence because of another code dependency, but I don't have to. Thanks a lot.
