I need a UDF2 that takes two arguments as input corresponding to two Dataframe columns of types String and mllib.linalg.Vector and return a Tuple2. IS this doable? if yes, how do I register this udf()?
hiveContext.udf().register("getItemData", get_item_data, WHAT GOES HERE FOR RETURN TYPE?);
the udf is defined as follows:
UDF2<String, org.apache.spark.mllib.linalg.Vector, Tuple2<String, org.apache.spark.mllib.linalg.Vector>> get_item_data =
(String id, org.apache.spark.mllib.linalg.Vector features) -> {
return new Tuple2<>(id, features);
};