Can I return a Tuple2 from an Apache Spark UDF (in Java)?

Question

I need a UDF2 that takes two arguments, corresponding to two DataFrame columns of types String and mllib.linalg.Vector, and returns a Tuple2. Is this doable? If yes, how do I register this UDF?

hiveContext.udf().register("getItemData", get_item_data, WHAT GOES HERE FOR RETURN TYPE?);

The UDF is defined as follows:

UDF2<String, org.apache.spark.mllib.linalg.Vector, Tuple2<String, org.apache.spark.mllib.linalg.Vector>> get_item_data =
    (String id, org.apache.spark.mllib.linalg.Vector features) -> {
        return new Tuple2<>(id, features);
    };

zero323 · Accepted Answer · 2017-01-09 22:41:28Z

The return type goes there, as a schema describing the struct. It can be defined as follows:

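import java.util.ArrayList;
import java.util.List;

import org.apache.spark.mllib.linalg.VectorUDT;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;

// One struct field per Tuple2 element, in the same order as the tuple.
List<StructField> fields = new ArrayList<>();
fields.add(DataTypes.createStructField("id", DataTypes.StringType, false));
fields.add(DataTypes.createStructField("features", new VectorUDT(), false));
DataType schema = DataTypes.createStructType(fields);

// The schema is the third argument to register().
hiveContext.udf().register("getItemData", get_item_data, schema);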
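
For completeness, here is a minimal end-to-end sketch of registering and calling such a UDF. It rests on a few assumptions that are not in the question: it uses SparkSession (Spark 2.x or later) instead of hiveContext, it expects spark-sql and spark-mllib on the classpath, and the class name Tuple2UdfExample, the method firstItemId and the sample row are made up for illustration. As far as I know, the Tuple2 is accepted as a struct value because it is a Scala Product; returning a Row built with RowFactory.create should work as well.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.VectorUDT;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF2;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import scala.Tuple2;

import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

// Hypothetical example class; the name and the sample data are assumptions.
public class Tuple2UdfExample {

    // Registers the UDF, applies it to a one-row DataFrame, and returns
    // the id stored in the resulting struct column.
    static String firstItemId(SparkSession spark) {
        // Struct schema: one field per Tuple2 element, in tuple order.
        List<StructField> fields = new ArrayList<>();
        fields.add(DataTypes.createStructField("id", DataTypes.StringType, false));
        fields.add(DataTypes.createStructField("features", new VectorUDT(), false));
        StructType schema = DataTypes.createStructType(fields);

        UDF2<String, Vector, Tuple2<String, Vector>> getItemData =
            (id, features) -> new Tuple2<>(id, features);

        // The schema is passed as the return type.
        spark.udf().register("getItemData", getItemData, schema);

        // The input columns happen to match the struct, so the schema is reused.
        List<Row> rows = Collections.singletonList(
            RowFactory.create("item-1", Vectors.dense(1.0, 2.0)));
        Dataset<Row> df = spark.createDataFrame(rows, schema);

        Row result = df
            .select(callUDF("getItemData", col("id"), col("features")).alias("item"))
            .first();
        return result.getStruct(0).getString(0);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[1]")
            .appName("tuple2-udf")
            .getOrCreate();
        try {
            System.out.println(firstItemId(spark));
        } finally {
            spark.stop();
        }
    }
}
```

With hiveContext the registration call has the same shape; only the way the context and the DataFrame are created differs.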

But if all you need is a struct, without any additional processing, org.apache.spark.sql.functions.struct should do the trick:

df.select(struct(col("id"), col("features")));

edited Jan 9, 2017 at 22:41

answered Jan 9, 2017 at 21:37
