For a Spark DataFrame in PySpark, we can use pyspark.sql.functions.udf to create a user-defined function (UDF).

Can I call functions from other Python packages inside udf(), e.g., np.random.normal from numpy?

Assuming you want to add a column named new to your DataFrame df, with each row holding a separate draw from numpy.random.normal, you could do:

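import numpy
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

# Wrap numpy.random.normal in a zero-argument UDF. float() turns the returned
# numpy.float64 into a plain Python float that matches DoubleType.
# asNondeterministic() (Spark 2.3+) tells the optimizer the result can differ
# on every call, so repeated invocations are not treated as interchangeable.
normal_udf = udf(lambda: float(numpy.random.normal()), DoubleType()).asNondeterministic()

# Add a column named 'new' with one draw per row.
df_with_new_column = df.withColumn('new', normal_udf())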
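More generally, any Python function, including one from a third-party package, can be wrapped with udf() as long as the package is installed on every executor and the function returns a value Spark can convert to the declared return type. NumPy functions return NumPy scalars such as numpy.float64 rather than built-in floats, which Spark may fail to convert, so converting with float() is a common safeguard. Below is a minimal sketch of a decorator that applies that conversion; to_builtin_float and sample_normal are hypothetical names, and the standard library's random.gauss stands in for numpy.random.normal so the snippet runs without NumPy or Spark.

```python
import functools
import random


def to_builtin_float(fn):
    """Decorator: convert fn's result to a built-in float (None is kept).

    Hypothetical helper, not part of PySpark. It targets functions such as
    numpy.random.normal that return numpy.float64 instead of float.
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        # Keep None so the UDF yields a SQL NULL instead of raising TypeError.
        return None if result is None else float(result)
    return wrapper


@to_builtin_float
def sample_normal(mu, sigma):
    # In a Spark job this body would be numpy.random.normal(mu, sigma);
    # random.gauss stands in here so the sketch runs without NumPy.
    return random.gauss(mu, sigma)


print(type(sample_normal(0.0, 1.0)).__name__)  # float
```

In PySpark you would then register it with something like normal_udf = udf(sample_normal, DoubleType()).asNondeterministic() and call it on two numeric columns, e.g. normal_udf('mu', 'sigma'), where mu and sigma are hypothetical column names; the function then runs once per row on the executors. For plain Gaussian noise, the built-in pyspark.sql.functions.randn(seed) produces standard-normal samples without a Python UDF, and randn(seed) * sigma + mu rescales them.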