I have a use case where I need to create a vectorized UDTF that operates on a pandas DataFrame. Because the data is preprocessed, the DataFrame's columns can differ from one run to the next.

I was looking at the example in https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-tabular-vectorized#example-calculate-the-summary-statistic-for-each-column-in-the-partition

There, the input variables are declared explicitly as summary_stats(id varchar, col1 float, col2 float, col3 float, col4 float, col5 float)

Is there a way to handle the case where the input DataFrame has a variable number of columns, possibly of different data types? How can the function above be modified when the input columns are not known beforehand?

Thanks in advance.

1 Answer

Both UDTFs and vectorized UDTFs require declared input columns, output columns, and type hints. Snowflake moves data into and out of the Python runtime through SQL, so it needs to know the columns in advance and how the output should be mapped back.

You can refer to this table for the mapping between Python and Snowflake types: https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-batch#label-udf-python-batch-types
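
Since the signature has to be fixed, one possible workaround (my assumption, not something the linked docs describe for this case) is to keep the declared signature constant and pack the variable columns into a single semi-structured column, for example by passing OBJECT_CONSTRUCT(*) as a `payload` argument. The handler can then rebuild a wide DataFrame inside Python. The sketch below shows only the pandas side; the class name `SummaryStats`, the `payload` column name, and the output columns are hypothetical, and I am assuming each payload value arrives as either a dict or a JSON string, which you should verify against the type mapping above.

```python
import json

import pandas as pd


class SummaryStats:
    """Hypothetical handler: input is a fixed (id, payload) pair of columns."""

    def end_partition(self, df: pd.DataFrame) -> pd.DataFrame:
        # Assumption: each payload cell is a dict or a JSON-encoded string.
        records = [json.loads(v) if isinstance(v, str) else v for v in df["payload"]]

        # Rebuild the wide frame; keys missing from a row become NaN.
        wide = pd.DataFrame(records)
        numeric = wide.select_dtypes(include="number")

        out_cols = ["column_name", "mean", "min", "max"]
        if numeric.shape[1] == 0:
            return pd.DataFrame(columns=out_cols)

        # One output row per numeric column, so the output schema stays fixed
        # no matter how many columns the payload carried.
        stats = numeric.agg(["mean", "min", "max"]).T
        return stats.rename_axis("column_name").reset_index()[out_cols]


if __name__ == "__main__":
    sample = pd.DataFrame(
        {
            "id": ["a", "a"],
            "payload": [{"x": 1.0, "y": "foo"}, '{"x": 3.0, "z": 10}'],
        }
    )
    print(SummaryStats().end_partition(sample))
```

Returning one row per column (long format) is what keeps the output schema constant; you would still have to declare that fixed output schema and the `(id, payload)` input when registering the function in Snowflake.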