Snowflake: vectorized Python UDTFs with a pandas DataFrame as input and a variable number of columns

Question

I have a use case where I need to create a vectorized UDTF that takes a pandas DataFrame as input. Because the data is preprocessed, the DataFrame can have different columns from one run to the next.

I was looking at the example in https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-tabular-vectorized#example-calculate-the-summary-statistic-for-each-column-in-the-partition

There, the input arguments are listed explicitly: summary_stats(id varchar, col1 float, col2 float, col3 float, col4 float, col5 float)

Is there a way to handle an input DataFrame that has a variable number of columns, possibly of different data types? How can the function above be modified when the input columns are not known beforehand?

Thanks in advance.

user1275127 · Accepted Answer · 2023-12-12 01:20:09Z


Both regular and vectorized UDTFs require declared input columns, output columns, and type hints. Snowflake moves data into and out of the Python runtime (the Python interpreter) through SQL, so it needs to know the input columns and how the output should be mapped back.
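One possible way to live with a fixed signature is to declare a single semi-structured argument and pack each row's columns into it, for example with OBJECT_CONSTRUCT(*) on the SQL side. This is an assumption on my part, not something the documentation example does. The sketch below shows only the handler logic, run locally on a plain pandas DataFrame. The class name SummaryStatsHandler and the column name row_data are hypothetical, and it assumes each cell arrives in Python as a dict. Inside Snowflake the method would also need the @vectorized(input=pandas.DataFrame) decorator from _snowflake, which is left out here so the code runs outside Snowflake.

```python
import pandas as pd


class SummaryStatsHandler:
    """Local sketch of a vectorized UDTF handler with one dict-valued input column."""

    # In Snowflake, this method would be decorated with
    # @vectorized(input=pd.DataFrame) imported from _snowflake (omitted here).
    def end_partition(self, df: pd.DataFrame) -> pd.DataFrame:
        # Assumption: the single input column holds one dict per row.
        records = df.iloc[:, 0].tolist()

        # Expand the dicts into ordinary columns; missing keys become NaN.
        wide = pd.DataFrame(records)

        # Keep only numeric columns, since mean/min/max need numbers.
        numeric = wide.select_dtypes(include="number")

        # One output row per input column, so the output schema stays fixed
        # even though the set of input columns varies.
        stats = numeric.agg(["count", "mean", "min", "max"]).transpose()
        stats.insert(0, "column_name", stats.index)
        return stats.reset_index(drop=True)


# Local usage: two rows packed as dicts in a single column.
rows = [
    {"id": "a", "col1": 1.0, "col2": 10.0},
    {"id": "b", "col1": 3.0, "col2": 30.0},
]
result = SummaryStatsHandler().end_partition(pd.DataFrame({"row_data": rows}))
print(result)
```

The output is in long format (one row per input column) so that the RETURNS TABLE clause never has to change. The trade-off is that type checking moves from the SQL signature into the handler.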

For the mapping between Python and Snowflake types, see https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-batch#label-udf-python-batch-types
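If the columns are known just before the function is created, another option is to generate the signature text from the current schema and recreate the function each time. The following is a minimal sketch of that idea. build_signature and PY_TO_SQL are hypothetical names, not part of any Snowflake library, and the small type table is illustrative only and should be checked against the mapping linked above.

```python
# Hypothetical helper: not part of any Snowflake library.
# Illustrative Python -> Snowflake SQL type table; verify against the docs.
PY_TO_SQL = {str: "VARCHAR", float: "FLOAT", int: "NUMBER", bool: "BOOLEAN"}


def build_signature(func_name, columns):
    """Render 'func_name(col TYPE, ...)' from an ordered {column: python_type} dict.

    Column names are interpolated as-is, so pass only trusted identifiers.
    """
    args = ", ".join(
        f"{name} {PY_TO_SQL[py_type]}" for name, py_type in columns.items()
    )
    return f"{func_name}({args})"


signature = build_signature(
    "summary_stats", {"id": str, "col1": float, "col2": float}
)
print(signature)
```

The resulting string could then be spliced into a CREATE OR REPLACE FUNCTION statement. An unsupported Python type raises KeyError, which surfaces schema problems before any SQL is sent.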
