
Is it possible to create a custom aggregate UDF in Redshift? If so, where can I find a tutorial or documentation for it?

My data looks like this:

A     B     time_series

a1    b1    "[1,2,3]"
a1    b2    "[2,3,4]"
a2    b1    "[2,2,2]"

I want to group by A or B and get the element-wise average of the time series.

For example, grouping by A should give:

a1   "[1.5, 2.5, 3.5]"
a2   "[2,2,2]"
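To pin down the intended result, here is a small plain-Python 3 sketch (not Redshift code) that averages the series position by position within each group. The `rows` list mirrors the sample table above; `group_mean` is a hypothetical helper, and it assumes every series in a group has the same length.

```python
import json
from collections import defaultdict

# Sample rows from the question: (A, B, time_series stored as a JSON string)
rows = [
    ("a1", "b1", "[1,2,3]"),
    ("a1", "b2", "[2,3,4]"),
    ("a2", "b1", "[2,2,2]"),
]

def group_mean(rows, key_index):
    """Element-wise mean of the time series, grouped by column key_index (0 = A, 1 = B)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(json.loads(row[2]))
    # zip(*series) yields the i-th value of every series in the group
    return {
        key: [sum(values) / len(values) for values in zip(*series)]
        for key, series in groups.items()
    }

print(group_mean(rows, 0))  # group by A → {'a1': [1.5, 2.5, 3.5], 'a2': [2.0, 2.0, 2.0]}
print(group_mean(rows, 1))  # group by B → {'b1': [1.5, 2.0, 2.5], 'b2': [2.0, 3.0, 4.0]}
```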

Answer


As of today, a UDF can only be applied to a single row. So to achieve what you want, you need to first combine the values into a single row and then apply the UDF to do the math.

For example:

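Create the UDF:

CREATE FUNCTION f_mean(time_series VARCHAR)
RETURNS VARCHAR
IMMUTABLE AS $$
import json
# time_series holds all series of one group, joined by LISTAGG with ';'.
# Strip whitespace and any literal double quotes, then parse each JSON array.
data = [json.loads(x.strip().strip('"')) for x in time_series.split(';')]
# zip(*data) groups the i-th element of every series; average each position.
return json.dumps([sum(e) / float(len(e)) for e in zip(*data)])
$$ LANGUAGE plpythonu;

Use the LISTAGG function with an explicit delimiter to combine the values into a single row, and then apply the UDF:

mydb=> select A, f_mean(listagg(time_series, ';')) from my_table group by A;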
 a  |     f_mean      
----+-----------------
 a2 | [2.0, 2.0, 2.0]
 a1 | [1.5, 2.5, 3.5]
(2 rows)
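Because the UDF body is ordinary Python, its parsing logic can be checked locally before creating the function. This sketch assumes the group's values are joined with an explicit `;` delimiter (as `listagg(time_series, ';')` would do) and that each value may or may not carry literal double quotes; `simulate_listagg` and `f_mean_body` are hypothetical stand-ins for LISTAGG and the UDF, not Redshift APIs.

```python
import json

# Values of one group as stored in the table; one carries literal double quotes.
series_a1 = ['"[1,2,3]"', "[2,3,4]"]

def simulate_listagg(values, delimiter=";"):
    """Mimic LISTAGG(time_series, ';') for one group (hypothetical helper)."""
    return delimiter.join(values)

def f_mean_body(time_series):
    """Same logic as the UDF body, run locally (hypothetical helper)."""
    data = [json.loads(x.strip().strip('"')) for x in time_series.split(";")]
    # Average the i-th element across all series in the group
    return json.dumps([sum(e) / float(len(e)) for e in zip(*data)])

print(f_mean_body(simulate_listagg(series_a1)))  # → [1.5, 2.5, 3.5]
```

An explicit delimiter keeps the split independent of whether the stored values are quoted.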

Comment

So this approach has a limitation: the combined length of all time_series values in a group can never exceed the maximum length of the LISTAGG result (65,535 bytes, the maximum VARCHAR size).
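A rough pre-check for that limit is simple arithmetic: add up the UTF-8 byte lengths of a group's values plus one delimiter between each pair. This sketch assumes a 65,535-byte cap and a `;` delimiter; `fits_in_listagg` and `LISTAGG_MAX_BYTES` are hypothetical names for illustration.

```python
LISTAGG_MAX_BYTES = 65535  # assumed cap on the LISTAGG result (maximum VARCHAR size)

def fits_in_listagg(values, delimiter=";"):
    """Return True if joining values with delimiter stays within the assumed cap."""
    total = sum(len(v.encode("utf-8")) for v in values)
    # One delimiter between each adjacent pair of values
    total += len(delimiter.encode("utf-8")) * max(len(values) - 1, 0)
    return total <= LISTAGG_MAX_BYTES

print(fits_in_listagg(["[1,2,3]", "[2,3,4]"]))  # → True
```

A single series of 40,000 one-digit values already needs about 80,000 bytes on its own, so long series hit the cap quickly.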
