How to create a custom aggregate UDF in Redshift?

Question

Is it possible to create a custom aggregate UDF in Redshift? If so, where can I find a tutorial or documentation for it?

My data looks like this:

A     B     time_series

a1    b1    "[1,2,3]"
a1    b2    "[2,3,4]"
a2    b1    "[2,2,2]"

I want to group by A or B and get the average time series.

For example, grouping by A:

a1   "[1.5, 2.5, 3.5]"
a2   "[2,2,2]"

Vor · Accepted Answer · 2016-01-28 16:28:55Z

As of today, a UDF can only be applied to a single row. So, to achieve what you want, you need to first combine the values into a single row and then apply a UDF to do the math.

For example:

Create UDF:

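CREATE FUNCTION f_mean(time_series VARCHAR)
RETURNS VARCHAR
IMMUTABLE AS $$
import json
# LISTAGG without a delimiter yields '"[1,2,3]""[2,3,4]"', so split on '""'
# and strip the remaining quotes before parsing each list.
data = [json.loads(x.replace('"', '')) for x in time_series.split('""')]
# Average element-wise across the parsed lists.
return json.dumps([sum(e)/float(len(e)) for e in zip(*data)])
$$ LANGUAGE plpythonu;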

Use the LISTAGG function to combine the values into a single row, and then apply the UDF.

mydb=> select A, f_mean(listagg(time_series) within group (order by A)) from my_table group by A;
 a  |     f_mean      
----+-----------------
 a2 | [2.0, 2.0, 2.0]
 a1 | [1.5, 2.5, 3.5]
(2 rows)
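To check the UDF logic without a cluster, here is a minimal local sketch in plain Python. It assumes, as the UDF does, that each time_series value is stored with its surrounding double quotes, and it stands in for LISTAGG with simple string concatenation. The rows list and the groups dictionary are illustrative, not part of Redshift.

```python
import json

def f_mean(time_series):
    # Same body as the UDF: split the concatenated string on '""',
    # strip leftover quotes, then average element-wise.
    data = [json.loads(x.replace('"', '')) for x in time_series.split('""')]
    return json.dumps([sum(e) / float(len(e)) for e in zip(*data)])

# Sample rows from the question as (A, time_series); the quotes are
# assumed to be part of the stored value.
rows = [("a1", '"[1,2,3]"'), ("a1", '"[2,3,4]"'), ("a2", '"[2,2,2]"')]

# Stand-in for LISTAGG(time_series) ... GROUP BY A with no delimiter.
groups = {}
for a, ts in rows:
    groups[a] = groups.get(a, "") + ts

result = {a: f_mean(s) for a, s in groups.items()}
print(result)
```

If the stored values do not include the double quotes, the split on '""' will not separate the rows, and passing a delimiter to LISTAGG and splitting on it would be needed instead.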


1 Comment

Hello lad · Over a year ago

This has a limitation, then: the combined length of the time_series values in a group can never exceed the maximum length that LISTAGG can return.
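A rough way to gauge whether a group would hit that limit is to add up the byte lengths per group on a sample of the data. The sketch below assumes the limit is 65,535 bytes, the maximum VARCHAR size that Redshift documents for LISTAGG results. The helper name groups_over_limit is hypothetical.

```python
# Assumption: LISTAGG returns at most a VARCHAR of 65,535 bytes.
LISTAGG_MAX_BYTES = 65535

def groups_over_limit(rows, limit=LISTAGG_MAX_BYTES):
    # rows is an iterable of (group_key, time_series_string) pairs.
    totals = {}
    for key, ts in rows:
        totals[key] = totals.get(key, 0) + len(ts.encode("utf-8"))
    # Return the keys whose concatenated value would exceed the limit.
    return sorted(k for k, n in totals.items() if n > limit)

sample = [("a1", '"[1,2,3]"'), ("a1", '"[2,3,4]"'), ("a2", '"[2,2,2]"')]
print(groups_over_limit(sample))
```

Groups reported by this check would need a different approach, for example splitting the series into smaller chunks before aggregating.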
