1

I have been trying to make a UDF in BigQuery to compress multiple rows into a single row

CREATE OR REPLACE FUNCTION function_name(to_compress_column INT64, order_by_column INT64) AS (
    TO_JSON_STRING(
           ARRAY_AGG(
                IFNULL(to_compress_column,-1) RESPECT NULLS
                ORDER BY order_by_column
                    )
                  )
    );

But this causes the error "ARRAY_AGG not allowed in SQL function body". I tried using SELECT and UNNEST but since my input is not an ARRAY, it did not work. How should I proceed?

Edit:

My input tables have this format:

visits place_id date hour
23 abc123 2022-01-01 4
20 abc123 2022-01-01 2
19 abc123 2022-01-01 3
24 abc123 2022-01-01 1
26 abc123 2022-01-01 5
18 abc456 2022-01-01 2
20 abc456 2022-01-01 3
17 abc456 2022-01-01 1

I want my output tables to have this format:

visits place_id date
[24,20,19,23,26] abc123 2022-01-01
[17,18,20] abc456 2022-01-01

I understand that I can do this by

SELECT TO_JSON_STRING(ARRAY_AGG(IFNULL(visits,-1) RESPECT NULLS
  ORDER BY hour)) visits, 
  place_id, 
  date 
  from input_table
  group by place_id, date

But in order to avoid repetition of the first line of the select query as there a lot of tables I have to do this in, I want to create a UDF so that the query changes to:

SELECT function_name(visits,hour) visits, 
      place_id, 
      date 
      from input_table
      group by place_id, date
3
  • 1
    What is the XY problem? - I think yo are in that trap Commented Sep 2, 2022 at 16:59
  • @MikhailBerlyant I see, I have edited my question to add perspective on what I am trying to achieve through the function approach Commented Sep 2, 2022 at 17:40
  • 1
    what you want is actually called aggregate UDF and BigQuery does not support such (at least yet)! so udf direction looks like dead end here . will think more on this ... Commented Sep 2, 2022 at 21:15

1 Answer 1

2

what you want is actually called aggregate UDF and BigQuery does not support such (at least yet)! so aggregate udf direction looks like dead end here

What I see as an option is described in below example

create temp function func_name (arr any type) as (( 
  select to_json_string(array_agg(ifnull(visits,-1) respect nulls order by hour))
  from unnest(arr) 
));
select func_name(array_agg(struct(visits, hour))) visits, 
  place_id, 
  date 
  from input_table
  group by place_id, date      

if applied to sample data in your question - output is

enter image description here

So, it is not 100% of what you wanted , but very close in my mind
So, instead of desired line

function_name(visits,hour) visits,    

you will have to use

function_name(array_agg(struct(visits, hour))) visits,

I feel like this is reasonable trade-off :o)

P.S. And, obviously, instead of temp function you will create permanent function ...

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.