0

I have a snowflake array as below rows which is an input, which I would want to check for each value in the array value and spit as multiple output arrays based on the value's length for values with 5 digits as one column, and values with 6 digits as another column.

ID_COL,ARRAY_COL_VALUE
1,[22,333,666666]
2,[1,55555,999999999]
3,[22,444]

Output table:

ID_COL,FIVE_DIGIT_COL,SIX_DIGIT_COL
1,[],[666666]
2,[555555],[]
3,[],[]

Please let me know if we could iterate through each array value and perform SQL aggregation to check column length and then output as a separate column outputs. Creating it through SQL would be great, but UDFs using javascript, python if an option would also be great.

2
  • Can you have multiple values with same length in the array? Commented Feb 10, 2023 at 19:36
  • Yes, sorry I didn't provide a sample. But that is the expectation why it is an array in output column. Commented Feb 11, 2023 at 21:36

2 Answers 2

1

Using SQL and FLATTEN:

CREATE OR REPLACE TABLE t(ID_COL INT,ARRAY_COL_VALUE VARIANT)
AS
SELECT 1,[22,333,666666] UNION ALL
SELECT 2,[1,55555,999999999] UNION ALL
SELECT 3,[22,444];

Query:

SELECT ID_COL, 
  ARRAY_AGG(CASE WHEN s.value BETWEEN 10000 AND 99999 THEN s.value END) AS FIVE_DIGIT_COL,
  ARRAY_AGG(CASE WHEN s.value BETWEEN 100000 AND 999999 THEN s.value END) AS SIX_DIGIT_COL
FROM t, TABLE(FLATTEN(ARRAY_COL_VALUE)) AS s
GROUP BY ID_COL;

And Python UDF:

create or replace function filter_arr(arr variant, num_digits INT)
returns variant
language python
runtime_version = 3.8
handler = 'main'
as $$
def main(arr, num_digits):
    return [x for x in arr if len(str(x))==num_digits]
$$;

SELECT ID_COL, 
      ARRAY_COL_VALUE,
      filter_arr(ARRAY_COL_VALUE, 5),
      filter_arr(ARRAY_COL_VALUE, 6)
FROM t;

Output:

enter image description here

Sign up to request clarification or add additional context in comments.

Comments

0

If you're dealing strictly with numbers here's another way

with cte (id, array_col) as
(select 1,[22,333,666666,666666] union all
 select 2,[1,22222,55555,999999999] union all
 select 3,[22,444])

select *, 
       concat(',',array_to_string(array_col,',,'),',') as str_col,
       regexp_substr_all(str_col,',([^,]{5}),',1,1,'e') as len_5,
       regexp_substr_all(str_col,',([^,]{6}),',1,1,'e') as len_6
from cte;

The basic idea is to turn that array into a string and keep all the digits surrounded by , so that we can parse the pattern using regex_substr_all.

If you're dealing with strings, you can modify it to use a delimiter that won't show up in your data.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.