
I have a dataset that looks as follows and I am using Redshift SQL:

CREATE TABLE mytable
(
    userid       INTEGER  NOT NULL PRIMARY KEY,
    username     VARCHAR(17) NOT NULL,
    display_name VARCHAR(13) NOT NULL,
    bio          VARCHAR(51) NOT NULL,
    places       VARCHAR(24),
    email        VARCHAR(25) NOT NULL
);

INSERT INTO mytable (userid, username, display_name, bio, places, email)
VALUES (123, 'cliff.park', 'Cliff Park', 'Student living in Chicago. Born in Phoenix', '[''Chicago'', ''Phoenix'']', '[email protected]');
INSERT INTO mytable (userid, username, display_name, bio, places, email)
VALUES (456, 'sam2234', 'Sam Wright', 'Current Location: Cleveland. Next Location: Orlando', '[''Cleveland'', ''Orlando'']', '[email protected]');
INSERT INTO mytable (userid, username, display_name, bio, places, email)
VALUES (789, 'buckeyes33', 'BuckeyeFan', 'From Columbus… Go Bucks!', '[''Columbus'']', '[email protected]');
INSERT INTO mytable (userid, username, display_name, bio, places, email)
VALUES (1011, 'sarah.patrick4354', 'Sarah Patrick', 'Checkout my clothing line!!!!', '[]', '[email protected]');

What I'm trying to do: whenever the places field contains multiple selections (for example: ['Chicago', 'Phoenix']), I want to create a new row for each selection, with all of the same fields and data except for places, which will hold only one option. So the final output should look something like this:

userid | username          | display_name  | places
-------+-------------------+---------------+----------
123    | cliff.park        | Cliff Park    | Chicago
123    | cliff.park        | Cliff Park    | Phoenix
456    | sam2234           | Sam Wright    | Cleveland
456    | sam2234           | Sam Wright    | Orlando
789    | buckeyes33        | BuckeyeFan    | Columbus
1011   | sarah.patrick4354 | Sarah Patrick | (null)

(bio and email carried through unchanged)

Additionally, it should get rid of the brackets and quote characters, so that ['Columbus'] becomes just Columbus, and any value that is just [] becomes blank/null/empty.
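
For reference, I think the bracket/quote cleanup on its own could be handled with nested replace() calls and nullif(), something like this rough sketch:

select userid,
       nullif(replace(replace(replace(places, '[', ''), ']', ''), chr(39), ''), '') as places_clean
from mytable;

It's the splitting into multiple rows that I haven't been able to figure out.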

  • @marc_s: Thank you for editing! Do you by chance have an answer?

1 Answer


I answered a nearly identical question a few days ago: How split comma separated string into multiple rows in AWS redshift?

Take a look and see if that gets you unstuck. If not, please comment with the issue you're hitting and I'll be happy to address it.

===========================================================

As requested, I've put together a few ways to attack this:

First, following the pattern I used in the previous answer, with no special data types. I did change the single quotes (chr(39)) in your lists to double quotes, since that's the JSON standard, and this opens up the power of Redshift's JSON functions.

with recursive numbers(n) as (  -- generates 0 .. max comma count found in any places value
  select 0 as n
  union all
  select n + 1 
  from numbers n
  where n < (select max(length(places) - length(replace(places, ',', ''))) from mytable)
),
input as (
  select userid, username, display_name, bio, replace(places,chr(39),'"') places, email, 
  length(places) - length(replace(places, ',','')) no_of_elements --counts the number of commas in the string
  from mytable
)
select userid, username, display_name, bio, places, email, json_extract_array_element_text(places, n.n) as place
from input t
join numbers n
on n.n <= t.no_of_elements
order by userid, place;
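
Against your sample data, this should return one row per array element (and a single row with a blank/null place for the empty list), roughly:

userid | place
-------+----------
123    | Chicago
123    | Phoenix
456    | Cleveland
456    | Orlando
789    | Columbus
1011   | (null)

(the other columns carry through unchanged)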

Next, I took your JSON info and converted it to the SUPER data type (json_parse). However, I kept the same explicit unrolling (the numbers CTE) to break things out; this can be easier to understand than the built-in unrolling of SUPERs. Note that since zero elements and one element need to be treated the same (join once), there is a decode() in the join's ON condition.

with recursive numbers(n) as (
  select 0 as n
  union all
  select n + 1 
  from numbers n
  where n < (select max(get_array_length(json_parse(replace(places,chr(39),'"')))) from mytable)
),
input as (
  select userid, username, display_name, bio, json_parse(replace(places, chr(39), '"')) as places, email
  from mytable
)
select t.*, places[n.n] as place  -- n-th element of the array for this joined row
from input t
join numbers n
on n.n < decode(get_array_length(t.places), 0, 1, get_array_length(t.places))  -- empty arrays still join once
order by userid, place;
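
To make the decode() concrete: it maps an array length of 0 to 1 so that empty arrays still join exactly once (indexing past the end of an array just yields null), while any other length passes through unchanged:

select decode(0, 0, 1, 0) as joins_for_empty,  -- 0 elements -> join once
       decode(2, 0, 1, 2) as joins_for_two;    -- 2 elements -> join twice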

Now, if you are ready to go all in on SUPERs, this can be done with a little trickery and the built-in unrolling. Again, some action was needed for the case of zero elements in the array; I put an empty string in as an element in this case. There are other ways this could be handled, but I was already modifying the string to make it valid JSON.

with input as (
  select userid, username, display_name, bio,
         json_parse(replace(decode(places, '[]', '[""]', places), chr(39), '"')) as places,  -- empty list becomes [""]
         email
  from mytable
)
select t.*, place
from input t, t.places as place  -- built-in SUPER unrolling
order by userid, place;
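
If you also want that empty-string placeholder to come back as null (per your cleanup requirement), one variation (just a sketch on top of the same input CTE) is to cast the unrolled SUPER to varchar and nullif() it:

with input as (
  select userid, username, display_name, bio,
         json_parse(replace(decode(places, '[]', '[""]', places), chr(39), '"')) as places,
         email
  from mytable
)
select t.userid, t.username, t.display_name, t.bio, nullif(place::varchar, '') as place, t.email
from input t, t.places as place
order by userid, place;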

If you like using SUPERs, you may want to change your places column to SUPER instead of VARCHAR, but this will mean making your data valid JSON on ingestion.
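
A sketch of what that could look like (the mytable_super name is just illustrative):

CREATE TABLE mytable_super
(
    userid       INTEGER NOT NULL PRIMARY KEY,
    username     VARCHAR(17) NOT NULL,
    display_name VARCHAR(13) NOT NULL,
    bio          VARCHAR(51) NOT NULL,
    places       SUPER,
    email        VARCHAR(25) NOT NULL
);

-- valid JSON in, parsed straight to SUPER
INSERT INTO mytable_super
VALUES (123, 'cliff.park', 'Cliff Park', 'Student living in Chicago. Born in Phoenix',
        JSON_PARSE('["Chicago", "Phoenix"]'), '[email protected]');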


3 Comments

Thank you. This isn't quite what I'm looking for, though, because of some differences and some additional data cleanup, so if you have time to answer based on the data above, that would be extremely helpful!
Appreciate the help BTW!
Several ways to skin this cat added to the answer
