3

I have a column like this:

String_to_Extract
A~S1_B~S2_C~S11
A~S1_B~S3_C~S12
C~S13_A~S11_B~S4

The part before the "~" should be the column name, and the part after the "~" should be the row value. The key/value pairs are separated by "_". The result should therefore look like this:

String_to_Extract   A    B    C
A~S1_B~S2_C~S11     S1   S2   S11
A~S1_B~S3_C~S12     S1   S3   S12
C~S13_A~S11_B~S4    S11  S4   S13

Here is my approach:

SELECT
String_to_Extract,
SUBSTRING(String_to_Extract, INSTR(String_to_Extract, "A~")+2, ?) AS A,
SUBSTRING(String_to_Extract, INSTR(String_to_Extract, "B~")+2, ?) AS B,
SUBSTRING(String_to_Extract, INSTR(String_to_Extract, "C~")+2, ?) AS C
FROM Table

How do I get the part between the ~ and next _ for each column?

Any help would be appreciated!

Your expected output for C~S13_A~S1_B~S4 appears to be off in your question. Commented Dec 4, 2021 at 10:14

3 Answers

2

One approach uses REGEXP_EXTRACT:

SELECT
    REGEXP_EXTRACT(String_to_Extract, r"(?:^|_)A~([^_]+)") AS A,
    REGEXP_EXTRACT(String_to_Extract, r"(?:^|_)B~([^_]+)") AS B,
    REGEXP_EXTRACT(String_to_Extract, r"(?:^|_)C~([^_]+)") AS C
FROM yourTable;
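To try the pattern without a real table, here is a minimal sketch that inlines the three sample strings from the question. The CTE name sample_data is only a placeholder, and each character class is [^_]+ so that a value stops at the next underscore:

```sql
-- sample_data is a hypothetical inline table holding the rows from the question.
WITH sample_data AS (
  SELECT 'A~S1_B~S2_C~S11' AS String_to_Extract UNION ALL
  SELECT 'A~S1_B~S3_C~S12' UNION ALL
  SELECT 'C~S13_A~S11_B~S4'
)
SELECT
  String_to_Extract,
  -- (?:^|_) anchors the key at the start of the string or right after a "_",
  -- so the position of each pair inside the string does not matter.
  REGEXP_EXTRACT(String_to_Extract, r'(?:^|_)A~([^_]+)') AS A,
  REGEXP_EXTRACT(String_to_Extract, r'(?:^|_)B~([^_]+)') AS B,
  REGEXP_EXTRACT(String_to_Extract, r'(?:^|_)C~([^_]+)') AS C
FROM sample_data;
```

For the third row this should give A = S11, B = S4, C = S13, even though C comes first in the string. A key that is missing from a string simply yields NULL.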

4 Comments

Thank you very much. However, the order of the parameters could change or more could be added. Therefore, it would be useful to first search for the respective identifiers (A~, B~, C~, D~, .....) I guess.
I get it. I have updated my answer.
One more question on this. My delimiter now changed from ~ to |. So the string looks like: A|S1_B|S2_C|S11. The problem is that it will be identified as an OR operator. How can I handle this?
@MichaHein You need to escape the pipe, e.g. for the A value use: (?:^|_)A\|([^_]+)
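As a rough sketch of the escaped version, assuming the same layout with "|" as the key/value delimiter (sample_data is a placeholder name):

```sql
-- sample_data is a hypothetical inline table using "|" instead of "~".
WITH sample_data AS (
  SELECT 'A|S1_B|S2_C|S11' AS String_to_Extract
)
SELECT
  -- In a raw string (r'...') a single backslash is enough: \| is a literal pipe,
  -- while the unescaped | inside (?:^|_) is still the regex alternation.
  REGEXP_EXTRACT(String_to_Extract, r'(?:^|_)A\|([^_]+)') AS A,
  REGEXP_EXTRACT(String_to_Extract, r'(?:^|_)B\|([^_]+)') AS B,
  REGEXP_EXTRACT(String_to_Extract, r'(?:^|_)C\|([^_]+)') AS C
FROM sample_data;
```

Without the r prefix the backslash itself has to be doubled ('\\|'). Writing the pipe as a character class, [|], avoids the escaping question altogether.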
0

Consider the approach below (BigQuery):

select * from (
  select String_to_Extract, col_val[offset(0)] as col, col_val[offset(1)] as val
  from your_table, unnest(split(String_to_Extract, '_')) kv,
  unnest([struct(split(kv, '~') as col_val)])
)
pivot (any_value(val) for col in ('A', 'B', 'C'))   

Applied to the sample data in your question, this returns one row per input string, with the extracted values in columns A, B, and C.
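If more identifiers (D, E, ...) may show up later, note that PIVOT needs its column list written out as constants. A possible workaround, sketched here with a placeholder CTE named sample_data, is to stop before the pivot and keep one row per key/value pair:

```sql
-- sample_data is a hypothetical inline table; the second row adds a new key D.
WITH sample_data AS (
  SELECT 'A~S1_B~S2_C~S11' AS String_to_Extract UNION ALL
  SELECT 'C~S13_A~S11_B~S4_D~S7'
)
SELECT
  String_to_Extract,
  -- Each kv element looks like 'A~S1': offset 0 is the key, offset 1 the value.
  -- SAFE_OFFSET returns NULL instead of raising an error if a part is missing.
  SPLIT(kv, '~')[SAFE_OFFSET(0)] AS col,
  SPLIT(kv, '~')[SAFE_OFFSET(1)] AS val
FROM sample_data, UNNEST(SPLIT(String_to_Extract, '_')) AS kv;
```

This long format picks up new keys without any change to the query. The pivot step can then be applied afterwards for whichever keys are needed as columns.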


0

You can also use this approach, which sorts the split items first and then picks the values by position. It assumes every string contains exactly the same set of keys:

select 
   split(ordered[safe_offset(0)], '~')[safe_offset(1)] as A,
   split(ordered[safe_offset(1)], '~')[safe_offset(1)] as B,
   split(ordered[safe_offset(2)], '~')[safe_offset(1)] as C
 from (
    select 
        array(select _ from unnest(split(String_to_Extract, '_') ) as _ order by 1) as ordered
    from dataset.table
)
