Question: How can I parse each u variable's value out of the string below into a separate column using BigQuery? Each column should be named after its variable, for example u1, u2, etc.

Here is the string:

dc_pre=CKX1m_vLoOUCFZUAXAodcgQKfw;gtm=2oda21;auiddc=*;u1=en;u10=undefined;u11=undefined;u12=undefined;u13=undefined;u14=undefined;u15=undefined;u16=undefined;u17=undefined;u18=undefined;u19=undefined;u2=bsd;u20=undefined;u21=undefined;u3=undefined;u4=us;u5=undefined;u6=undefined;u7=undefined;u8=undefined;u9=undefined;~oref=https://localhost.dell.com/premier/us/en/rc1295291/

  • Is the total number of possible u* fixed at 21 as in the example string? Commented Nov 6, 2019 at 20:10
  • Yes, the total number of u variables is fixed at 21. Commented Nov 6, 2019 at 20:12

3 Answers

You'll have to do this in three steps, but they can all be done in one query.

First, split the entry:

SELECT 
   -- Each OFFSET picks one 'key=value' pair by its position in the string
   SPLIT(<<FIELD>>, ';')[OFFSET(0)] as dc_pre
  ,SPLIT(<<FIELD>>, ';')[OFFSET(1)] as gtm
  ,SPLIT(<<FIELD>>, ';')[OFFSET(2)] as auiddc
  ...
from `database.dataset.table`

For each split, replace the text:

SELECT 
   -- Strip the 'key=' prefix so only the value remains
   REPLACE(SPLIT(<<FIELD>>, ';')[OFFSET(0)], 'dc_pre=', '') as dc_pre
  ,REPLACE(SPLIT(<<FIELD>>, ';')[OFFSET(1)], 'gtm=', '') as gtm
  ,REPLACE(SPLIT(<<FIELD>>, ';')[OFFSET(2)], 'auiddc=', '') as auiddc
  ...
from `database.dataset.table`

Then you can convert 'undefined' values to NULL if you like:

SELECT 
   REPLACE(SPLIT(<<FIELD>>, ';')[OFFSET(0)], 'dc_pre=', '') as dc_pre
  ,REPLACE(SPLIT(<<FIELD>>, ';')[OFFSET(1)], 'gtm=', '') as gtm
   -- NULLIF returns NULL when the extracted value equals 'undefined'
  ,NULLIF(REPLACE(SPLIT(<<FIELD>>, ';')[OFFSET(2)], 'auiddc=', ''), 'undefined') as auiddc
  ...
from `database.dataset.table`

I wrote this from memory, but as you can see, the third example combines all three steps. Hope this helps.
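Because the positional OFFSET lookups depend on the order of the pairs (in the sample string the u variables appear in lexical order: u1, u10, u11, ..., u2), a variant that selects each value by its key name may be more robust. The following is only a sketch in BigQuery Standard SQL under stated assumptions: the names `sample_rows`, `pairs`, `pair`, `kv`, `id`, and `line` are hypothetical, the sample string is shortened, and each row is assumed to have a unique `id` to group by.

```sql
#standardSQL
-- Hypothetical, shortened sample row; `line` stands in for the real column.
WITH sample_rows AS (
  SELECT 1 AS id,
    'dc_pre=abc;gtm=2oda21;auiddc=*;u1=en;u10=undefined;u2=bsd;u3=undefined;~oref=https://example.com/' AS line
),
pairs AS (
  -- One row per 'key=value' pair, split into a [key, value] array.
  SELECT id, SPLIT(pair, '=') AS kv
  FROM sample_rows, UNNEST(SPLIT(line, ';')) AS pair
)
SELECT
  id,
  -- Select each value by key name; NULLIF maps 'undefined' to NULL.
  NULLIF(MAX(IF(kv[OFFSET(0)] = 'u1', kv[SAFE_OFFSET(1)], NULL)), 'undefined') AS u1,
  NULLIF(MAX(IF(kv[OFFSET(0)] = 'u2', kv[SAFE_OFFSET(1)], NULL)), 'undefined') AS u2,
  NULLIF(MAX(IF(kv[OFFSET(0)] = 'u3', kv[SAFE_OFFSET(1)], NULL)), 'undefined') AS u3,
  NULLIF(MAX(IF(kv[OFFSET(0)] = 'u10', kv[SAFE_OFFSET(1)], NULL)), 'undefined') AS u10
FROM pairs
GROUP BY id
```

The remaining u columns would follow the same pattern. One caveat of this sketch: a value that itself contains '=' would be cut off at that character, since the pair is split on every '='.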

2 Comments

Thank you! The final query does not necessarily parse out all the u variables correctly.
@Pranav If it's working please upvote the answer and accept it as correct.

Below is for BigQuery Standard SQL

#standardSQL
SELECT * EXCEPT(line), 
  REGEXP_EXTRACT(line, r';u1=(\w+);') AS u1,
  REGEXP_EXTRACT(line, r';u2=(\w+);') AS u2,
  REGEXP_EXTRACT(line, r';u3=(\w+);') AS u3,
  REGEXP_EXTRACT(line, r';u4=(\w+);') AS u4,
  REGEXP_EXTRACT(line, r';u5=(\w+);') AS u5,
  REGEXP_EXTRACT(line, r';u6=(\w+);') AS u6,
  REGEXP_EXTRACT(line, r';u7=(\w+);') AS u7,
  REGEXP_EXTRACT(line, r';u8=(\w+);') AS u8,
  REGEXP_EXTRACT(line, r';u9=(\w+);') AS u9,
  REGEXP_EXTRACT(line, r';u10=(\w+);') AS u10,
  REGEXP_EXTRACT(line, r';u11=(\w+);') AS u11,
  REGEXP_EXTRACT(line, r';u12=(\w+);') AS u12,
  REGEXP_EXTRACT(line, r';u13=(\w+);') AS u13,
  REGEXP_EXTRACT(line, r';u14=(\w+);') AS u14,
  REGEXP_EXTRACT(line, r';u15=(\w+);') AS u15,
  REGEXP_EXTRACT(line, r';u16=(\w+);') AS u16,
  REGEXP_EXTRACT(line, r';u17=(\w+);') AS u17,
  REGEXP_EXTRACT(line, r';u18=(\w+);') AS u18,
  REGEXP_EXTRACT(line, r';u19=(\w+);') AS u19,
  REGEXP_EXTRACT(line, r';u20=(\w+);') AS u20,
  REGEXP_EXTRACT(line, r';u21=(\w+);') AS u21
FROM `project.dataset.table`

You can test and play with the above using dummy data, as in the example below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, 'abc' other_cols, 'dc_pre=CKX1m_vLoOUCFZUAXAodcgQKfw;gtm=2oda21;auiddc=*;u1=en;u10=undefined;u11=undefined;u12=undefined;u13=undefined;u14=undefined;u15=undefined;u16=undefined;u17=undefined;u18=undefined;u19=undefined;u2=bsd;u20=undefined;u21=undefined;u3=undefined;u4=us;u5=undefined;u6=undefined;u7=undefined;u8=undefined;u9=undefined;~oref=https://localhost.dell.com/premier/us/en/rc1295291/' line UNION ALL
  SELECT 2, 'xyz', 'dc_pre=CKX1m_vLoOUCFZUAXAodcgQKfw;gtm=2oda21;auiddc=*;u1=en;u10=a;u11=b;u12=123;u13=undefined;u14=undefined;u15=undefined;u16=undefined;u17=undefined;u18=undefined;u19=undefined;u2=bsd;u20=undefined;u21=undefined;u3=undefined;u4=us;u5=undefined;u6=undefined;u7=undefined;u8=undefined;u9=undefined;~oref=https://localhost.dell.com/premier/us/en/rc1295291/' 
)
SELECT * EXCEPT(line), 
  REGEXP_EXTRACT(line, r';u1=(\w+);') AS u1,
  REGEXP_EXTRACT(line, r';u2=(\w+);') AS u2,
  REGEXP_EXTRACT(line, r';u3=(\w+);') AS u3,
  REGEXP_EXTRACT(line, r';u4=(\w+);') AS u4,
  REGEXP_EXTRACT(line, r';u5=(\w+);') AS u5,
  REGEXP_EXTRACT(line, r';u6=(\w+);') AS u6,
  REGEXP_EXTRACT(line, r';u7=(\w+);') AS u7,
  REGEXP_EXTRACT(line, r';u8=(\w+);') AS u8,
  REGEXP_EXTRACT(line, r';u9=(\w+);') AS u9,
  REGEXP_EXTRACT(line, r';u10=(\w+);') AS u10,
  REGEXP_EXTRACT(line, r';u11=(\w+);') AS u11,
  REGEXP_EXTRACT(line, r';u12=(\w+);') AS u12,
  REGEXP_EXTRACT(line, r';u13=(\w+);') AS u13,
  REGEXP_EXTRACT(line, r';u14=(\w+);') AS u14,
  REGEXP_EXTRACT(line, r';u15=(\w+);') AS u15,
  REGEXP_EXTRACT(line, r';u16=(\w+);') AS u16,
  REGEXP_EXTRACT(line, r';u17=(\w+);') AS u17,
  REGEXP_EXTRACT(line, r';u18=(\w+);') AS u18,
  REGEXP_EXTRACT(line, r';u19=(\w+);') AS u19,
  REGEXP_EXTRACT(line, r';u20=(\w+);') AS u20,
  REGEXP_EXTRACT(line, r';u21=(\w+);') AS u21
FROM `project.dataset.table`    

with output:

Row  id  other_cols  u1  u2   u3         u4  u5         u6         u7         u8         u9         u10        u11        u12        u13        u14        u15        u16        u17        u18        u19        u20        u21
1    1   abc         en  bsd  undefined  us  undefined  undefined  undefined  undefined  undefined  undefined  undefined  undefined  undefined  undefined  undefined  undefined  undefined  undefined  undefined  undefined  undefined
2    2   xyz         en  bsd  undefined  us  undefined  undefined  undefined  undefined  undefined  a          b          123        undefined  undefined  undefined  undefined  undefined  undefined  undefined  undefined  undefined
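If the literal 'undefined' should come back as NULL, or if a value may contain characters that `\w` does not match (a hyphen, for instance), the pattern could be loosened. This is a sketch only, not part of the tested query above: the `sample_rows` name and the shortened sample string are hypothetical, and the pattern assumes values never contain a semicolon.

```sql
#standardSQL
-- Hypothetical, shortened sample row for illustration.
WITH sample_rows AS (
  SELECT 'u1=en;u10=undefined;u2=b-sd;~oref=https://example.com/' AS line
)
SELECT
  -- (?:^|;) lets the key match at the start of the string or after a ';'.
  -- [^;]* captures everything up to the next ';' (or the end of the string).
  -- NULLIF turns the literal 'undefined' into NULL.
  NULLIF(REGEXP_EXTRACT(line, r'(?:^|;)u1=([^;]*)'), 'undefined') AS u1,
  NULLIF(REGEXP_EXTRACT(line, r'(?:^|;)u2=([^;]*)'), 'undefined') AS u2,
  NULLIF(REGEXP_EXTRACT(line, r'(?:^|;)u10=([^;]*)'), 'undefined') AS u10
FROM sample_rows
```

Because the `=` directly follows the key in the pattern, `u1=` cannot accidentally match `u10=`.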

1 Comment

@Pranav - so now - you should accept one answer out of many given - the one you think best fit into what you expected to get :o)

Another approach you could use, if the data is in a predictable format, is to convert it to a JSON string and then use JSON functions to extract the values.

For example:

WITH base_data AS 
(
   SELECT 
      CONCAT('{"', REPLACE(REPLACE(REPLACE('dc_pre=CKX1m_vLoOUCFZUAXAodcgQKfw;gtm=2oda21;auiddc=*;u1=en;u10=undefined;u11=undefined;u12=undefined;u13=undefined;u14=undefined;u15=undefined;u16=undefined;u17=undefined;u18=undefined;u19=undefined;u2=bsd;u20=undefined;u21=undefined;u3=undefined;u4=us;u5=undefined;u6=undefined;u7=undefined;u8=undefined;u9=undefined;~oref=https://localhost.dell.com/premier/us/en/rc1295291/', '=', '": "'), ';', '", "'), '~', ''), '"}') AS bd
) 
    SELECT 
      json_extract_scalar(bd, '$.dc_pre') as dc_pre
    , json_extract_scalar(bd, '$.gtm') as gtm
    , json_extract_scalar(bd, '$.auiddc') as auiddc
    , json_extract_scalar(bd, '$.u1') as u1
    , json_extract_scalar(bd, '$.u2') as u2
    , json_extract_scalar(bd, '$.u3') as u3
    -- etc...
    , json_extract_scalar(bd, '$.oref') as oref
    FROM base_data

Just replace the dc_pre=CK... string literal in the first statement with the actual column that contains the data. This has the advantage of being a single-step process.
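As a rough sketch of what that substitution might look like against a column rather than a literal (the `sample_rows` CTE, the `line` column name, and the shortened sample string are assumptions made for illustration), with 'undefined' optionally mapped to NULL:

```sql
#standardSQL
-- Hypothetical, shortened sample row; `line` stands in for the real column.
WITH sample_rows AS (
  SELECT 'dc_pre=abc;u1=en;u2=undefined;~oref=https://example.com/' AS line
),
base_data AS (
  -- Turn 'k1=v1;k2=v2' into '{"k1": "v1", "k2": "v2"}' and drop the '~'.
  SELECT
    CONCAT('{"', REPLACE(REPLACE(REPLACE(line, '=', '": "'), ';', '", "'), '~', ''), '"}') AS bd
  FROM sample_rows
)
SELECT
  JSON_EXTRACT_SCALAR(bd, '$.u1') AS u1,
  -- NULLIF turns the literal 'undefined' into NULL.
  NULLIF(JSON_EXTRACT_SCALAR(bd, '$.u2'), 'undefined') AS u2,
  JSON_EXTRACT_SCALAR(bd, '$.oref') AS oref
FROM base_data
```

This relies on the values never containing '=', ';', or a double quote; any of those would produce malformed JSON.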
