2

I have some weblog data in big query which I need to transform to make it easier to use and query. The data looks like:

Big_Query_Table

I want to extract and transform the data within the curled brackets after Results{…..} (colored blue). The data is of the form ‘(\d+((PQ)|(KL))+\d+)’ and there can be 1-20+ entries in the result array. I am only interested in the first 16 entries.

I have been able to extract the data within curled brackets into a new column, using Substr and regext_extract. But I'm unable to SPLIT it into columns (sometimes there is only 1 result and so the delimiter "," is missing. I'm new with regex, may be I can use something like ‘(\d+((PQ)|(KL))+\d+){1}’ etc. to split the data into multiple columns and then pivot it.

Ideal output in my case would be to transform it into something like:

Ouput Table

In the above solution, each row in original table is repeated from 1-16 times depending on the number of items in the Results array.

I’m not completely sure if it’s possible to do this in big query. I’ll be grateful if anyone can help me out a little here.

If this is not possible, then I can have 16 rows for every event with NULL values in Event_details for cases where there are less than 16 entries in result array.

In case both of these are not possible, the last solution would be to have it transformed into something like: back_up_output_table

The reason I want to transform the data is that in most of the cases I would need to find which result array items are appearing and in what order.

0

2 Answers 2

2

Check this out: Split string into multiple columns with bigquery. In their case its delimited by spaces. replace the \s with ','

something like:

SELECT  
Regexp_extract(StringToParse,r'^*{(?:[^,]*,){0}(\d+(?:(?:PQ)|(?:KL))+\d+)\s?') as Word0,
Regexp_extract(StringToParse,r'^*{(?:[^,]*,){1}(\d+(?:(?:PQ)|(?:KL))+\d+)\s?') as Word1,
Regexp_extract(StringToParse,r'^*{(?:[^,]*,){2}(\d+(?:(?:PQ)|(?:KL))+\d+)\s?') as Word2,
Regexp_extract(StringToParse,r'^*{(?:[^,]*,){3}(\d+(?:(?:PQ)|(?:KL))+\d+)\s?') as Word3,
FROM
(SELECT 'bla{1234PQ5,6789KL0,1234PQ5,6789KL0,123' as StringToParse)
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks for the answer N.N. This helps me break the words into multiple columns. The only issue is with Word0 - its not being captured in this solution and is always coming as Null. Word0 is between "{" and a comma or }. I tried changing the code to: Regexp_extract(StringToParse,r'^(?:[^{]*,){0}(\d+(?:(?:AUCN)|(?:TRNL))+\d+),\s?') as Word0 but it didn't work
Also, can you suggest a way to transform this into the first solution that I had suggested. I was keeping this solution as my last option, since my first preference would be pivoting the data...so that the different items in results come into different rows
1

Use SPLIT()

BQ split documentation.

SELECT Event_ID, Event_UserID, Event_SessionID, Keyword,
SPLIT(REGEXP_EXTRACT(Event_details,"Results\{(.*)\}"),",") as Event_details_item
FROM mydata.mytable

2 Comments

Thanks for the answer David...I tried using SPLIT as well...but I'm somehow getting only the first item in the result array. There can be 1 or more items in the result array and I want to capture upto 16 items.
It seems to work: SELECT EventID,UserID ,SessionID,KeyWord,Row_Number() Over(Partition by EventID,UserID ,SessionID,KeyWord) as Rank, SPLIT(Regexp_extract(EventDetails,r'^*\{(.*)\}'),",") FROM (SELECT 123 as EventID, 456 as UserID, 789 as SessionID,'ABS' as KeyWord, 'bla{1234PQ5,6789KL0,1234PQ5,6789KL0,123}' as EventDetails). can you share your query so we can explore it?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.