In Google BigQuery, I need to pull the string that is between domain** and **, as in the example below. The string is in the column "Site_Data".
Can someone help me? Thanks!
Do all of the strings have that format? There are a couple of different options, assuming that you always need the third piece when the string is split on the ** delimiter.
1) Use SPLIT, e.g.:
#standardSQL
WITH SampleData AS (
  SELECT '756-1__67648582789116,tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS site_data
)
-- Splitting on '**' yields four pieces; OFFSET(2) is the zero-based third one.
SELECT SPLIT(site_data, '**')[OFFSET(2)] AS visit_type
FROM SampleData;
2) Use REGEXP_EXTRACT, e.g.:
#standardSQL
WITH SampleData AS (
  SELECT '756-1__67648582789116,tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS site_data
)
-- Skip two '**'-terminated pieces, then capture everything up to the next '*'.
SELECT REGEXP_EXTRACT(site_data, r'[^\*]+\*\*[^\*]+\*\*([^\*]+)') AS visit_type
FROM SampleData;
Taking this a step further, if you want to separate the domain from the arrival type, you can use SPLIT again. With no delimiter argument, SPLIT uses a comma:
#standardSQL
WITH SampleData AS (
  SELECT '756-1__67648582789116,tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS site_data
)
SELECT
  SPLIT(visit_type)[OFFSET(0)] AS domain,
  SPLIT(visit_type)[OFFSET(1)] AS arrival_type
FROM (
  SELECT SPLIT(site_data, '**')[OFFSET(2)] AS visit_type
  FROM SampleData
);
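If the domain is not always the third piece, another option is to anchor on the key itself instead of counting pieces. This is only a minimal sketch: it assumes the value always follows the literal text domain**, and that the domain contains no comma or asterisk. The aliases between_markers and domain are names I made up for illustration.

```sql
#standardSQL
WITH SampleData AS (
  SELECT '756-1__67648582789116,tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS site_data
)
SELECT
  -- Everything between 'domain**' and the next '**' (lazy match).
  REGEXP_EXTRACT(site_data, r'domain\*\*(.*?)\*\*') AS between_markers,
  -- Only the domain: stop at the first comma or asterisk (assumed never to occur inside a domain).
  REGEXP_EXTRACT(site_data, r'domain\*\*([^,*]+)') AS domain
FROM SampleData;
```

For the sample row, between_markers is 'www.sport.com,userarriveddirectly' and domain is 'www.sport.com'. REGEXP_EXTRACT returns NULL when there is no match, whereas [OFFSET(2)] raises an error if the array has fewer than three elements; if some rows might be shorter, SAFE_OFFSET(2) returns NULL instead.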