I have a table that has three fields:

Ref Alt INFO

A   T       SNP;FN1,DKFZp686O22169;DKFZp686O22169(uc002vez.2)///FN1(uc010zjp.1)///FN1(uc002vfa.2)///FN1(uc002vfb.2)///FN1(uc002vfc.2)///FN1(uc002vfd.2)///FN1(uc002vfe.2)///FN1(uc002vff.2)///FN1(uc002vfg.2)///FN1(uc002vfh.2)///FN1(uc002vfi.2)///FN1(uc002vfj.2)///FN1(uc010fvc.1)///FN1(uc010fvd.1);Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding;5UTR///Intron_8///Intron_32///Intron_31///Intron_31///Intron_32///Intron_31///Intron_31///Intron_31///Intron_31///Intron_32///Intron_32///Intron_2///Intron_2;.///.///.///.///.///.///.///.///.///.///.///.///.///.;.///.///.///.///.///.///.///.///.///.///.///.///.///.;A-0.9491,T-0.0509;A-970,T-52;A/A-0.9002,A/T-0.0978,T/T-0.0020;A/A-460,A/T-50,T/T-1,N/N-0

Is there any way to extract A/T-0.0978 from the INFO field using the values in the first two fields?

Thanks!

1 Answer

The query below is for BigQuery Standard SQL. It extracts all values for the given Ref/Alt combination (there are two in your particular example):

#standardSQL
SELECT REGEXP_EXTRACT_ALL(INFO, CONCAT(r'[,;](', Ref, '/', Alt, '.*?)[,;]')) val
FROM `project.dataset.table`   

You can test it against the data from your question:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'A' Ref, 'T' Alt, 'SNP;FN1,DKFZp686O22169;DKFZp686O22169(uc002vez.2)///FN1(uc010zjp.1)///FN1(uc002vfa.2)///FN1(uc002vfb.2)///FN1(uc002vfc.2)///FN1(uc002vfd.2)///FN1(uc002vfe.2)///FN1(uc002vff.2)///FN1(uc002vfg.2)///FN1(uc002vfh.2)///FN1(uc002vfi.2)///FN1(uc002vfj.2)///FN1(uc010fvc.1)///FN1(uc010fvd.1);Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding;5UTR///Intron_8///Intron_32///Intron_31///Intron_31///Intron_32///Intron_31///Intron_31///Intron_31///Intron_31///Intron_32///Intron_32///Intron_2///Intron_2;.///.///.///.///.///.///.///.///.///.///.///.///.///.;.///.///.///.///.///.///.///.///.///.///.///.///.///.;A-0.9491,T-0.0509;A-970,T-52;A/A-0.9002,A/T-0.0978,T/T-0.0020;A/A-460,A/T-50,T/T-1,N/N-0' INFO
)
SELECT REGEXP_EXTRACT_ALL(INFO, CONCAT(r'[,;](', Ref, '/', Alt, '.*?)[,;]')) val
FROM `project.dataset.table`   

with output

Row     val  
1       A/T-0.0978   
        A/T-50   
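
The output above is a single row whose val column is an array. If one row per extracted value is more convenient, the array can be flattened with UNNEST. This is only a sketch: the alias `pos` is an illustrative name, not part of the original query.

```sql
#standardSQL
SELECT Ref, Alt, pos, val
FROM `project.dataset.table`,
  -- flatten the array of matches; pos is the zero-based position within the array
  UNNEST(REGEXP_EXTRACT_ALL(INFO, CONCAT(r'[,;](', Ref, '/', Alt, '.*?)[,;]'))) AS val WITH OFFSET AS pos
ORDER BY pos
```

Note that with this comma join, rows that have no match at all are dropped from the result.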

If you want only the first value, you can use

#standardSQL
SELECT REGEXP_EXTRACT_ALL(INFO, CONCAT(r'[,;](', Ref, '/', Alt, '.*?)[,;]'))[OFFSET(0)] val 
FROM `project.dataset.table`  

or

#standardSQL
SELECT REGEXP_EXTRACT(INFO, CONCAT(r'[,;](', Ref, '/', Alt, '.*?)[,;]')) val 
FROM `project.dataset.table`
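
Two edge cases are worth keeping in mind. The pattern requires a trailing `,` or `;`, so a value that happens to be the last item in INFO would not match, and `[OFFSET(0)]` raises an error when the array is empty. A possible variant is sketched below. It assumes that Ref and Alt contain no regex metacharacters and that each value has the form Ref/Alt-number. The INFO string is an abbreviated copy of the one in the question, and the `freq` column is an illustrative addition.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  -- abbreviated copy of the INFO value from the question
  SELECT 'A' Ref, 'T' Alt,
    'SNP;A-0.9491,T-0.0509;A-970,T-52;A/A-0.9002,A/T-0.0978,T/T-0.0020;A/A-460,A/T-50,T/T-1,N/N-0' INFO
)
SELECT
  -- [^,;]* stops at the next delimiter or at the end of the string
  REGEXP_EXTRACT(INFO, CONCAT(r'(?:^|[,;])(', Ref, '/', Alt, r'-[^,;]*)')) AS val,
  -- capture only the number; SAFE_CAST yields NULL instead of an error if it is not numeric
  SAFE_CAST(REGEXP_EXTRACT(INFO, CONCAT(r'[,;]', Ref, '/', Alt, r'-([0-9.]+)')) AS FLOAT64) AS freq
FROM `project.dataset.table`
```

For the sample row this should give A/T-0.0978 for val and 0.0978 for freq. If you keep the REGEXP_EXTRACT_ALL form, `[SAFE_OFFSET(0)]` returns NULL rather than failing when nothing matches.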