0

I'm struggling a bit to reason how to best go about this. I have a table that has 1 column (b) of values I like to check against. The main table I am querying has a column (a) with string values, for which I want to check if they contain elements from the table with column b

Right now I'm only checking for literal matches;

SELECT CASE WHEN lower(table1.a) IN (SELECT lower(table2.b) FROM table2) THEN 1 ELSE 0 END
FROM table1

However the values in column b can contain more words other than an exact match. Now I thought I could use SPLIT to check each word in the string, but not sure if that's the best/only way to go. Especially since it varies per row how many words the string even has.

Something like;

DECLARE match_against as ARRAY;
SET match_against = SELECT lower(table2.b) FROM table2;
SELECT CASE WHEN lower(split(table1.a, ' ')[SAFE_ORDINAL(0)] IN match_against 
OR lower(split(table1.a, ' ')[SAFE_ORDINAL(1)] IN match_against 
OR lower(split(table1.a, ' ')[SAFE_ORDINAL(2)] IN match_against 
THEN 1 ELSE 0 END 
FROM table1;

Especially if column a in table2 is containing strings with multiple words this becomes unworkable. (check for any match between a word from the string in column a and any word in the strings from column b)

any suggestions on how to achieve this?

Some example data: table1 column a values: 'abc', 'aBc dEf', 'abc def ghij' table2 column b values: 'def'

Expected result is 0,1,1

1 Answer 1

1

A left join is a good way to comnbine two tables. The CONTAINS_SUBSTR does not accept a column as seach parameter, therefore, we use a UDF:

create temp function contains_str(a string,b string ) returns bool language js as 
"""
return a.includes(b);
""";

with tbl_a as (select a from unnest([ 'abc', 'aBc dEf', 'abc def ghij',"noabc,"cats","cat"])a),
tbl_b as (select b from unnest([ 'dEf', 'abc',"cat"])b)
select *,
contains_str(a,b)
 from tbl_a
left join tbl_b
on contains_str(concat(" ",lower(a)," "),concat(" ",lower(b)," "))

We add spaces around each search. Thus noabc is not matched by abc and cats not by cat.

Each entry of table A is repeated for each match to table B.

Another way is following. It only give the entries of table A which fullfill a match in table B, but it cannnot return additional columns from table B. The soundex helps to find matches. Please read documentation.

create temp function contains_str(a string,b string ) returns bool language js as 
"""
return a.includes(b);
""";

with tbl_a as (select a from unnest([ 'abc', 'aBc dEf', 'abc def ghij',"cats","cat"])a),
tbl_b as (select b from unnest([ 'dEf', 'abc',"cat"])b)
select *,
(select sum(if(contains_str(concat(" ",(lower(a))," "),concat(" ",(lower(b))," ")),1,0)) from tbl_b )

 from tbl_a
# where exists (select if(contains_str(concat(" ",lower(a)," "),concat(" ",lower(b)," ")),1,null) from tbl_b )


Sign up to request clarification or add additional context in comments.

2 Comments

I tried to avoid using left joins, as I need to check multiple different tables to assign the value of 1 column in the result set. (hence the use of the CASE WHEN to try to keep it readable and comprehensive). I'll look into your suggestions some more, thanks
ok, see inline query.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.