1

I have a list of strings in my database, let's say in a single column:

understand
understan
understa
underst
unders
under

I'm trying to find out how to remove a string from the list if it is a substring of another string, using SQL.

So if we pretend that this is a column of my table, the end result must be only:

understand
4 Comments
  • What database are you using? MySQL or BigQuery? Commented Nov 22, 2019 at 14:00
  • Bigquery. I corrected it now. Commented Nov 22, 2019 at 14:00
  • Does your version of BQ support join on inequality? Without this functionality, this will not be easy to accomplish. Commented Nov 22, 2019 at 14:15
  • Yes, it supports it. Commented Nov 22, 2019 at 14:48

3 Answers

3

Below is for BigQuery Standard SQL

#standardSQL
SELECT str FROM (
  SELECT str, 
    STARTS_WITH(LAG(str) OVER(ORDER BY str DESC), str) flag 
  FROM `project.dataset.table`
)
WHERE NOT IFNULL(flag, FALSE)   

I tested the above with dummy data similar to what you provided in your question:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'anderstand' str UNION ALL
  SELECT 'anderstan' UNION ALL
  SELECT 'andersta' UNION ALL
  SELECT 'anderst' UNION ALL
  SELECT 'understand' str UNION ALL
  SELECT 'understan' UNION ALL
  SELECT 'understa' UNION ALL
  SELECT 'underst' UNION ALL
  SELECT 'unders' UNION ALL
  SELECT 'under' 
)
SELECT str FROM (
  SELECT str, 
    STARTS_WITH(LAG(str) OVER(ORDER BY str DESC), str) flag 
  FROM `project.dataset.table`
)
WHERE NOT IFNULL(flag, FALSE)    

with result

Row str  
1   understand   
2   anderstand   

which I believe is exactly what you expected
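
Note that the query above only catches prefixes, because it compares each string with its sorted neighbour using STARTS_WITH. If a string may also sit in the middle or at the end of another one (for example 'stand' inside 'understand'), a pairwise comparison is one possible approach. The following is only a sketch under assumptions: the sample rows are made up, and STRPOS is used for an exact, case-sensitive containment test. It compares every pair of rows, so it will likely be slow on large tables.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  -- hypothetical sample rows, not taken from the question
  SELECT 'understand' str UNION ALL
  SELECT 'understan' UNION ALL
  SELECT 'stand' UNION ALL
  SELECT 'under'
)
SELECT a.str
FROM `project.dataset.table` a
CROSS JOIN `project.dataset.table` b
GROUP BY a.str
-- keep a.str only if no *different* string contains it;
-- STRPOS returns 0 when a.str does not occur inside b.str
HAVING COUNTIF(b.str != a.str AND STRPOS(b.str, a.str) > 0) = 0
```

For these four rows only 'understand' should remain. The CROSS JOIN plus HAVING form is used here instead of NOT EXISTS because BigQuery can reject correlated anti-joins that have no equality condition.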


6 Comments

What exactly didn't work? Please provide an example so I can fix it. Meanwhile, I tested it and it worked for me; see the update with a test example.
It works with your example. Thank you! I just want to make sure: project.dataset.table is just a placeholder table name, so it can be anything, am I right?
Of course, you should use whatever table you actually have. Also, consider voting up the answer if it helped; as you might know, that is also important on SO.
Is it possible to skip if the str has a space? Let's say I have "understand it", "understand", "understa", "understand itt", and "understand itt yes". Would it show me only "understand itt", "understand", and "understand itt yes"?
It looks possible, but it is hard to answer within the comments. Please post a new question with the relevant new details, and I will be glad to answer.
0

In Oracle a slow method would be:

with b as (
    -- sample data: every prefix of two 10-character words
    select substr('understand', 1, level) w from dual connect by level <= 10
    union all
    select substr('asdfasdfad', 1, level) w from dual connect by level <= 10
),
chk as (
    -- f is flagged when a longer string p starts with f minus its last character
    select s.w p, t.w f, substr(t.w, 1, length(t.w) - 1) f_head
    from b s, b t
    where s.w like substr(t.w, 1, length(t.w) - 1) || '%'
      and length(s.w) > length(t.w)
)
select w from b
minus
select f from chk

2 Comments

The problem here is that we don't know in advance that the string is 'understand'. Imagine there are a lot of cases like this. But I got your point.
It should be a generic solution. I tested it with a second word, but it will probably be slow with a lot of words.
0

To solve this problem, I would recommend lead():

select t.*
from (select t.*,
             lead(str) over (order by str) as next_str
      from t
     ) t
where next_str not like concat(str, '%') or
      next_str is null;
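
One caveat with the LIKE-based comparison: any '%' or '_' inside str is treated as a wildcard, so a value such as 'a_c' would match 'abc' even though it is not a prefix of it. A possible variant, sketched here for BigQuery with made-up sample rows in an inline table t, uses STARTS_WITH for a literal prefix test instead:

```sql
#standardSQL
WITH t AS (
  -- hypothetical sample rows chosen to include a LIKE wildcard character
  SELECT 'a_c' AS str UNION ALL
  SELECT 'abc' UNION ALL
  SELECT 'abcd'
)
SELECT str
FROM (
  SELECT str,
         LEAD(str) OVER (ORDER BY str) AS next_str
  FROM t
)
-- keep the last row (no next string) and every row that is not
-- a literal prefix of the string that follows it in sort order
WHERE next_str IS NULL
   OR NOT STARTS_WITH(next_str, str)
```

With these rows, 'abc' is dropped as a prefix of 'abcd', while 'a_c' is kept because the underscore is compared literally.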
