I have a table with two columns, user ids and strings. I want to add a third column that counts the number of strings within the second column that start with the entire string value in any given row. There is only one row per user.

The goal is to get the following table structure: (image of the expected table not shown)

Here the count is equal to the number of rows in the string column that start with the given value in that row. I've tried using count(string like string + '%') in combination with over(partition by string) but it doesn't seem to work how I had hoped.

Any help is appreciated (btw I'm using SQL on Redshift).

2 Answers

2

The fastest way in Redshift is probably to use window functions. However, it requires a lot of verbosity -- because you need a separate column for each string length:

select t.*,
       -- use the window whose prefix length matches this row's full string
       (case when string = string_1
             then count(*) over (partition by string_1)
             when string = string_2
             then count(*) over (partition by string_2)
             when string = string_3
             then count(*) over (partition by string_3)
        end) as prefix_count
from (select t.*,
             left(string, 1) as string_1,
             left(string, 2) as string_2,
             left(string, 3) as string_3
      from t
     ) t;

Hmmm . . . The subquery is not needed:

select t.*,
       -- one branch per string length, each with a fixed-length prefix
       (case len(string)
             when 1 then count(*) over (partition by left(string, 1))
             when 2 then count(*) over (partition by left(string, 2))
             when 3 then count(*) over (partition by left(string, 3))
        end) as prefix_count
from t;
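
To try this on a small data set, here is a minimal sketch. The temp table definition and the six sample rows are assumptions for illustration (they mirror the sample results in the other answer), and prefix_count is just an illustrative alias:

```sql
-- Hypothetical sample data: one row per user, strings of at most 3 characters.
create temp table t (id int, string varchar(10));
insert into t values
    (1, 'a'), (2, 'ab'), (3, 'abc'),
    (4, 'd'), (5, 'de'), (6, 'def');

-- One CASE branch per string length; each window partitions on a
-- fixed-length prefix, so rows sharing that prefix land in one partition.
select t.*,
       (case len(string)
             when 1 then count(*) over (partition by left(string, 1))
             when 2 then count(*) over (partition by left(string, 2))
             when 3 then count(*) over (partition by left(string, 3))
        end) as prefix_count
from t
order by id;
```

On these rows the counts should come out as 3, 2, 1, 3, 2, 1. Any string longer than the longest length handled by the CASE gets NULL, so the expression needs one branch per possible length.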

7 Comments

Have you tried removing the case and putting the len(string) in the left() function? "select *, count(*) over (partition by left(string, len(string))) from t"
@BillWeiner . . . That should use the length in each row in the partition, rather than one row per partition. In any case, this illustrates that it does not work: dbfiddle.uk/….
true enough - need the case to make the length static in the window function.
This solution was the fastest for me. Thanks Gordon!
@throwawaydisplayname are all your strings with max 3 chars?
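
A note on why the single-window version discussed in these comments does not work: left(string, len(string)) is simply string, so the window partitions by the full value and each distinct string counts only itself. A minimal sketch, assuming a hypothetical temp table prefix_demo with the same two columns:

```sql
-- Hypothetical sample data, for illustration only.
create temp table prefix_demo (id int, string varchar(10));
insert into prefix_demo values
    (1, 'a'), (2, 'ab'), (3, 'abc');

-- left(string, len(string)) evaluates to the whole string in every row,
-- so this behaves the same as "partition by string".
select id, string,
       count(*) over (partition by left(string, len(string))) as cnt
from prefix_demo
order by id;
```

With distinct strings, cnt is 1 in every row, whereas the desired counts here are 3, 2, 1. The CASE on len(string) fixes the prefix length per branch, which is what lets shorter strings share a partition with the longer ones that start with them.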
1

With a correlated subquery:

SELECT t1.ID, t1.String, 
       (SELECT COUNT(*) FROM tablename t2 WHERE t2.String LIKE CONCAT(t1.String, '%')) AS Count
FROM tablename t1

Or, with a self join:

SELECT t1.ID, t1.String, COUNT(*) AS Count
FROM tablename t1 INNER JOIN tablename t2
ON t2.String LIKE CONCAT(t1.String, '%')
GROUP BY t1.ID, t1.String

See the demo.

Results:

id  string  count
1   a       3
2   ab      2
3   abc     1
4   d       3
5   de      2
6   def     1
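
One caveat with building a LIKE pattern from data: any % or _ inside String is treated as a wildcard. If that matters, a possible variant of the self join compares a fixed-length prefix instead. This is a sketch rather than part of the original answer; the temp table and sample rows are assumptions, and LEFT and LEN are the same Redshift string functions used in the other answer:

```sql
-- Hypothetical sample data matching the results listed above.
create temp table tablename (id int, string varchar(10));
insert into tablename values
    (1, 'a'), (2, 'ab'), (3, 'abc'),
    (4, 'd'), (5, 'de'), (6, 'def');

-- Match t2 rows whose leading characters equal t1's full string.
-- Every row matches itself, so the count is always at least 1.
select t1.id, t1.string, count(*) as prefix_count
from tablename t1
inner join tablename t2
        on left(t2.string, len(t1.string)) = t1.string
group by t1.id, t1.string
order by t1.id;
```

On the sample rows this should reproduce the results listed above. Like the LIKE version, it still compares every pair of rows, so it can be slow on large tables.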

3 Comments

Thanks for the help. This works for me in the demo but in the actual redshift database I'm getting this error: "This type of correlated subquery pattern is not supported due to an internal error".
@throwawaydisplayname The query uses standard SQL, so it's strange that it is not supported by Redshift. Try my 2nd query.
Yup I think that works, the data is pretty huge so I'm still waiting on it to run but I'll post again if it doesn't look right. Thanks a ton @forpas.
