Redshift SQL: Column Counting Matching Rows Given a Condition within another column

Question

I have a table with two columns, user ids and strings. I want to add a third column that counts the number of strings within the second column that start with the entire string value in any given row. There is only one row per user.

The goal is to get the following table structure:

Here the count is equal to the number of rows in the string column that start with the given value in that row. I've tried using count(string like string + '%') in combination with over(partition by string) but it doesn't seem to work how I had hoped.

Any help is appreciated (btw I'm using SQL on Redshift).

Gordon Linoff · Accepted Answer · 2021-06-25 21:38:42Z

2

The fastest way in Redshift is probably to use window functions. However, it requires a lot of verbosity -- because you need a separate column for each string length:

select t.*,
       (case when string = string_1
             then count(*) over (partition by string_1)
             when string = string_2
             then count(*) over (partition by string_2)
             when string = string_3
             then count(*) over (partition by string_3)
        end)
from (select t.*,
             left(string, 1) as string_1,
             left(string, 2) as string_2,
             left(string, 3) as string_3
      from t
     ) t;

Hmmm . . . The subquery is not needed:

select t.*,
       (case len(string)
             when 1 then count(*) over (partition by left(string, 1))
             when 2 then count(*) over (partition by left(string, 2))
             when 3 then count(*) over (partition by left(string, 3))
        end)
from t;
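
As a quick check of the query above, here is a minimal sketch run against six sample rows. The temporary table, its column sizes, and the prefix_count alias are assumptions added for illustration. The CASE still only covers strings of up to three characters, so longer strings would get NULL unless more branches are added.

```sql
-- Hypothetical sample data; the table name t matches the queries above.
create temp table t (id int, string varchar(16));

insert into t values
    (1, 'a'), (2, 'ab'), (3, 'abc'),
    (4, 'd'), (5, 'de'), (6, 'def');

-- Same logic as the second query, with an explicit alias on the new column.
-- Each branch partitions by a fixed-length prefix, so a row of length n
-- is counted together with every row that shares its first n characters.
select t.*,
       (case len(string)
             when 1 then count(*) over (partition by left(string, 1))
             when 2 then count(*) over (partition by left(string, 2))
             when 3 then count(*) over (partition by left(string, 3))
        end) as prefix_count
from t
order by id;
```

With this data the counts should come out as 3, 2, 1 for a, ab, abc and the same for d, de, def.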

7 Comments

Bill Weiner Over a year ago

Have you tried removing the case and putting the len(string) in the left() function? "select *, count(*) over (partition by left(string, len(string))) from t"

Gordon Linoff Over a year ago

@BillWeiner . . . That should use the length in each row in the partition, rather than one row per partition. In any case, this illustrates that it does not work: dbfiddle.uk/….

Bill Weiner Over a year ago

true enough - need the case to make the length static in the window function.
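
A small sketch of the point made in this exchange, assuming the table t holds distinct strings such as a, ab, abc: left(string, len(string)) returns the whole string for every row, so the window ends up partitioned by string itself and each row only counts exact duplicates of itself. The column aliases are hypothetical.

```sql
-- left(string, len(string)) is just string, so both counts use the same partitions.
-- With distinct strings both should come back as 1 on every row,
-- not the number of rows that start with the string.
select t.*,
       count(*) over (partition by left(string, len(string))) as by_left_len,
       count(*) over (partition by string)                    as by_string
from t;
```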

throwawaydisplayname Over a year ago

This solution was the fastest for me. Thanks Gordon!

forpas Over a year ago

@throwawaydisplayname are all your strings with max 3 chars?

forpas · Accepted Answer · 2021-06-25 20:44:51Z

1

With a correlated subquery:

SELECT t1.ID, t1.String, 
       (SELECT COUNT(*) FROM tablename t2 WHERE t2.String LIKE CONCAT(t1.String, '%')) AS Count
FROM tablename t1

Or, with a self join:

SELECT t1.ID, t1.String, COUNT(*) AS Count
FROM tablename t1 INNER JOIN tablename t2
ON t2.String LIKE CONCAT(t1.String, '%')
GROUP BY t1.ID, t1.String

See the demo.

Results:

id	string	count
1	a	3
2	ab	2
3	abc	1
4	d	3
5	de	2
6	def	1
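
To reproduce the results above, here is a minimal sketch with hypothetical sample data in a temporary table named tablename. It swaps the LIKE condition for a prefix comparison with LEFT and LEN, which is a substitution made for this sketch and not part of the original answer. It gives the same counts on this data and avoids treating a % or _ inside a stored string as a LIKE wildcard.

```sql
-- Hypothetical sample data matching the results table above.
CREATE TEMP TABLE tablename (ID INT, String VARCHAR(16));

INSERT INTO tablename VALUES
    (1, 'a'), (2, 'ab'), (3, 'abc'),
    (4, 'd'), (5, 'de'), (6, 'def');

-- Self join: t2 matches t1 when t2.String starts with t1.String.
-- Every row matches itself, so each count is at least 1.
SELECT t1.ID, t1.String, COUNT(*) AS prefix_count
FROM tablename t1
INNER JOIN tablename t2
    ON LEFT(t2.String, LEN(t1.String)) = t1.String
GROUP BY t1.ID, t1.String
ORDER BY t1.ID;
```

Because the join condition is not a plain equality, Redshift will likely have to compare a large number of row pairs, so this form can be slow on big tables.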

edited Jun 25, 2021 at 20:44

answered Jun 25, 2021 at 20:23

3 Comments

throwawaydisplayname Over a year ago

Thanks for the help. This works for me in the demo but in the actual redshift database I'm getting this error: "This type of correlated subquery pattern is not supported due to an internal error".

forpas Over a year ago

@throwawaydisplayname The query uses standard SQL, so it's strange that it is not supported by Redshift. Try my 2nd query.

throwawaydisplayname Over a year ago

Yup I think that works, the data is pretty huge so I'm still waiting on it to run but I'll post again if it doesn't look right. Thanks a ton @forpas.
