Add number to rows based on identical values in selected columns

Question

I have a PostgreSQL database that contains traffic tickets written by a few jurisdictions.

Some jurisdictions don't indicate if multiple tickets are written in one traffic stop. However, that can be inferred by analyzing other fields. Consider this data:

ticket_id  timestamp            drivers_license
----------------------------------------------
1          2008-08-07 01:51:00  11111111
2          2008-08-07 01:51:00  11111111
3          2008-08-07 02:02:00  22222222
4          2008-08-07 02:25:00  33333333
5          2008-08-07 04:23:00  44444444
6          2008-08-07 04:23:00  55555555
7          2008-08-07 04:23:00  44444444

I can infer that:

Tickets 1 & 2 were written in a single traffic stop because they share driver's license numbers and timestamps.
Same for 5 & 7, but notice how ticket 6 is between them. Perhaps another officer was writing a ticket at the same time somewhere else, or data entry operators entered stuff out of order.

I would like to add another column that has a unique ID for each traffic stop. It will not be a primary key for the table because it will have duplicate values. For example:

ticket_id  timestamp            drivers_license  stop_id
--------------------------------------------------------
1          2008-08-07 01:51:00  11111111         1
2          2008-08-07 01:51:00  11111111         1
3          2008-08-07 02:02:00  22222222         2
4          2008-08-07 02:25:00  33333333         3
5          2008-08-07 04:23:00  44444444         4
6          2008-08-07 04:23:00  55555555         5
7          2008-08-07 04:23:00  44444444         4

I can think of computationally intensive, greedy ways of doing this in C#, but is there an efficient SQL query that would work?

I fail to see how simply adding another column is different from already having a foreign key to driver licenses ...
– tereško, Commented Mar 8, 2012 at 4:05
Yup. A single motorist (single DL) could be cited on different occasions.
– Aren Cambre, Commented Mar 9, 2012 at 3:55

Erwin Brandstetter · Accepted Answer · 2012-03-08 06:55:50Z

3

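If you employ the window function dense_rank(), this becomes amazingly simple:

SELECT *
     , dense_rank() OVER (ORDER BY ts, drivers_license) AS stop_id
FROM   tbl
ORDER  BY ticket_id;

This returns exactly what you asked for: rows that share ts and drivers_license get the same number. Plain rank() would also give each stop a distinct number, but it leaves gaps after ties (1, 1, 3, 4, ...), whereas dense_rank() numbers the stops consecutively.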

I renamed your column timestamp to ts, because timestamp is a type name in PostgreSQL and a reserved word in every SQL standard.
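
To see how rank() and dense_rank() behave on the sample data from the question, here is a minimal sketch in Python. It runs the query against an in-memory SQLite database, which is an assumption made only to keep the example self-contained; SQLite 3.25 or later accepts the same window-function syntax as PostgreSQL for this query. The table name tbl and the column ts follow the query above; the helper rank_tickets is hypothetical.

```python
import sqlite3

# Sample rows from the question: (ticket_id, ts, drivers_license).
ROWS = [
    (1, "2008-08-07 01:51:00", "11111111"),
    (2, "2008-08-07 01:51:00", "11111111"),
    (3, "2008-08-07 02:02:00", "22222222"),
    (4, "2008-08-07 02:25:00", "33333333"),
    (5, "2008-08-07 04:23:00", "44444444"),
    (6, "2008-08-07 04:23:00", "55555555"),
    (7, "2008-08-07 04:23:00", "44444444"),
]

# Both functions use the same ordering; only the numbering after ties differs.
QUERY = """
SELECT ticket_id
     , rank()       OVER (ORDER BY ts, drivers_license) AS rnk
     , dense_rank() OVER (ORDER BY ts, drivers_license) AS stop_id
FROM   tbl
ORDER  BY ticket_id
"""


def rank_tickets(rows):
    """Load the rows into an in-memory table and return (ticket_id, rnk, stop_id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (ticket_id INTEGER PRIMARY KEY, ts TEXT, drivers_license TEXT)"
    )
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for ticket_id, rnk, stop_id in rank_tickets(ROWS):
        print(ticket_id, rnk, stop_id)
```

On this data, rank() skips 2 and 6 because of the ties, while dense_rank() yields 1 through 5 with tickets 5 and 7 sharing the same value.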


ImGreg · Accepted Answer · 2012-03-08 14:27:32Z

1

Efficient SQL Query FTW!

I'm not at a computer where I can test this, so there are likely some syntax problems; I will fix them in the morning, but it is something like this:

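WITH uniquez AS (
   -- one row per distinct (timestamp, drivers_license) pair, numbered without gaps
   SELECT DISTINCT timestamp, drivers_license
        , dense_rank() OVER (ORDER BY timestamp, drivers_license) AS counterz
   FROM   ticketTable
   )
UPDATE ticketTable TT
SET    stop_id = uniquez.counterz
FROM   uniquez   -- the UPDATE must list the CTE in FROM to reference it
WHERE  uniquez.timestamp = TT.timestamp
AND    uniquez.drivers_license = TT.drivers_license;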

Basically, the CTE orders the rows by timestamp and drivers_license and assigns each row a rank, so rows that share both values get the same number. The UPDATE then uses that number as the stop_id for every row whose timestamp and driver's license match.
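
As a rough check of this idea, the following Python sketch persists the numbers into a stop_id column using an in-memory SQLite database (an assumption made only so the example is self-contained). Two things differ from the query above and are not from the original answer: the table is called ticket_table with a ts column, and the UPDATE uses a correlated subquery keyed on ticket_id instead of PostgreSQL's UPDATE ... FROM, because older SQLite versions do not support that form. The helper assign_stop_ids is hypothetical.

```python
import sqlite3

# Sample rows from the question: (ticket_id, ts, drivers_license).
ROWS = [
    (1, "2008-08-07 01:51:00", "11111111"),
    (2, "2008-08-07 01:51:00", "11111111"),
    (3, "2008-08-07 02:02:00", "22222222"),
    (4, "2008-08-07 02:25:00", "33333333"),
    (5, "2008-08-07 04:23:00", "44444444"),
    (6, "2008-08-07 04:23:00", "55555555"),
    (7, "2008-08-07 04:23:00", "44444444"),
]

# The CTE numbers every ticket; the UPDATE copies that number into stop_id.
UPDATE_SQL = """
WITH uniquez AS (
    SELECT ticket_id
         , dense_rank() OVER (ORDER BY ts, drivers_license) AS counterz
    FROM   ticket_table
)
UPDATE ticket_table
SET    stop_id = (SELECT counterz
                  FROM   uniquez
                  WHERE  uniquez.ticket_id = ticket_table.ticket_id)
"""


def assign_stop_ids(rows):
    """Fill stop_id for all rows and return (ticket_id, stop_id) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ticket_table ("
        "ticket_id INTEGER PRIMARY KEY, ts TEXT, drivers_license TEXT, stop_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO ticket_table (ticket_id, ts, drivers_license) VALUES (?, ?, ?)",
        rows,
    )
    conn.execute(UPDATE_SQL)
    result = conn.execute(
        "SELECT ticket_id, stop_id FROM ticket_table ORDER BY ticket_id"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for ticket_id, stop_id in assign_stop_ids(ROWS):
        print(ticket_id, stop_id)
```

Keying the update on ticket_id avoids matching one ticket against several identical (ts, drivers_license) rows; matching on the two columns, as in the answer, gives the same stop_id values.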

edited Mar 8, 2012 at 14:27

answered Mar 8, 2012 at 4:18

5 Comments

Steve Kass Over a year ago

Whether PostgreSQL allows it or not, it's a very bad idea to use ROW_NUMBER() without an ORDER BY clause. In any case, I don't think what you have here works at all. If you partition by timestamp, drivers_license, the row numbering will start at 1 again each time timestamp, drivers_license changes. If you change partition by to ORDER BY, you are closer, but I think you would want DENSE_RANK(), not row_number().

ImGreg Over a year ago

@SteveKass definitely agree. Late night programming will have that effect on things. I will edit.

Erwin Brandstetter Over a year ago

-1 You just copied my correct version over your incorrect one without crediting. That's not the recommended way around here - to put it politely.

Aren Cambre Over a year ago

I'm getting a strange error with the query I constructed based on this. I opened a separate question at stackoverflow.com/questions/9643859/….

Aren Cambre Over a year ago

Turns out you're missing a FROM clause in the UPDATE statement. After the SET line, you need a FROM uniquez.

David Faber · Accepted Answer · 2012-03-08 04:10:28Z

1

Probably your best bet is to create a new table (say, "stops") with DISTINCT timestamps and drivers' license #s, assign row numbers, then update the ticket table from that new table.
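
A minimal sketch of that approach, written in Python against an in-memory SQLite database so it can run on its own. The table names tickets and stops, the column ts, and the helper function are assumptions for illustration. SQLite's INTEGER PRIMARY KEY assigns the row numbers here; in PostgreSQL a serial or identity column would play that role.

```python
import sqlite3

# Sample rows from the question: (ticket_id, ts, drivers_license).
ROWS = [
    (1, "2008-08-07 01:51:00", "11111111"),
    (2, "2008-08-07 01:51:00", "11111111"),
    (3, "2008-08-07 02:02:00", "22222222"),
    (4, "2008-08-07 02:25:00", "33333333"),
    (5, "2008-08-07 04:23:00", "44444444"),
    (6, "2008-08-07 04:23:00", "55555555"),
    (7, "2008-08-07 04:23:00", "44444444"),
]


def assign_stop_ids_via_stops_table(rows):
    """Build a stops table from distinct pairs, then copy its ids onto tickets."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tickets (
            ticket_id INTEGER PRIMARY KEY,
            ts TEXT,
            drivers_license TEXT,
            stop_id INTEGER
        );
        CREATE TABLE stops (
            stop_id INTEGER PRIMARY KEY,      -- auto-assigned row number
            ts TEXT,
            drivers_license TEXT,
            UNIQUE (ts, drivers_license)
        );
        """
    )
    conn.executemany(
        "INSERT INTO tickets (ticket_id, ts, drivers_license) VALUES (?, ?, ?)",
        rows,
    )
    # Step 1: one stops row per distinct (ts, drivers_license) pair.
    conn.execute(
        "INSERT INTO stops (ts, drivers_license) "
        "SELECT DISTINCT ts, drivers_license FROM tickets"
    )
    # Step 2: look up each ticket's stop and store its id.
    conn.execute(
        """
        UPDATE tickets
        SET stop_id = (SELECT s.stop_id
                       FROM   stops s
                       WHERE  s.ts = tickets.ts
                       AND    s.drivers_license = tickets.drivers_license)
        """
    )
    result = dict(conn.execute("SELECT ticket_id, stop_id FROM tickets").fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    print(assign_stop_ids_via_stops_table(ROWS))
```

The specific stop_id values depend on the order in which the distinct pairs are inserted, so only the grouping should be relied on: tickets 1 and 2 share a stop, as do tickets 5 and 7.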


Teja · Accepted Answer · 2012-03-08 04:10:13Z

-1

SELECT ticket_id,timestamp,drivers_license,substr(drivers_license,1,1) as stop_id
FROM traffic_data;

Hope this works for u... :)


1 Comment

Aren Cambre Over a year ago

It's possible that a person could receive separate tickets in separate stops, so that couldn't work, unfortunately.
