3

I have a PostgreSQL database that contains traffic tickets written by a few jurisdictions.

Some jurisdictions don't indicate if multiple tickets are written in one traffic stop. However, that can be inferred by analyzing other fields. Consider this data:

ticket_id  timestamp            drivers_license
----------------------------------------------
1          2008-08-07 01:51:00  11111111
2          2008-08-07 01:51:00  11111111
3          2008-08-07 02:02:00  22222222
4          2008-08-07 02:25:00  33333333
5          2008-08-07 04:23:00  44444444
6          2008-08-07 04:23:00  55555555
7          2008-08-07 04:23:00  44444444

I can infer that:

  • Tickets 1 & 2 were written in a single traffic stop because they share driver's license numbers and timestamps.
  • Same for 5 & 7, but notice how ticket 6 is between them. Perhaps another officer was writing a ticket at the same time somewhere else, or data entry operators entered stuff out of order.

I would like to add another column that has a unique ID for each traffic stop. It will not be a primary key for the table because it will have duplicate values. For example:

ticket_id  timestamp            drivers_license  stop_id
--------------------------------------------------------
1          2008-08-07 01:51:00  11111111         1
2          2008-08-07 01:51:00  11111111         1
3          2008-08-07 02:02:00  22222222         2
4          2008-08-07 02:25:00  33333333         3
5          2008-08-07 04:23:00  44444444         4
6          2008-08-07 04:23:00  55555555         5
7          2008-08-07 04:23:00  44444444         4

I can think of computationally-intensive, greedy algorithm ways of doing this with C#, but is there an efficient SQL query that can work?

3
  • I fail to see how simply adding another column is different from having already a foreign key to driver licenses ... Commented Mar 8, 2012 at 4:05
  • 1
    It's different because of the timestamp. Commented Mar 8, 2012 at 4:07
  • 1
    Yup. A single motorist (single DL) could be cited on different occasions. Commented Mar 9, 2012 at 3:55

4 Answers 4

3

If you employ the window function rank() this becomes amazingly simple:

SELECT *
      ,rank() OVER (ORDER BY ts, drivers_license)
FROM   tbl
ORDER  BY ticket_id

Returns exactly what you asked for.

I renamed your column timestamp to ts, because timestamp is a type name in PostgreSQL and a reserved word in every SQL standard.

Sign up to request clarification or add additional context in comments.

Comments

1

Efficient SQL Query FTW!

I'm not at a computer that I can test this on so there is likely some syntax problems; I will fix in the morning, but it is something like this:

WITH uniquez as (SELECT timestamp, drivers_license, 
rank() over (ORDER BY timestamp, drivers_license) as counterz 
FROM ticketTable)

UPDATE ticketTable TT
SET stop_id = uniquez.counterz
WHERE uniquez.timestamp = TT.timestamp
AND uniquez.drivers_license = TT.drivers_license

Basically, you make a select that groups (partitions) by timestamp and drivers_license and have a row counter that goes with it. When you do the update, you use this previous select table's row counter as your "stop_id" and updates the columns that match the timestamp and drivers license.

5 Comments

Whether postgreSQL allows it or not, it's a very bad idea to use ROW_NUMBER() without an ORDER BY clause. In any case, I don't think what you have here works at all. If you partition by timestamp, drivers_license, the row numbering will start at 1 again each time timestamp, drivers_license changes. If you change partition by to ORDER BY, you are closer, but I think you would want DENSE_RANK(), not row_number().
@SteveKass definitely agree. Late night programming will have that effect on things. I will edit.
-1 You just copied my correct version over your incorrect one without crediting. That's not the recommended way around here - to put it politely.
I'm getting a strange error with the query I constructed based on this. I opened a separate question at stackoverflow.com/questions/9643859/….
Turns out you're missing a FROM clause in the UPDATE statement. After the SET line, you need a FROM uniquez.
1

Probably your best bet is to create a new table (say, "stops") with DISTINCT timestamps and drivers' license #s, assign row numbers, then update the ticket table from that new table.

Comments

-1
SELECT ticket_id,timestamp,drivers_license,substr(drivers_license,1,1) as stop_id
FROM traffic_data;

Hope this works for u... :)

1 Comment

It's possible that a person could receive separate tickets in separate stops, so that couldn't work, unfortunately.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.