
Subway data columns:

  1. Station_id (1,2,3,4,5,6,7,8,1,2,3,4,5,1,2,3,4,5,6,7,1,2,3)
  2. Number of people boarded
  3. Number of people deboarded
  4. Occupancy

Occupancy at current station = Occupancy at previous station + Number of people boarded - Number of people deboarded

I am trying to fill the occupancy column. The issue is that the dataset covers multiple subway trains, so station_id resets to 1 whenever a new train journey begins. At that station the number of people deboarded is always 0, since it is the first station of the journey. I have no idea how to do this in PostgreSQL. The occupancy column in the sample image below is empty and needs to be filled.

The train journeys are sorted and grouped.

[image: sample data with an empty occupancy column]

  • Sample data and the expected output would really be helpful. Commented Nov 30, 2017 at 21:15
  • Please edit your question and add some sample data in tabular format and the expected output based on that data. Formatted text please, no screenshots. Edit your question; do not post code or additional information in comments. Commented Nov 30, 2017 at 21:16
  • Do you have a train_id you can add to your data set? Do you have timestamps? Commented Nov 30, 2017 at 21:22
  • Thanks for the suggestions. I have uploaded sample data. There is no timestamp column, but I created an id column, which is a serial. Commented Nov 30, 2017 at 21:40
  • SQL tables and their IDs aren't really meant to be used like this. But you should be able to do this using a "running total". Maybe this thread would help stackoverflow.com/questions/22841206/… Commented Nov 30, 2017 at 21:43

1 Answer

You can do this with the difference of the cumulative sums. The trick is identifying the groups, which I'll do by counting the number of times that station_id has been 1 up to that record.

select s.*,
       (sum(boarded) over (partition by grp order by id) -
        sum(deboarded) over (partition by grp order by id)
       ) as occupants
from (select s.*,
             count(*) filter (where station_id = 1) over (order by id) as grp
      from subwaydata s
     ) s;
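
Since the goal is to fill the column rather than just select the value, the same query can feed an UPDATE. This is only a sketch: it assumes the empty column is named occupancy (the question does not give its exact name) and that id is unique, which should hold for a serial column. The filter clause needs PostgreSQL 9.4 or later.

```sql
-- Sketch: write the running occupancy back into the table.
-- Assumption: the target column is called "occupancy" and "id" is unique.
update subwaydata as s
set occupancy = t.occupants
from (
    select g.id,
           -- cumulative boarded minus cumulative deboarded within each journey
           (sum(g.boarded) over (partition by g.grp order by g.id) -
            sum(g.deboarded) over (partition by g.grp order by g.id)
           ) as occupants
    from (select d.id, d.boarded, d.deboarded,
                 -- journey number: how many times station_id = 1 has been seen so far
                 count(*) filter (where d.station_id = 1) over (order by d.id) as grp
          from subwaydata d
         ) g
) t
where s.id = t.id;
```

Running the plain select first and checking a few journeys by hand is a reasonable precaution before updating, because the grouping depends entirely on the rows being in journey order by id.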