My data looks like this:

id   Open     Close
1    1/1/15   1/1/15
2    1/1/15   2/1/15
3    3/1/15   4/1/15

I need to create a dataframe that shows the number of open cases on any day, so the result of the data above would look like:

Date   #Open
1/1/15   1
2/1/15   0
3/1/15   1

Any ideas?

  • I think the open count for 3/1/15 should equal 1, and 4/1/15 is missing, also with 1. Commented Apr 20, 2016 at 16:57
  • Shouldn't the count for 1/1/15 be 1? At the end of the day, one event got closed. Commented Apr 20, 2016 at 17:18
  • Updated the data to satisfy date < closed instead of date <= closed. Commented Apr 20, 2016 at 17:30

1 Answer

This method builds a daily index running from the earliest open date to the later of the latest open date and the latest close date. It then iterates over those dates, filters the dataframe for the cases open on each one (opened on or before the date and closed strictly after it), and records the number of matching rows.

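import pandas as pd

# df is assumed to hold the question's data, with 'Open' and 'Close' columns.
df['Open'] = pd.to_datetime(df.Open)
df['Close'] = pd.to_datetime(df.Close)

# One entry per calendar day, from the first open date to the last open or close date.
idx = pd.date_range(df.Open.min(), max(df.Open.max(), df.Close.max()))

# A case counts as open on a date if it was opened on or before it and closed after it.
cases = pd.DataFrame([len(df[(date >= df.Open) & (date < df.Close)])
                      for date in idx],
                     index=idx, columns=['case_count'])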

>>> cases.head(3)
            case_count
2015-01-01           1
2015-01-02           1
2015-01-03           1

>>> cases.tail(3)
            case_count
2015-03-30           1
2015-03-31           1
2015-04-01           0
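
For larger datasets, filtering the dataframe once per day can get slow. As a rough sketch of an alternative (not the method above, but a different technique that should give the same counts), each open date can be treated as +1 and each close date as -1, with a running total taken over the daily index. This assumes every case has a close date on or after its open date, and that the question's dates are month/day/year, which is how the output above reads them; the explicit `format` string is that assumption.

```python
import pandas as pd

# Sample data from the question; dates are assumed to be month/day/year.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "Open": ["1/1/15", "1/1/15", "3/1/15"],
    "Close": ["1/1/15", "2/1/15", "4/1/15"],
})
df["Open"] = pd.to_datetime(df["Open"], format="%m/%d/%y")
df["Close"] = pd.to_datetime(df["Close"], format="%m/%d/%y")

# Same daily index as in the answer above.
idx = pd.date_range(df["Open"].min(), max(df["Open"].max(), df["Close"].max()))

# Count how many cases open and close on each day, filling absent days with 0.
opened = df["Open"].value_counts().reindex(idx, fill_value=0)
closed = df["Close"].value_counts().reindex(idx, fill_value=0)

# The running total of (opens - closes) is the number of cases with
# Open <= date < Close, so a case closed on a given day is not counted that day.
cases = (opened - closed).cumsum().to_frame("case_count")

print(cases.head(3))
print(cases.tail(3))
```

With the sample data this should reproduce the head and tail shown above. A case with a missing close date would be handled differently by the two approaches, so that situation would need its own treatment.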