1

I have a dataframe that contains entries for different day in and day out logs. but some logon and logout values occurs at same day. e.g. df


+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
|  Id   |          LoginTime          | LoginReading |         LogoutTime          | LogoutReading | Available | Calculated |
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
| 10036 | 2019-11-06 10:37:18.3743184 |      5054.68 | 2019-11-06 14:11:52.4833904 |       5057.94 |       500 |        530 |
| 10036 | 2019-11-11 12:46:20.7018683 |      4797.39 | 2019-11-11 15:55:48.9102874 |       4846.86 |       500 |       1091 |
| 10036 | 2019-11-11 15:56:42.7485310 |      4846.86 | 2019-11-11 17:20:40.2587713 |       4866.68 |       500 |        370 |
| 10036 | 2019-11-11 17:21:12.9416691 |      4866.68 | 2019-11-11 18:09:55.8326356 |       4892.33 |       500 |        311 |
| 10036 | 2019-11-14 08:08:00.6290260 |      5080.59 | 2019-11-14 18:43:31.8609822 |       5185.65 |       500 |       2528 |
| 10036 | 2019-11-15 06:00:48.4777280 |      5185.65 | 2019-11-15 06:31:55.0281168 |       5199.28 |       500 |        260 |
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+

I am trying to compress/merge rows with similar date to one row. My goal is to achieve something like this:

+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
|  Id   |          LoginTime          | LoginReading |         LogoutTime          | LogoutReading | Available | Calculated |
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
| 10036 | 2019-11-06 10:37:18.3743184 |      5054.68 | 2019-11-06 14:11:52.4833904 |       5057.94 |       500 |        530 |
| 10036 | 2019-11-11 12:46:20.7018683 |      4797.39 | 2019-11-11 18:09:55.8326356 |       4892.33 |       500 |       1772 |
| 10036 | 2019-11-14 08:08:00.6290260 |      5080.59 | 2019-11-14 18:43:31.8609822 |       5185.65 |       500 |       2528 |
| 10036 | 2019-11-15 06:00:48.4777280 |      5185.65 | 2019-11-15 06:31:55.0281168 |       5199.28 |       500 |        260 |
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+

I tried some other similar questions on SO but the logout time is not working.

2
  • Do you just want to take the first LoginTime entry and then the last LogoutTime entry that occur in sequential order on the same day only? Commented Nov 25, 2019 at 5:19
  • @rahlf23, yes but there are other columns that needs to be agg. like available(not in this case) and calculated Commented Nov 25, 2019 at 5:24

1 Answer 1

1

You can create a new column that tracks the day of the login and then you can use groupby() and agg() to compile the values you want:

df['Day'] = df['LoginTime'].str.extract(r'^(\d{4}-\d{2}-\d{2})')

df.groupby('Day').agg({'Id': 'first', 'LoginTime': 'first', 'LoginReading': 'first', 'LogoutTime': 'last',
'LogoutReading': 'last', 'Available': 'first', 'Calculated': 'sum'}).reset_index(drop=True)

Yields:

      Id                    LoginTime  LoginReading  \
0  10036  2019-11-06 10:37:18.3743184       5054.68   
1  10036  2019-11-11 12:46:20.7018683       4797.39   
2  10036  2019-11-14 08:08:00.6290260       5080.59   
3  10036  2019-11-15 06:00:48.4777280       5185.65   

                    LogoutTime  LogoutReading  Available  Calculated  
0  2019-11-06 14:11:52.4833904        5057.94        500         530  
1  2019-11-11 18:09:55.8326356        4892.33        500        1772  
2  2019-11-14 18:43:31.8609822        5185.65        500        2528  
3  2019-11-15 06:31:55.0281168        5199.28        500         260  
Sign up to request clarification or add additional context in comments.

1 Comment

can you please suggest what to do if login date/day and logout date/day are different e.g. loginTime =` 2019-11-06 10:37:18.3743184` and logoutTime=2019-11-07 00:11:52.4833904

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.