Updated info at the bottom. I have a group from a `df.groupby` that looks like this:
```
   stop_id   stop_name                   arrival_time  departure_time  stop_sequence
0  87413013  Gare de Le Havre            05:20:00      05:20:00        0.0
1  87413344  Gare de Bréauté-Beuzeville  05:35:00      05:36:00        1.0
2  87413385  Gare de Yvetot              05:49:00      05:50:00        2.0
3  87411017  Gare de Rouen-Rive-Droite   06:12:00      06:15:00        3.0
4  87384008  Gare de Paris-St-Lazare     07:38:00      07:38:00        4.0
```
I want to loop over each row, use "stop_name" as the departure location, and take the "stop_name" of each following row as the arrival location. Then I use the function below to parse the times and calculate the trip duration in seconds (the arrival time at the later stop minus the departure time at the earlier stop).
```python
def timestrToSeconds(timestr):
    ftr = [3600, 60, 1]
    return sum([a*b for a, b in zip(ftr, map(int, timestr.split(':')))])
```
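As a quick check of the duration rule (arrival at the later stop minus departure from the earlier stop), here is the helper applied to the first leg and to the last leg of the sample group. The generator expression replaces the list inside `sum`, which does not change the result.

```python
def timestrToSeconds(timestr):
    """Convert an 'HH:MM:SS' string to seconds since midnight."""
    ftr = [3600, 60, 1]
    return sum(a * b for a, b in zip(ftr, map(int, timestr.split(':'))))

# Le Havre departs 05:20:00, Bréauté-Beuzeville is reached at 05:35:00.
first_leg = timestrToSeconds('05:35:00') - timestrToSeconds('05:20:00')

# Rouen-Rive-Droite departs 06:15:00, Paris-St-Lazare is reached at 07:38:00.
last_leg = timestrToSeconds('07:38:00') - timestrToSeconds('06:15:00')

print(first_leg, last_leg)  # 900 4980
```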
The expected output is a list of (departure, arrival, seconds) tuples covering all possible combinations, like this:
```python
result = [
    ('Gare de Le Havre', 'Gare de Bréauté-Beuzeville', 900),
    ('Gare de Le Havre', 'Gare de Yvetot', 1740),
    ('Gare de Le Havre', 'Gare de Rouen-Rive-Droite', 3120),
    ('Gare de Le Havre', 'Gare de Paris-St-Lazare', 8280),
    ('Gare de Bréauté-Beuzeville', 'Gare de Yvetot', 780),
    ('Gare de Bréauté-Beuzeville', 'Gare de Rouen-Rive-Droite', 2160),
    ('Gare de Bréauté-Beuzeville', 'Gare de Paris-St-Lazare', 7320),
    ('Gare de Yvetot', 'Gare de Rouen-Rive-Droite', 1320),
    ('Gare de Yvetot', 'Gare de Paris-St-Lazare', 6480),
    ('Gare de Rouen-Rive-Droite', 'Gare de Paris-St-Lazare', 4980),
]
```
I tried nested loops, but it got too abstract for me. Any advice is welcome.
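A minimal sketch of the nested-loop idea for a single group, assuming the times are 'HH:MM:SS' strings and the trip order is given by `stop_sequence`. `itertools.combinations(records, 2)` yields every (earlier, later) pair exactly once, which replaces the two explicit loops; `pairwise_durations` is a hypothetical helper name, not part of any library.

```python
from itertools import combinations

import pandas as pd


def timestrToSeconds(timestr):
    ftr = [3600, 60, 1]
    return sum(a * b for a, b in zip(ftr, map(int, timestr.split(':'))))


# The sample group from the question (stop_id omitted, it is not needed here).
group = pd.DataFrame({
    'stop_name': ['Gare de Le Havre', 'Gare de Bréauté-Beuzeville', 'Gare de Yvetot',
                  'Gare de Rouen-Rive-Droite', 'Gare de Paris-St-Lazare'],
    'arrival_time': ['05:20:00', '05:35:00', '05:49:00', '06:12:00', '07:38:00'],
    'departure_time': ['05:20:00', '05:36:00', '05:50:00', '06:15:00', '07:38:00'],
    'stop_sequence': [0.0, 1.0, 2.0, 3.0, 4.0],
})


def pairwise_durations(group):
    """Return (from, to, seconds) for every ordered pair of stops in one trip."""
    # Sort so that "later in the list" really means "later in the trip".
    rows = group.sort_values('stop_sequence')[['stop_name', 'arrival_time', 'departure_time']]
    records = list(rows.itertuples(index=False))
    result = []
    # Each pair (dep, arr) has dep strictly before arr in the sorted order.
    for dep, arr in combinations(records, 2):
        seconds = timestrToSeconds(arr.arrival_time) - timestrToSeconds(dep.departure_time)
        result.append((dep.stop_name, arr.stop_name, seconds))
    return result


result = pairwise_durations(group)
```

Because the pairs are built from row positions rather than looked up by name, the loop never needs an `if duration > 0` filter. With this rule, Yvetot to Rouen-Rive-Droite comes out as 06:12:00 minus 05:50:00, i.e. 1320 seconds.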
UPDATE
Mazhar's solution seems to work fine on a single group, but when I loop through my groupby like this:
```python
timeBetweenStops = []
for group_name, group in xgrouped:
    group.arrival_time = pd.to_timedelta(group.arrival_time)
    group.departure_time = pd.to_timedelta(group.departure_time)
    new_df = group['departure_time'].apply(lambda x: (
        group['arrival_time'] - x).apply(lambda y: y.total_seconds()))
    new_df.index = group.stop_name
    new_df.columns = group.stop_name
    for i in new_df.index:
        for j in new_df.columns:
            if new_df.loc[i, j] > 0:
                r = (i, j, new_df.loc[i, j])
                timeBetweenStops.append(r)
```
I get the following error:
```
ValueError                                Traceback (most recent call last)
<ipython-input-196-ec050382d2b5> in <module>
     14     for i in new_df.index:
     15         for j in new_df.columns:
---> 16             if new_df.loc[i, j] > 0:
     17                 r = (i, j, new_df.loc[i, j])
     18                 timeBetweenStopsA.append(r)

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py in __nonzero__(self)
   1476
   1477     def __nonzero__(self):
-> 1478         raise ValueError(
   1479             f"The truth value of a {type(self).__name__} is ambiguous. "
   1480             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```
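The traceback is consistent with `new_df.loc[i, j]` returning a Series instead of a single number. That is what `.loc` does when a label is duplicated: once `new_df.index` and `new_df.columns` are set to `group.stop_name`, any stop name that occurs twice in one group (for example a trip that passes the same station twice, or a group that contains more than one trip) makes `.loc` return every matching cell. That would explain why a single group works and the full loop fails. A small reproduction, with made-up labels 'Stop A' and 'Stop B':

```python
import pandas as pd

# Square "duration" matrix whose labels repeat, as after
# new_df.index = new_df.columns = group.stop_name with a repeated stop.
names = ['Stop A', 'Stop B', 'Stop A']
new_df = pd.DataFrame(
    [[0.0, 600.0, 1200.0],
     [-600.0, 0.0, 600.0],
     [-1200.0, -600.0, 0.0]],
    index=names, columns=names,
)

scalar = new_df.loc['Stop B', 'Stop B']  # both labels unique -> a single value
series = new_df.loc['Stop B', 'Stop A']  # 'Stop A' is duplicated -> Series of 2 values

message = None
try:
    if series > 0:  # same pattern as the loop in the question
        pass
except ValueError as exc:
    message = str(exc)  # "The truth value of a Series is ambiguous. ..."
```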
I have tried `if np.where(new_df.loc[i, j] > 0):`, but then I get plenty of inconsistencies in my results.
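A likely reason for the inconsistent results: called with one argument, `np.where` returns a tuple of index arrays (one per dimension), and a non-empty tuple is always truthy, so that `if` accepts every pair, including negative and reversed durations. The sketch below shows this, then outlines a positional version of the grouped loop that never looks values up by `stop_name`. The function name `time_between_stops` and the grouping column passed as `key` are assumptions (the question does not show how `xgrouped` was built); `trip_id` is used only in the usage example.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# One-argument np.where returns a tuple like (array([], dtype=int64),),
# which is truthy even when no element is > 0.
no_hits = np.where(pd.Series([-600.0, -60.0]) > 0)
always_true = bool(no_hits)  # True


def time_between_stops(df, key):
    """Hypothetical helper: (from, to, seconds) for every ordered stop pair per group."""
    out = []
    for _, group in df.groupby(key):
        # sort_values returns a new frame, so the group itself is not modified.
        group = group.sort_values('stop_sequence')
        dep = pd.to_timedelta(group['departure_time']).dt.total_seconds().to_numpy()
        arr = pd.to_timedelta(group['arrival_time']).dt.total_seconds().to_numpy()
        names = group['stop_name'].to_numpy()
        # Positions i < j mean "stop j comes after stop i"; repeated names are harmless.
        for i, j in combinations(range(len(group)), 2):
            out.append((names[i], names[j], arr[j] - dep[i]))
    return out
```

Usage would look like `time_between_stops(df, 'trip_id')` on the ungrouped DataFrame. Working with NumPy arrays by position sidesteps both the duplicate-label `.loc` lookups and the `if ... > 0` test.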
