
(Updated info at the bottom.) I have a group from a df.groupby that looks like this:

    stop_id     stop_name                           arrival_time    departure_time  stop_sequence   
0   87413013    Gare de Le Havre                    05:20:00        05:20:00        0.0 
1   87413344    Gare de Bréauté-Beuzeville          05:35:00        05:36:00        1.0 
2   87413385    Gare de Yvetot                      05:49:00        05:50:00        2.0 
3   87411017    Gare de Rouen-Rive-Droite           06:12:00        06:15:00        3.0 
4   87384008    Gare de Paris-St-Lazare             07:38:00        07:38:00        4.0 

I want to loop over the rows, use each "stop_name" as the departure location, and pair it with the "stop_name" of every following row as the arrival location. I then use the function below to parse the times and calculate the trip duration in seconds.

def timestrToSeconds(timestr):
    ftr = [3600,60,1]
    return sum([a*b for a,b in zip(ftr, map(int,timestr.split(':')))])

The output is expected to be an array with all possible combinations like below :

result = [
('Gare de Le Havre', 'Gare de Bréauté-Beuzeville', 900),
('Gare de Le Havre', 'Gare de Yvetot', 1740),
('Gare de Le Havre', 'Gare de Rouen-Rive-Droite', 3120),
('Gare de Le Havre', 'Gare de Paris-St-Lazare', 8280),
('Gare de Bréauté-Beuzeville', 'Gare de Yvetot', 780),
('Gare de Bréauté-Beuzeville', 'Gare de Rouen-Rive-Droite', 2160),
('Gare de Bréauté-Beuzeville', 'Gare de Paris-St-Lazare', 7320),
('Gare de Yvetot', 'Gare de Rouen-Rive-Droite', 1320),
('Gare de Yvetot', 'Gare de Paris-St-Lazare', 6480),
('Gare de Rouen-Rive-Droite', 'Gare de Paris-St-Lazare', 4980),
]

I have tried nested loops, but the result ended up too abstract for me. Any advice is more than welcome.

UPDATE

Mazhar's solution seems to work fine on a single group, but when I loop through my groupby like this:

timeBetweenStops  = []

for group_name, group in xgrouped:
    
    group.arrival_time = pd.to_timedelta(group.arrival_time)
    group.departure_time = pd.to_timedelta(group.departure_time)

    new_df = group['departure_time'].apply(lambda x: (
        group['arrival_time']-x).apply(lambda y: y.total_seconds()))

    new_df.index = group.stop_name
    new_df.columns = group.stop_name

    for i in new_df.index:
        for j in new_df.columns:
            if new_df.loc[i, j] > 0:
                r = (i, j, new_df.loc[i, j])
                timeBetweenStops.append(r)

I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-196-ec050382d2b5> in <module>
     14     for i in new_df.index:
     15         for j in new_df.columns:
---> 16             if new_df.loc[i, j] > 0:
     17                 r = (i, j, new_df.loc[i, j])
     18                 timeBetweenStopsA.append(r)

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py in __nonzero__(self)
   1476 
   1477     def __nonzero__(self):
-> 1478         raise ValueError(
   1479             f"The truth value of a {type(self).__name__} is ambiguous. "
   1480             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I have tried to use if np.where(new_df.loc[i, j] > 0): instead, but then I get plenty of inconsistencies in my result.

    Can you add code for a minimal working dataframe to check your code (and suggest a solution)? Commented Jan 7, 2022 at 16:31

5 Answers


Convert your time columns to Timedelta with pd.to_timedelta:

df['arrival_time'] = pd.to_timedelta(df['arrival_time'])
df['departure_time'] = pd.to_timedelta(df['departure_time'])

Now use itertools.combinations to generate all combinations:

from itertools import combinations

# Travel time = arrival at the later stop minus departure from the earlier stop
comb = lambda x: [
    (x.loc[i1, 'stop_name'], x.loc[i2, 'stop_name'], 
    int((x.loc[i2, 'arrival_time'] - x.loc[i1, 'departure_time']).total_seconds()))
        for i1, i2 in combinations(x.index, 2)
]

For your current group:

>>> comb(df)
[('Gare de Le Havre', 'Gare de Bréauté-Beuzeville', 900),
 ('Gare de Le Havre', 'Gare de Yvetot', 1740),
 ('Gare de Le Havre', 'Gare de Rouen-Rive-Droite', 3120),
 ('Gare de Le Havre', 'Gare de Paris-St-Lazare', 8280),
 ('Gare de Bréauté-Beuzeville', 'Gare de Yvetot', 780),
 ('Gare de Bréauté-Beuzeville', 'Gare de Rouen-Rive-Droite', 2160),
 ('Gare de Bréauté-Beuzeville', 'Gare de Paris-St-Lazare', 7320),
 ('Gare de Yvetot', 'Gare de Rouen-Rive-Droite', 1320),
 ('Gare de Yvetot', 'Gare de Paris-St-Lazare', 6480),
 ('Gare de Rouen-Rive-Droite', 'Gare de Paris-St-Lazare', 4980)]

On many groups:

>>> df.groupby(...).apply(comb)

1    [(Gare de Le Havre, Gare de Bréauté-Beuzeville...
dtype: object
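
The result of groupby(...).apply(comb) is a Series holding one list per group. If a single flat list is wanted, as with timeBetweenStops in the question, one option is to iterate over the groups and extend a list. The following is only a sketch: trip_id is a hypothetical grouping key (the real key is not shown in the question), and pairs_for_group is an illustrative stand-in for comb that resets the index so it does not depend on the group's original labels.

```python
from itertools import combinations
import pandas as pd

def pairs_for_group(g):
    """(origin, destination, seconds) for each ordered pair of stops in one group."""
    g = g.reset_index(drop=True)  # positions 0..n-1, whatever the original index was
    arr = pd.to_timedelta(g["arrival_time"])
    dep = pd.to_timedelta(g["departure_time"])
    return [
        (g.at[i, "stop_name"], g.at[j, "stop_name"],
         int((arr[j] - dep[i]).total_seconds()))
        for i, j in combinations(range(len(g)), 2)
    ]

# trip_id is a hypothetical grouping key used only for this illustration
trips = pd.DataFrame({
    "trip_id": [1, 1, 1, 2, 2],
    "stop_name": ["A", "B", "C", "C", "A"],
    "arrival_time": ["05:20:00", "05:35:00", "05:49:00", "08:00:00", "08:30:00"],
    "departure_time": ["05:20:00", "05:36:00", "05:50:00", "08:00:00", "08:30:00"],
})

# Collect the pairs of every group into one flat list
time_between_stops = []
for _, group in trips.groupby("trip_id"):
    time_between_stops.extend(pairs_for_group(group))
```

This assumes the rows of each group are already in stop order.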

Comments

df.arrival_time = pd.to_timedelta(df.arrival_time)
df.departure_time = pd.to_timedelta(df.departure_time)

# Rows are departure stops, columns are arrival stops; each cell holds
# arrival(column) - departure(row) in seconds
new_df = df['departure_time'].apply(lambda x: (
    df['arrival_time']-x).apply(lambda y: y.total_seconds()))

new_df.index = df.stop_name
new_df.columns = df.stop_name

# Keep only pairs where the arrival is later than the departure
for i in new_df.index:
    for j in new_df.columns:
        if new_df.loc[i, j] > 0:
            print(i, j, new_df.loc[i, j])
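
A note on the ValueError from the question's update. One plausible cause (an assumption, since the failing group is not shown) is that a stop_name occurs more than once within a group. With duplicate labels, new_df.loc[i, j] returns a Series or DataFrame instead of a scalar, and if cannot evaluate it. Wrapping the test in np.where does not help, because np.where(condition) returns a tuple of arrays, and a non-empty tuple is always truthy. A minimal sketch that sidesteps labels by indexing positionally; pairwise_seconds and the sample data are illustrative, and it pairs each stop with later rows rather than filtering on > 0:

```python
import numpy as np
import pandas as pd

def pairwise_seconds(group):
    """Return (origin, destination, seconds) for every later stop, by position."""
    names = group["stop_name"].to_numpy()
    dep = pd.to_timedelta(group["departure_time"]).dt.total_seconds().to_numpy()
    arr = pd.to_timedelta(group["arrival_time"]).dt.total_seconds().to_numpy()
    # diff[i, j] = arrival at stop j minus departure from stop i
    diff = arr[None, :] - dep[:, None]
    # Keep only pairs where j comes after i (assumes rows are in stop order)
    rows, cols = np.triu_indices(len(group), k=1)
    return [(names[i], names[j], int(diff[i, j])) for i, j in zip(rows, cols)]

# Hypothetical group in which stop "A" is served twice
demo = pd.DataFrame({
    "stop_name": ["A", "B", "A"],
    "arrival_time": ["05:00:00", "05:10:00", "05:25:00"],
    "departure_time": ["05:00:00", "05:12:00", "05:25:00"],
})
pairs = pairwise_seconds(demo)
```

Because no label lookup is involved, repeated stop names cannot turn a cell into a Series.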


Comments


Until you update your question so this code can be checked with real data, here is one solution:

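from itertools import combinations

results = []
for c in combinations(df['stop_name'].to_list(), 2):
    # .iloc[0] takes the scalar; subtracting the filtered Series directly would align on index and give NaT
    t0, t1 = (df.loc[df['stop_name'] == s, 'arrival_time'].iloc[0] for s in c)
    results.append((*c, abs(t1 - t0)))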

That assumes arrival_time (or whichever column you want to compare) has already been converted to a pandas time dtype, for example with pd.to_timedelta. If not, take a look here and convert it:
Pandas convert Column to time

Note: This code works assuming that you have one value for each location in the column.

Comments


I don't think you can escape nested loops here. It may be possible with a list comprehension, but that would be even more abstract...

You can get your result with the following code:

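resultat = []

# enumerate gives the row position, so the slice below also works when the
# index is not 0..n-1 (for example inside a groupby group)
for pos, (_, ligne1) in enumerate(df.iterrows()):
    
    depart = ligne1.stop_name
    departure_time = ligne1.departure_time
    
    for _, ligne2 in df.iloc[(pos + 1):].iterrows():
        arrivee = ligne2.stop_name
        arrival_time = ligne2.arrival_time
        # timestrToSeconds is the helper from the question; times must still be 'HH:MM:SS' strings
        duree = timestrToSeconds(arrival_time) - timestrToSeconds(departure_time)
        
        resultat.append((depart, arrivee, duree))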

(Edit) This code works assuming that stations are ordered from departure to arrival. If it's not the case, you can order the dataframe with:

df = df.sort_values(by = 'departure_time')
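
For comparison, the same pairing can be written as a comprehension over itertools.combinations, which yields pairs in row order, so each origin is only paired with later rows. This is a sketch under the same assumptions (times are still 'HH:MM:SS' strings, rows are sorted in stop order); to_seconds is an illustrative stand-in for the question's timestrToSeconds, and the three-row frame is a cut-down copy of the question's data.

```python
from itertools import combinations
import pandas as pd

def to_seconds(timestr):
    """Convert 'HH:MM:SS' to seconds (same idea as timestrToSeconds in the question)."""
    h, m, s = map(int, timestr.split(":"))
    return 3600 * h + 60 * m + s

df = pd.DataFrame({
    "stop_name": ["Gare de Le Havre", "Gare de Bréauté-Beuzeville", "Gare de Yvetot"],
    "arrival_time": ["05:20:00", "05:35:00", "05:49:00"],
    "departure_time": ["05:20:00", "05:36:00", "05:50:00"],
})

# Each (a, b) is an earlier row paired with a later row
resultat = [
    (a.stop_name, b.stop_name, to_seconds(b.arrival_time) - to_seconds(a.departure_time))
    for a, b in combinations(df.itertuples(index=False), 2)
]
```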

Comments


I think you can do this without loops, substituting a heavy-handed cross join instead:


from io import StringIO
import pandas
import numpy

filedata = StringIO("""\
stop_id     stop_name                           arrival_time    departure_time  stop_sequence   
87413013    Gare de Le Havre                    05:20:00        05:20:00        0.0 
87413344    Gare de Bréauté-Beuzeville          05:35:00        05:36:00        1.0 
87413385    Gare de Yvetot                      05:49:00        05:50:00        2.0 
87411017    Gare de Rouen-Rive-Droite           06:12:00        06:15:00        3.0 
87384008    Gare de Paris-St-Lazare             07:38:00        07:38:00        4.0 
""")

df = (
    # raw string for the regex separator; a regex sep needs the python engine
    pandas.read_csv(filedata, sep=r"\s\s+", engine="python", parse_dates=["arrival_time", "departure_time"])
)

results = (
    df.merge(df, how="cross")
      .loc[lambda df: df["stop_sequence_x"] < df["stop_sequence_y"]]
      .assign(travel_time_seconds=lambda df: 
              df["arrival_time_y"]
                  .sub(df["departure_time_x"])
                  .dt.total_seconds()
        )
      .loc[:, ["stop_name_x", "stop_name_y", "travel_time_seconds"]]
      .reset_index(drop=True)  
)

and that gives me:


                  stop_name_x                 stop_name_y  travel_time_seconds
0            Gare de Le Havre  Gare de Bréauté-Beuzeville                900.0
1            Gare de Le Havre              Gare de Yvetot               1740.0
2            Gare de Le Havre   Gare de Rouen-Rive-Droite               3120.0
3            Gare de Le Havre     Gare de Paris-St-Lazare               8280.0
4  Gare de Bréauté-Beuzeville              Gare de Yvetot                780.0
5  Gare de Bréauté-Beuzeville   Gare de Rouen-Rive-Droite               2160.0
6  Gare de Bréauté-Beuzeville     Gare de Paris-St-Lazare               7320.0
7              Gare de Yvetot   Gare de Rouen-Rive-Droite               1320.0
8              Gare de Yvetot     Gare de Paris-St-Lazare               6480.0
9   Gare de Rouen-Rive-Droite     Gare de Paris-St-Lazare               4980.0
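
When the data holds several trips, a plain cross join would also pair stops that belong to different trips. One way around that, sketched here with a hypothetical trip_id key and made-up stops, is to self-merge on that key, which produces the cross product within each trip only:

```python
import pandas as pd

times_arr = ["05:20:00", "05:35:00", "05:49:00", "08:00:00", "08:30:00"]
times_dep = ["05:20:00", "05:36:00", "05:50:00", "08:00:00", "08:30:00"]

stops = pd.DataFrame({
    "trip_id": [1, 1, 1, 2, 2],  # hypothetical grouping key
    "stop_name": ["A", "B", "C", "C", "A"],
    "arrival_time": pd.to_timedelta(times_arr),
    "departure_time": pd.to_timedelta(times_dep),
    "stop_sequence": [0, 1, 2, 0, 1],
})

per_trip = (
    # Merging on trip_id pairs every stop with every stop of the same trip
    stops.merge(stops, on="trip_id")
      # Keep only pairs where the destination comes later in the sequence
      .loc[lambda d: d["stop_sequence_x"] < d["stop_sequence_y"]]
      .assign(travel_time_seconds=lambda d:
              (d["arrival_time_y"] - d["departure_time_x"]).dt.total_seconds())
      .loc[:, ["trip_id", "stop_name_x", "stop_name_y", "travel_time_seconds"]]
      .reset_index(drop=True)
)
```

The intermediate frame still grows with the square of the number of stops per trip, so this trades memory for the absence of Python-level loops.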

Comments
