Data-frame using values from different rows while iterating

Question

(Updated info at the bottom.) I have a group from a df.groupby that looks like this:

    stop_id     stop_name                           arrival_time    departure_time  stop_sequence   
0   87413013    Gare de Le Havre                    05:20:00        05:20:00        0.0 
1   87413344    Gare de Bréauté-Beuzeville          05:35:00        05:36:00        1.0 
2   87413385    Gare de Yvetot                      05:49:00        05:50:00        2.0 
3   87411017    Gare de Rouen-Rive-Droite           06:12:00        06:15:00        3.0 
4   87384008    Gare de Paris-St-Lazare             07:38:00        07:38:00        4.0

I want to loop over the rows, using each "stop_name" as the departure location and the "stop_name" of every following row as the arrival location. I then use the function below to parse the times and calculate the trip duration in seconds.

def timestrToSeconds(timestr):
    ftr = [3600,60,1]
    return sum([a*b for a,b in zip(ftr, map(int,timestr.split(':')))])

The expected output is a list with all possible combinations, like this:

result = [
('Gare de Le Havre', 'Gare de Bréauté-Beuzeville', 900),
('Gare de Le Havre', 'Gare de Yvetot', 1740),
('Gare de Le Havre', 'Gare de Rouen-Rive-Droite', 3120),
('Gare de Le Havre', 'Gare de Paris-St-Lazare', 8280),
('Gare de Bréauté-Beuzeville', 'Gare de Yvetot', 780),
('Gare de Bréauté-Beuzeville', 'Gare de Rouen-Rive-Droite', 2160),
('Gare de Bréauté-Beuzeville', 'Gare de Paris-St-Lazare', 7320),
('Gare de Yvetot', 'Gare de Rouen-Rive-Droite', 1320),
('Gare de Yvetot', 'Gare de Paris-St-Lazare', 6480),
('Gare de Rouen-Rive-Droite', 'Gare de Paris-St-Lazare', 4980),
]

I have tried nested loops, but they ended up being too abstract for me. Any advice is more than welcome.

UPDATE

Mazhar's solution seems to work fine on a single group, but when I loop through my groupby like this:

timeBetweenStops  = []

for group_name, group in xgrouped:
    
    group.arrival_time = pd.to_timedelta(group.arrival_time)
    group.departure_time = pd.to_timedelta(group.departure_time)

    new_df = group['departure_time'].apply(lambda x: (
        group['arrival_time']-x).apply(lambda y: y.total_seconds()))

    new_df.index = group.stop_name
    new_df.columns = group.stop_name

    for i in new_df.index:
        for j in new_df.columns:
            if new_df.loc[i, j] > 0:
                r = (i, j, new_df.loc[i, j])
                timeBetweenStops.append(r)

I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-196-ec050382d2b5> in <module>
     14     for i in new_df.index:
     15         for j in new_df.columns:
---> 16             if new_df.loc[i, j] > 0:
     17                 r = (i, j, new_df.loc[i, j])
     18                 timeBetweenStopsA.append(r)

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py in __nonzero__(self)
   1476 
   1477     def __nonzero__(self):
-> 1478         raise ValueError(
   1479             f"The truth value of a {type(self).__name__} is ambiguous. "
   1480             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I have tried using if np.where(new_df.loc[i, j] > 0): instead, but then I get many inconsistencies in my result.
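A possible explanation, offered as an assumption because the grouped data is not shown: if a stop name occurs more than once within a group, the labels of new_df are no longer unique, and new_df.loc[i, j] returns a Series instead of a single value, which would raise exactly this ValueError. Separately, np.where called with only a condition returns a tuple of index arrays, and a non-empty tuple is always truthy, so that `if` filters nothing. The sketch below reproduces both effects on made-up data and shows positional access as one way around the duplicate labels.

```python
import numpy as np
import pandas as pd

# Made-up 3x3 matrix of durations whose labels repeat ("A" appears twice),
# mimicking a group in which the same stop name occurs more than once.
labels = ["A", "B", "A"]
new_df = pd.DataFrame(
    [[0.0, 600.0, 1200.0],
     [-600.0, 0.0, 600.0],
     [-1200.0, -600.0, 0.0]],
    index=labels,
    columns=labels,
)

cell = new_df.loc["B", "A"]                 # two columns are labelled "A"
cell_is_series = isinstance(cell, pd.Series)
has_duplicates = new_df.index.has_duplicates

# np.where(cond) with a single argument returns a tuple of index arrays.
# The tuple itself is non-empty, so it is truthy even if nothing matches.
always_true = bool(np.where(pd.Series([-1.0, -2.0]) > 0))

# Positional access (.iat) always yields exactly one scalar per cell.
pairs = [
    (labels[r], labels[c], new_df.iat[r, c])
    for r in range(len(labels))
    for c in range(len(labels))
    if new_df.iat[r, c] > 0
]
```

Checking group.stop_name.duplicated().any() inside the loop would show whether this is what happens in the real data.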

Can you add code for a minimal working dataframe so your code can be checked (and a solution suggested)?
– OnY, commented Jan 7, 2022 at 16:31

Corralien · Accepted Answer · 2022-01-07 16:57:00Z

Convert your time columns to Timedelta with pd.to_timedelta:

df['arrival_time'] = pd.to_timedelta(df['arrival_time'])
df['departure_time'] = pd.to_timedelta(df['departure_time'])

Now use itertools.combinations to generate all combinations:

from itertools import combinations

def comb(x):
    # Duration = arrival at the later stop minus departure from the earlier stop
    return [
        (x.loc[i1, 'stop_name'], x.loc[i2, 'stop_name'],
         int((x.loc[i2, 'arrival_time'] - x.loc[i1, 'departure_time']).total_seconds()))
        for i1, i2 in combinations(x.index, 2)
    ]

For your current group:

>>> comb(df)
[('Gare de Le Havre', 'Gare de Bréauté-Beuzeville', 900),
 ('Gare de Le Havre', 'Gare de Yvetot', 1740),
 ('Gare de Le Havre', 'Gare de Rouen-Rive-Droite', 3120),
 ('Gare de Le Havre', 'Gare de Paris-St-Lazare', 8280),
 ('Gare de Bréauté-Beuzeville', 'Gare de Yvetot', 780),
 ('Gare de Bréauté-Beuzeville', 'Gare de Rouen-Rive-Droite', 2160),
 ('Gare de Bréauté-Beuzeville', 'Gare de Paris-St-Lazare', 7320),
 ('Gare de Yvetot', 'Gare de Rouen-Rive-Droite', 1320),
 ('Gare de Yvetot', 'Gare de Paris-St-Lazare', 6480),
 ('Gare de Rouen-Rive-Droite', 'Gare de Paris-St-Lazare', 4980)]

On many groups:

>>> df.groupby(...).apply(comb)

1    [(Gare de Le Havre, Gare de Bréauté-Beuzeville...
dtype: object
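If a single flat list is wanted rather than one list per group, the groups can be iterated directly. The sketch below is only an illustration: the `trip_id` column, the data and the `pairs_for_group` helper are made up, since the question does not show the grouping key. Durations are computed as arrival at the later stop minus departure from the earlier stop, as the question describes.

```python
from itertools import combinations

import pandas as pd

# Made-up data: two trips, identified by a hypothetical `trip_id` column.
df = pd.DataFrame({
    "trip_id": [1, 1, 1, 2, 2],
    "stop_name": ["A", "B", "C", "A", "C"],
    "arrival_time": ["05:20:00", "05:35:00", "05:49:00", "08:00:00", "08:30:00"],
    "departure_time": ["05:20:00", "05:36:00", "05:50:00", "08:00:00", "08:30:00"],
})
df["arrival_time"] = pd.to_timedelta(df["arrival_time"])
df["departure_time"] = pd.to_timedelta(df["departure_time"])


def pairs_for_group(group):
    """Return (origin, destination, seconds) for every ordered pair of stops."""
    # Plain lists and positions keep this working when the group's index
    # does not start at 0 or when a stop name is repeated.
    names = group["stop_name"].to_list()
    departures = group["departure_time"].to_list()
    arrivals = group["arrival_time"].to_list()
    return [
        (names[a], names[b], int((arrivals[b] - departures[a]).total_seconds()))
        for a, b in combinations(range(len(group)), 2)
    ]


# One flat list for all trips, instead of one list per group.
time_between_stops = [
    pair
    for _, group in df.groupby("trip_id", sort=False)
    for pair in pairs_for_group(group)
]
```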

Mazhar · Accepted Answer · 2022-01-07 17:47:27Z

df.arrival_time = pd.to_timedelta(df.arrival_time)
df.departure_time = pd.to_timedelta(df.departure_time)

new_df = df['departure_time'].apply(lambda x: (
    df['arrival_time']-x).apply(lambda y: y.total_seconds()))

new_df.index = df.stop_name
new_df.columns = df.stop_name

for i in new_df.index:
    for j in new_df.columns:
        if new_df.loc[i, j] > 0:
            print(i, j, new_df.loc[i, j])
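The same pairwise matrix can also be built with NumPy broadcasting, selecting pairs by position instead of by the sign of the duration. This is a sketch on made-up stop names, not a drop-in replacement; it assumes the rows are already in travel order.

```python
import numpy as np
import pandas as pd

# Made-up three-stop trip, rows in travel order.
df = pd.DataFrame({
    "stop_name": ["A", "B", "C"],
    "arrival_time": pd.to_timedelta(["05:20:00", "05:35:00", "05:49:00"]),
    "departure_time": pd.to_timedelta(["05:20:00", "05:36:00", "05:50:00"]),
})

dep = df["departure_time"].dt.total_seconds().to_numpy()
arr = df["arrival_time"].dt.total_seconds().to_numpy()

# seconds[i, j] = arrival at stop j minus departure from stop i
seconds = arr[np.newaxis, :] - dep[:, np.newaxis]

# Keep only pairs where stop j comes after stop i (strict upper triangle).
i_idx, j_idx = np.triu_indices(len(df), k=1)
names = df["stop_name"].to_numpy()
pairs = [(names[i], names[j], int(seconds[i, j])) for i, j in zip(i_idx, j_idx)]
```

Because the pairs are chosen by position, repeated stop names do not cause ambiguous label lookups.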

OnY · Accepted Answer · 2022-01-07 16:45:20Z

Until you update your question so this code can be checked with real data, here is one solution:

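from itertools import combinations
all_combs = combinations(df['stop_name'].to_list(), 2)  # r=2: pairs of stops
results = []
for c in all_combs:
    # .iloc[0] pulls out the scalar so the subtraction does not align on index
    t0 = df.loc[df['stop_name'] == c[0], 'arrival_time'].iloc[0]
    t1 = df.loc[df['stop_name'] == c[1], 'arrival_time'].iloc[0]
    results.append((*c, abs(t1 - t0)))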

That's assuming that arrival_time (or whichever column you want to use) already has a pandas time dtype (datetime or timedelta). If not, take a look here and convert it first:
Pandas convert Column to time

Note: This code assumes that each location appears only once in the column.

Thrasy · Accepted Answer · 2022-01-07 16:52:34Z

I don't think you can escape nested loops here. It may be possible to do it using list comprehension but it will be even more abstract...

You can get your result with the following code:

resultat = []

# enumerate gives the row position, which stays valid even if the index does not start at 0
for pos, (_, ligne1) in enumerate(df.iterrows()):
    depart = ligne1.stop_name
    departure_time = ligne1.departure_time

    # only look at the stops that come after the current one
    for _, ligne2 in df.iloc[pos + 1:].iterrows():
        arrivee = ligne2.stop_name
        arrival_time = ligne2.arrival_time
        duree = timestrToSeconds(arrival_time) - timestrToSeconds(departure_time)

        resultat.append((depart, arrivee, duree))

(Edit) This code works assuming that stations are ordered from departure to arrival. If it's not the case, you can order the dataframe with:

df = df.sort_values(by = 'departure_time')
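For comparison, here is roughly what the list-comprehension variant mentioned above could look like. It is a sketch that leans on itertools.combinations to replace the inner loop, reuses the timestrToSeconds helper from the question, and only includes the first three stops to keep it short.

```python
from itertools import combinations

import pandas as pd


def timestrToSeconds(timestr):
    # Helper from the question: "HH:MM:SS" -> seconds since midnight.
    ftr = [3600, 60, 1]
    return sum(a * b for a, b in zip(ftr, map(int, timestr.split(":"))))


# First three stops from the question, times kept as plain strings.
df = pd.DataFrame({
    "stop_name": ["Gare de Le Havre", "Gare de Bréauté-Beuzeville", "Gare de Yvetot"],
    "arrival_time": ["05:20:00", "05:35:00", "05:49:00"],
    "departure_time": ["05:20:00", "05:36:00", "05:50:00"],
})

rows = list(df.itertuples(index=False))  # rows in travel order
resultat = [
    (
        origin.stop_name,
        dest.stop_name,
        timestrToSeconds(dest.arrival_time) - timestrToSeconds(origin.departure_time),
    )
    for origin, dest in combinations(rows, 2)
]
```

combinations keeps the input order, so each pair is (earlier stop, later stop) as long as the rows are sorted from departure to arrival.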

edited Jan 7, 2022 at 16:52

answered Jan 7, 2022 at 16:44

Paul H · Accepted Answer · 2022-01-07 19:00:13Z

I think you can do this without loops, substituting a heavy-handed cross join instead:


from io import StringIO
import pandas
import numpy

filedata = StringIO("""\
stop_id     stop_name                           arrival_time    departure_time  stop_sequence   
87413013    Gare de Le Havre                    05:20:00        05:20:00        0.0 
87413344    Gare de Bréauté-Beuzeville          05:35:00        05:36:00        1.0 
87413385    Gare de Yvetot                      05:49:00        05:50:00        2.0 
87411017    Gare de Rouen-Rive-Droite           06:12:00        06:15:00        3.0 
87384008    Gare de Paris-St-Lazare             07:38:00        07:38:00        4.0 
""")

df = (
    pandas.read_csv(filedata, sep=r"\s\s+", engine="python", parse_dates=["arrival_time", "departure_time"])
)

results = (
    df.merge(df, how="cross")
      .loc[lambda df: df["stop_sequence_x"] < df["stop_sequence_y"]]
      .assign(travel_time_seconds=lambda df: 
              df["arrival_time_y"]
                  .sub(df["departure_time_x"])
                  .dt.total_seconds()
        )
      .loc[:, ["stop_name_x", "stop_name_y", "travel_time_seconds"]]
      .reset_index(drop=True)  
)

and that gives me:


                  stop_name_x                 stop_name_y  travel_time_seconds
0            Gare de Le Havre  Gare de Bréauté-Beuzeville                900.0
1            Gare de Le Havre              Gare de Yvetot               1740.0
2            Gare de Le Havre   Gare de Rouen-Rive-Droite               3120.0
3            Gare de Le Havre     Gare de Paris-St-Lazare               8280.0
4  Gare de Bréauté-Beuzeville              Gare de Yvetot                780.0
5  Gare de Bréauté-Beuzeville   Gare de Rouen-Rive-Droite               2160.0
6  Gare de Bréauté-Beuzeville     Gare de Paris-St-Lazare               7320.0
7              Gare de Yvetot   Gare de Rouen-Rive-Droite               1320.0
8              Gare de Yvetot     Gare de Paris-St-Lazare               6480.0
9   Gare de Rouen-Rive-Droite     Gare de Paris-St-Lazare               4980.0
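If the list of tuples from the question is needed rather than a DataFrame, the rows can be converted with itertuples. A minimal sketch, using a made-up two-row frame with the same column names as the output above:

```python
import pandas as pd

# Made-up frame with the same three columns as the output above.
results = pd.DataFrame({
    "stop_name_x": ["Gare de Le Havre", "Gare de Le Havre"],
    "stop_name_y": ["Gare de Bréauté-Beuzeville", "Gare de Yvetot"],
    "travel_time_seconds": [900.0, 1740.0],
})

# Cast the durations to whole seconds, then emit one plain tuple per row.
result = list(
    results.astype({"travel_time_seconds": int}).itertuples(index=False, name=None)
)
```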
