Use Loop to modify Pandas Dataframe based on condition in data

Question

Suppose my dataframe looks like this:

import pandas as pd

z = {
    'Cust': ["a", "a", "a", "a", "a", "b", "b", "b", "b", "c", "d"],
    'datediff': [1, 3, 9, 26, 30, 1, 2, 7, 10, 5, 7],
    'row_number': [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 1],
    'Referer': ["URL1", "URL2", "URL2", "URL1", "URL1", "URL3", "URL1", "URL1",
                "URL1", "URL1", "URL1"]
}

df1 = pd.DataFrame(z)

row_number marks each visit's position in date order within each customer (the data was preprocessed in SQL). Only the datediff to the previous visit (record) is returned from SQL. (I can add a date column if needed.)

I need to populate a derived column with the very first URL visited by each customer, repeated on all of that customer's rows (that is, until row_number reverts to 1, which marks another customer).

This will allow me to calculate the overall datediff across all visits that started with a certain URL (with some basic tricks using derived columns), using something like DF3_derived.groupby(['Referer'])['datediff'].mean()

I don't know how to do this with plain [] boolean indexing, so maybe it is best to use a loop that reads dataframe1, modifies it, and saves the result to dataframe2?

Basically (in Excel terms) this means getting the value from the row above, but skipping that when a flag marking a new beginning is met. The Excel formula in D2 would be =IF(B2>B1,A1,A2), then drag the formula down.

Please write out the values you expect to see for each customer. Your description is a little unclear.
– Vadim Shkaberda, Commented May 24, 2016 at 17:37

MaxU - stand with Ukraine · Accepted Answer · 2016-05-25 07:32:50Z


If I understood you correctly, you can do it using groupby() in conjunction with transform('first'):

In [7]: df1.Referer = df1.groupby(['Cust'])['Referer'].transform('first')

In [8]: df1
Out[8]:
   Cust Referer  datediff  row_number
0     a    URL1         1           1
1     a    URL1         3           2
2     a    URL1         9           3
3     a    URL1        26           4
4     a    URL1        30           5
5     b    URL3         1           1
6     b    URL3         2           2
7     b    URL3         7           3
8     b    URL3        10           4
9     c    URL1         5           1
10    d    URL1         7           1
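
As a minimal sketch of how this could feed the per-URL average mentioned in the question: the snippet below writes the first URL to a separate column instead of overwriting Referer, and then averages datediff per first URL. The column name FirstReferer and the variables first_by_ffill and mean_by_first are made up for illustration. The second variant mimics the Excel "take the value from the row above" idea and assumes the rows are already sorted so that each customer's block starts with row_number == 1.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Cust': ["a", "a", "a", "a", "a", "b", "b", "b", "b", "c", "d"],
    'datediff': [1, 3, 9, 26, 30, 1, 2, 7, 10, 5, 7],
    'row_number': [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 1],
    'Referer': ["URL1", "URL2", "URL2", "URL1", "URL1", "URL3", "URL1", "URL1",
                "URL1", "URL1", "URL1"],
})

# Hypothetical derived column: first Referer per customer, original column kept.
df1['FirstReferer'] = df1.groupby('Cust')['Referer'].transform('first')

# Alternative in the spirit of the Excel formula: keep Referer only on the
# rows where row_number == 1, then forward-fill down to the next such row.
# Assumes rows are sorted by customer and visit order.
first_by_ffill = df1['Referer'].where(df1['row_number'] == 1).ffill()

# Mean datediff over all visits, grouped by the customer's first URL.
mean_by_first = df1.groupby('FirstReferer')['datediff'].mean()
print(mean_by_first)
```

With the sample data, both variants give the same column, and the mean is 5.0 for URL3 and about 11.57 for URL1.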



2 Comments

Xue Over a year ago

Great, MaxU, thanks a lot! I need to learn transform() and apply()!

MaxU - stand with Ukraine Over a year ago

@Xue, always glad to help!
