I have a pandas data frame that looks something like this:

hotel_id         date         length_of_stay     clicks
A               2019-01-01           3               7
B               2019-01-06           2               11
C               2019-01-03           1               4

and I want the result to be:

hotel_id         date                            clicks
A               2019-01-01                          7
A               2019-01-02                          7
A               2019-01-03                          7
B               2019-01-06                          11
B               2019-01-07                          11
C               2019-01-03                          4

So each stay is expanded into one row per night, and every night carries the clicks value of the original row.

I can't think of an elegant way to do this. Can someone help?

2 Answers

Use numpy.repeat():

m = pd.DataFrame(np.repeat(df.values, df.length_of_stay, axis=0), columns=df.columns)
m['date'] = m.groupby('hotel_id')['date'].transform(lambda x: pd.date_range(start=x.iloc[0], periods=len(x)))

or:

newdf = pd.DataFrame(np.repeat(df.values, df.length_of_stay, axis=0), columns=df.columns)
newdf['date'] = [i for day, n in zip(df.date, df.length_of_stay)
                   for i in pd.date_range(start=day, periods=n)]

Full example:

import io

import numpy as np
import pandas as pd

data = '''\
hotel_id         date         length_of_stay     clicks
A               2019-01-01           3               7
B               2019-01-06           2               11
C               2019-01-03           1               4'''

# pd.compat.StringIO is gone from newer pandas releases; io.StringIO replaces it
fileobj = io.StringIO(data)
df = pd.read_csv(fileobj, parse_dates=['date'], sep=r'\s+')

# Repeat each row length_of_stay times, then rebuild the dates per hotel
m = pd.DataFrame(np.repeat(df.values, df.length_of_stay, axis=0), columns=df.columns)
m['date'] = m.groupby('hotel_id')['date'].transform(lambda x: pd.date_range(start=x.iloc[0], periods=len(x)))
print(m)

  hotel_id       date length_of_stay clicks
0        A 2019-01-01              3      7
1        A 2019-01-02              3      7
2        A 2019-01-03              3      7
3        B 2019-01-06              2     11
4        B 2019-01-07              2     11
5        C 2019-01-03              1      4
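
A variation on the same repeat idea, offered as a sketch rather than as part of the answer above: repeating the index labels instead of `df.values` keeps the original column dtypes, and numbering the copies of each row with `cumcount()` means the dates stay correct even if a `hotel_id` appears in more than one row. The names `expanded`, `night` and `result` are made up for this example, and dropping `length_of_stay` is an assumption based on the desired output in the question.

```python
import io

import pandas as pd

data = """\
hotel_id date length_of_stay clicks
A 2019-01-01 3 7
B 2019-01-06 2 11
C 2019-01-03 1 4"""

df = pd.read_csv(io.StringIO(data), parse_dates=["date"], sep=r"\s+")

# Repeat each row length_of_stay times; every copy keeps its original index label.
expanded = df.loc[df.index.repeat(df["length_of_stay"])].copy()

# Number the copies of each original row 0, 1, 2, ... and shift the date by that many days.
night = expanded.groupby(level=0).cumcount()
expanded["date"] = expanded["date"] + pd.to_timedelta(night, unit="D")

# Assumption: the stay length is no longer needed once the rows are expanded.
result = expanded.drop(columns="length_of_stay").reset_index(drop=True)
print(result)
```

Because the grouping is done on the original row label rather than on `hotel_id`, two separate stays at the same hotel are expanded independently.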

5 Comments

No worries. Just adding info to make the answer easier to understand. You did all the work +1.
@AntonvBR Absolutely, agreed :D
Thank you both so much! Very elegant solution :)
Thanks @AntonvBR! I was thinking of something along the lines of your answer but couldn't quite put it all together
@Tim496 By my solution do you mean the zip or iterrows? The iterrows is the "quick and dirty" way of solving things in my opinion.

Here is a different solution making use of the "ugly" df.iterrows():

newdf = pd.concat(pd.DataFrame({
        'hotel_id': row['hotel_id'],
        'date': pd.date_range(start=row['date'], periods=row['length_of_stay']),
        'length_of_stay': row['length_of_stay'],
        'clicks': row['clicks']
    }) for ind, row in df.iterrows())

Full example:

import io

import pandas as pd

data = '''\
hotel_id         date         length_of_stay     clicks
A               2019-01-01           3               7
B               2019-01-06           2               11
C               2019-01-03           1               4'''

# pd.compat.StringIO is gone from newer pandas releases; io.StringIO replaces it
fileobj = io.StringIO(data)
df = pd.read_csv(fileobj, parse_dates=['date'], sep=r'\s+')

# Build one small frame per row (one line per night) and stack them
newdf = pd.concat(pd.DataFrame({
    'hotel_id': row['hotel_id'],
    'date': pd.date_range(start=row['date'], periods=row['length_of_stay']),
    'length_of_stay': row['length_of_stay'],
    'clicks': row['clicks']
}) for ind, row in df.iterrows())

Returns:

   clicks       date hotel_id  length_of_stay
0       7 2019-01-01        A               3
1       7 2019-01-02        A               3
2       7 2019-01-03        A               3
0      11 2019-01-06        B               2
1      11 2019-01-07        B               2
0       4 2019-01-03        C               1
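
Note that the index in the output above restarts at 0 for every hotel, because each small frame brings its own index into `pd.concat`. A minimal sketch of one way to get a clean 0..n-1 index is to pass `ignore_index=True`. This version also swaps `iterrows()` for `itertuples()`, which is a substitution on my part and not what the answer uses. The name `frames` is made up for this example, and `length_of_stay` is left out on the assumption that the output should match the question.

```python
import io

import pandas as pd

data = """\
hotel_id date length_of_stay clicks
A 2019-01-01 3 7
B 2019-01-06 2 11
C 2019-01-03 1 4"""

df = pd.read_csv(io.StringIO(data), parse_dates=["date"], sep=r"\s+")

# One small frame per original row: the scalars are broadcast
# against the date range, giving one line per night.
frames = [
    pd.DataFrame({
        "hotel_id": row.hotel_id,
        "date": pd.date_range(start=row.date, periods=row.length_of_stay),
        "clicks": row.clicks,
    })
    for row in df.itertuples(index=False)
]

# ignore_index=True discards the per-frame indexes and renumbers the rows.
newdf = pd.concat(frames, ignore_index=True)
print(newdf)
```

In recent pandas versions the columns come out in the order the dict keys were written, rather than alphabetically as in the output shown above.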

2 Comments

Interesting. iterrows with elegance. :D
@anky_91 I just thought of it and thought: why not post it. The thing is quite easy to follow at least :)
