Hi, I have the following DataFrame:

  weather     day      month       activity
  sunny     Monday     April      go for cycling
  raining   Friday     December   stay home

I want to repeat each row 5 times, but without repeating the value in the activity column.

So the output should be:

weather     day      month       activity
 sunny     Monday     April      go for cycling
 sunny     Monday     April
 sunny     Monday     April
 sunny     Monday     April
 sunny     Monday     April
 raining   Friday     December   stay home
 raining   Friday     December
 raining   Friday     December
 raining   Friday     December
 raining   Friday     December

1 Answer

Use Index.repeat with DataFrame.loc to repeat the rows, then blank out the repeated activity values with Series.mask and Index.duplicated:

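# Repeat each row 5 times; the copies keep their original index labels
df = df.loc[df.index.repeat(5)]
# Blank out activity on every copy except the first occurrence of each label
df['activity'] = df['activity'].mask(df.index.duplicated(), '')
# Replace the repeated labels with a fresh 0..n-1 index
df = df.reset_index(drop=True)
print(df)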
   weather     day     month        activity
0    sunny  Monday     April  go for cycling
1    sunny  Monday     April                
2    sunny  Monday     April                
3    sunny  Monday     April                
4    sunny  Monday     April                
5  raining  Friday  December       stay home
6  raining  Friday  December                
7  raining  Friday  December                
8  raining  Friday  December                
9  raining  Friday  December                
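
For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and wraps the three steps in a helper. The name `repeat_rows` and its parameters are hypothetical, chosen for illustration only, and not part of pandas. It assumes the input index is unique; with duplicate index labels, `Index.duplicated` would also blank some first occurrences.

```python
import pandas as pd


def repeat_rows(df, n, blank_col, fill=''):
    """Repeat every row n times, keeping blank_col only on the first copy.

    Hypothetical helper for illustration; assumes df has a unique index.
    """
    # Repeat each index label n times and select the rows in that order
    out = df.loc[df.index.repeat(n)].copy()
    # Copies share the original label, so duplicated() flags all but the first
    out[blank_col] = out[blank_col].mask(out.index.duplicated(), fill)
    # Drop the repeated labels in favour of a fresh 0..n-1 index
    return out.reset_index(drop=True)


# Sample data taken from the question
df = pd.DataFrame({
    'weather': ['sunny', 'raining'],
    'day': ['Monday', 'Friday'],
    'month': ['April', 'December'],
    'activity': ['go for cycling', 'stay home'],
})

result = repeat_rows(df, 5, 'activity')
print(result)
```

The mask has to be applied before `reset_index`, because after the reset every label is unique and `duplicated()` would no longer flag anything.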

5 Comments

With this code the activity column is also repeated 5 times.
@loutsi1 - Yes, in pandas it is not possible to have only 2 rows in that column.
@loutsi1 - In pandas, every column of a DataFrame has the same length, so here all columns have length 10.
Yes, I see what you are saying. So now there are 10 values in the other columns and two values in the activity column. Are the remaining rows of the activity column filled with NaN values?
@loutsi1 - No, here they are replaced by the empty string passed to .mask(df.index.duplicated(), ''). If you need missing values, use .mask(df.index.duplicated(), np.nan).
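
To illustrate the missing-value variant from the last comment, here is a short sketch on the question's sample data. The variable names are illustrative, and it again assumes a unique starting index.

```python
import numpy as np
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    'weather': ['sunny', 'raining'],
    'day': ['Monday', 'Friday'],
    'month': ['April', 'December'],
    'activity': ['go for cycling', 'stay home'],
})

# Repeat each row 5 times; copy so the column assignment acts on a new frame
out = df.loc[df.index.repeat(5)].copy()
# Use NaN instead of '' for the repeated activity values
out['activity'] = out['activity'].mask(out.index.duplicated(), np.nan)
out = out.reset_index(drop=True)

# Count how many activity entries are now missing
print(out['activity'].isna().sum())
```

With NaN as the fill value, the blanks are picked up by `isna()`, `dropna()` and `fillna()`, which an empty string would not be.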
