Duplicate specific rows of a dataframe based on column values

Question

Hi, I have the following data frame:

  weather   day      month      activity
  sunny     Monday   April      go for cycling
  raining   Friday   December   stay home

What I want is to repeat each row 5 times, but without repeating the activity column: only the first copy of each row should keep its activity value.

So the output should be:

  weather   day      month      activity
  sunny     Monday   April      go for cycling
  sunny     Monday   April
  sunny     Monday   April
  sunny     Monday   April
  sunny     Monday   April
  raining   Friday   December   stay home
  raining   Friday   December
  raining   Friday   December
  raining   Friday   December
  raining   Friday   December

jezrael · Accepted Answer · 2021-07-13 08:49:12Z


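Use Index.repeat with DataFrame.loc to repeat the rows, and then blank out the repeated activity values with Series.mask and Index.duplicated. After repeating, every copy keeps its original index label, so all copies except the first one are flagged as duplicated: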

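import pandas as pd
# repeat each row 5 times (the index labels are repeated too)
df = df.loc[df.index.repeat(5)]
# blank out activity on every copy except the first per original row
df['activity'] = df['activity'].mask(df.index.duplicated(), '')
df = df.reset_index(drop=True)
print(df)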
   weather     day     month        activity
0    sunny  Monday     April  go for cycling
1    sunny  Monday     April                
2    sunny  Monday     April                
3    sunny  Monday     April                
4    sunny  Monday     April                
5  raining  Friday  December       stay home
6  raining  Friday  December                
7  raining  Friday  December                
8  raining  Friday  December                
9  raining  Friday  December
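A self-contained variant, offered as a sketch rather than the answerer's method: it rebuilds the question's frame and selects rows by position (numpy.repeat plus DataFrame.iloc) instead of by label, on the assumption that the original index might not be unique. With duplicate labels, label-based .loc selection can return more rows than intended. The helper name repeat_rows_blank_first and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd


def repeat_rows_blank_first(df: pd.DataFrame, n: int, col: str, fill="") -> pd.DataFrame:
    """Repeat every row n times, keeping `col` only on the first copy of each row."""
    # Source row positions: 0,0,...,0,1,1,...,1 (each repeated n times).
    positions = np.repeat(np.arange(len(df)), n)
    # Select by position so duplicate index labels cannot inflate the result.
    out = df.iloc[positions].reset_index(drop=True)
    # Every position after its first occurrence marks a copy.
    is_copy = pd.Series(positions).duplicated()
    out[col] = out[col].mask(is_copy, fill)
    return out


df = pd.DataFrame({
    "weather": ["sunny", "raining"],
    "day": ["Monday", "Friday"],
    "month": ["April", "December"],
    "activity": ["go for cycling", "stay home"],
})
result = repeat_rows_blank_first(df, 5, "activity")
print(result)
```

Passing fill=np.nan instead of the default empty string leaves real missing values in the blanked rows.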


5 Comments

loutsi1 Over a year ago

With this code, the activity column is also duplicated 5 times.

jezrael Over a year ago

@loutsi1 - yes, in pandas it is not possible to have only 2 rows in that column.

jezrael Over a year ago

@loutsi1 - in pandas, all columns of a DataFrame have the same length, so here every column has length 10.
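A quick check of the point above, as a sketch with illustrative data: pandas rejects assigning 2 values to a column of a 10-row frame with a ValueError, so the "missing" activity entries have to hold something (an empty string or NaN).

```python
import pandas as pd

# A 10-row frame standing in for the repeated result (illustrative data).
df = pd.DataFrame({"weather": ["sunny"] * 5 + ["raining"] * 5})

try:
    # Only 2 values for 10 rows: pandas rejects the assignment.
    df["activity"] = ["go for cycling", "stay home"]
    rejected = False
except ValueError as exc:
    rejected = True
    print("ValueError:", exc)

# The accepted form supplies one value per row, padding with empty strings.
df["activity"] = ["go for cycling"] + [""] * 4 + ["stay home"] + [""] * 4
print(len(df["weather"]), len(df["activity"]))
```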

loutsi1 Over a year ago

Yes, I see what you are saying. So now there are 10 values in the other columns and two values in the activity column. Are the remaining rows of the activity column filled with NaN values?

jezrael Over a year ago

@loutsi1 - No, here they are replaced by the empty string passed in .mask(df.index.duplicated(), ''). If you need missing values instead, use .mask(df.index.duplicated(), np.nan).
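A minimal sketch of the NaN variant from the comment above, rebuilding the question's frame for illustration. Unlike empty strings, the blanked entries are then true missing values, so they can be counted or filtered with isna().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "weather": ["sunny", "raining"],
    "day": ["Monday", "Friday"],
    "month": ["April", "December"],
    "activity": ["go for cycling", "stay home"],
})

df = df.loc[df.index.repeat(5)]
# Copies of a row share its original index label, so duplicated() flags them.
df["activity"] = df["activity"].mask(df.index.duplicated(), np.nan)
df = df.reset_index(drop=True)

print(df)
print(df["activity"].isna().sum())  # 8 blanked-out rows
```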
