Adding duplicate rows to a DataFrame

Question

I could not figure out how to solve the following problem. Consider the following data set:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([['a', 1, 2, 3], ['a', 4, 5, 6],
                                 ['b', 7, 8, 9], ['b', 10, 11, 12]]),
                  columns=['id', 'A', 'B', 'C'])

  id   A    B    C
  a    1    2    3
  a    4    5    6
  b    7    8    9
  b    10   11   12

I need to group the data by id and, within each group, duplicate the first row and append it as new columns, as in the following data set:

  id   A    B    C    A  B  C
  a    1    2    3    1  2  3
  a    4    5    6    1  2  3
  b    7    8    9    7  8  9
  b    10   11   12   7  8  9

I would really appreciate your help.

I tried the following steps, but I could not extend them:

df1 = df.loc[0:0, 'A':'C']
df3 = pd.concat([df, df1], axis=1)

cs95 · Accepted Answer · 2018-01-29 01:24:13Z

5

Use groupby + first, and then concatenate df with this result:

v = df.groupby('id').transform('first')
pd.concat([df, v], axis=1)

  id   A   B   C  A  B  C
0  a   1   2   3  1  2  3
1  a   4   5   6  1  2  3
2  b   7   8   9  7  8  9
3  b  10  11  12  7  8  9
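The result above has two sets of columns named A, B and C, which makes later selection by name ambiguous. A minimal sketch of the same groupby + transform idea with renamed columns follows. It assumes numeric values built from plain lists rather than np.array, and the "_first" suffix is an illustrative choice that does not appear in the answer.

```python
import pandas as pd

# Same data as in the question, built from plain lists so the numbers stay integers.
df = pd.DataFrame(
    [["a", 1, 2, 3], ["a", 4, 5, 6], ["b", 7, 8, 9], ["b", 10, 11, 12]],
    columns=["id", "A", "B", "C"],
)

# Broadcast each group's first row back onto every row of that group.
first = df.groupby("id").transform("first")

# "_first" is a hypothetical suffix used here only to keep column names unique.
result = pd.concat([df, first.add_suffix("_first")], axis=1)
print(result)
```

With unique names, a column such as result["A_first"] returns a single Series instead of a two-column frame.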


BENY · Accepted Answer · 2018-01-29 01:50:53Z

3

Use cumcount + where + ffill:

v = df.groupby('id').cumcount() == 0

pd.concat([df, df.iloc[:, 1:].where(v).ffill()], axis=1)
  id   A   B   C  A  B  C
0  a   1   2   3  1  2  3
1  a   4   5   6  1  2  3
2  b   7   8   9  7  8  9
3  b  10  11  12  7  8  9
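One caveat worth noting: a plain ffill carries values down the frame regardless of id, so this approach appears to rely on the rows of each group being contiguous, as they are in the question. The sketch below uses a hypothetical reordered frame (not from the question) to show the difference, and one possible adjustment that forward-fills within each group instead.

```python
import pandas as pd

# Hypothetical data: the two "a" rows are separated by a "b" row.
df = pd.DataFrame(
    [["a", 1, 2, 3], ["b", 7, 8, 9], ["a", 4, 5, 6]],
    columns=["id", "A", "B", "C"],
)

# True only on the first row of each group.
is_first = df.groupby("id").cumcount() == 0

# Keep the first row of each group, blank out the rest.
masked = df[["A", "B", "C"]].where(is_first)

# Plain ffill fills the last "a" row from the preceding "b" row.
plain = masked.ffill()

# Forward-filling per group takes the values from the first "a" row instead.
grouped = masked.groupby(df["id"]).ffill()

print(plain)
print(grouped)
```

Because where introduces NaN, integer columns come back as floats here; cast them back with astype if that matters.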


Tai · Accepted Answer · 2018-01-29 01:34:04Z

2

One can also try drop_duplicates and merge.

df_unique = df.drop_duplicates("id")
df.merge(df_unique, on="id", how="left")

    id  A_x     B_x     C_x     A_y     B_y     C_y
0   a   1       2       3       1       2       3
1   a   4       5       6       1       2       3
2   b   7       8       9       7       8       9
3   b   10      11      12      7       8       9
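The _x and _y names come from merge's default suffixes. As a sketch of one way to keep the original column names on the left side, the suffixes parameter can be passed explicitly. The "_first" suffix and the integer data built from plain lists are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [["a", 1, 2, 3], ["a", 4, 5, 6], ["b", 7, 8, 9], ["b", 10, 11, 12]],
    columns=["id", "A", "B", "C"],
)

# One row per id: drop_duplicates keeps the first occurrence by default.
df_unique = df.drop_duplicates("id")

# Left columns keep their names; right columns get the hypothetical "_first" suffix.
result = df.merge(df_unique, on="id", how="left", suffixes=("", "_first"))
print(result)
```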


8 Comments

BENY Over a year ago

this one should be fast :-)

Elham Over a year ago

Thanks for all the responses. If I need to duplicate the second row instead, can I use this solution again? Why does drop_duplicates remove the second column? Is there any way to remove the rows selected by drop_duplicates? @Tai

Tai Over a year ago

@AlterNative Here I passed in id to specify that I want to detect duplicates by this label. By default, drop_duplicates keeps only the first row.

Tai Over a year ago

@AlterNative There are only a few choices that you can choose from. See Here for the keep parameter.
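For reference, a short sketch of the values the keep parameter of drop_duplicates accepts, run on the question's data (rebuilt from plain lists here as an assumption, so the numbers stay integers):

```python
import pandas as pd

df = pd.DataFrame(
    [["a", 1, 2, 3], ["a", 4, 5, 6], ["b", 7, 8, 9], ["b", 10, 11, 12]],
    columns=["id", "A", "B", "C"],
)

# keep="first" (the default): keep the first row of each id.
first_rows = df.drop_duplicates(subset="id", keep="first")

# keep="last": keep the last row of each id.
last_rows = df.drop_duplicates(subset="id", keep="last")

# keep=False: drop every row whose id occurs more than once.
no_dupes = df.drop_duplicates(subset="id", keep=False)

print(first_rows)
print(last_rows)
print(no_dupes)
```

Since every id occurs twice in this data, keep=False leaves an empty frame.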

Tai Over a year ago

@AlterNative Which selections do you want to make? If it is too complicated, I think you can just create a new df yourself with a suitable mapping between id and A, B, C, and then merge with that df.
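As a sketch of that suggestion applied to the "second row" case asked about above: build a small frame holding the chosen row per id, then merge it back. Selecting the row with cumcount and the "_second" suffix are illustrative choices, not something stated in the thread, and the data is rebuilt from plain lists.

```python
import pandas as pd

df = pd.DataFrame(
    [["a", 1, 2, 3], ["a", 4, 5, 6], ["b", 7, 8, 9], ["b", 10, 11, 12]],
    columns=["id", "A", "B", "C"],
)

# Mapping frame: the second row of each id (cumcount numbers rows 0, 1, ... per group).
second_rows = df[df.groupby("id").cumcount() == 1]

# Merge the mapping back; "_second" is a hypothetical suffix for the added columns.
result = df.merge(second_rows, on="id", how="left", suffixes=("", "_second"))
print(result)
```

An id with only one row would have no match in the mapping frame, so its added columns would be NaN after the left merge.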
