3

I did not figure out how to solve the following question! consider the following data set:

df = pd.DataFrame(data=np.array([['a',1, 2, 3], ['a',4, 5, 6],
                                 ['b',7, 8, 9], ['b',10, 11 , 12]]),
 columns=['id','A', 'B', 'C'])

  id   A    B    C
  a    1    2    3
  a    4    5    6
  b    7    8    9
  b    10   11   12

I need to group the data by id and in each group duplicate the first row and add it to the dataset like the following data set:

  id   A    B    C    A  B  C
  a    1    2    3    1  2  3
  a    4    5    6    1  2  3
  b    7    8    9    7  8  9
  b    10   11   12   7  8  9

I really appreciate it for your help.

I did the following steps, however I could not expand it :

df1 = df.loc [0:0 , 'A' :'C']
df3 = pd.concat([df,df1],axis=1)

3 Answers 3

5

Use groupby + first, and then concatenate df with this result:

v = df.groupby('id').transform('first')
pd.concat([df, v], 1)

  id   A   B   C  A  B  C
0  a   1   2   3  1  2  3
1  a   4   5   6  1  2  3
2  b   7   8   9  7  8  9
3  b  10  11  12  7  8  9
Sign up to request clarification or add additional context in comments.

Comments

3

cumcount + where+ffill

v=df.groupby('id').cumcount()==0

pd.concat([df,df.iloc[:,1:].where(v).ffill()],1)
Out[57]: 
  id   A   B   C  A  B  C
0  a   1   2   3  1  2  3
1  a   4   5   6  1  2  3
2  b   7   8   9  7  8  9
3  b  10  11  12  7  8  9

Comments

2

One can also try drop_duplicates and merge.

df_unique = df.drop_duplicates("id")
df.merge(df_unique, on="id", how="left")

    id  A_x     B_x     C_x     A_y     B_y     C_y
0   a   1       2       3       1       2       3
1   a   4       5       6       1       2       3
2   b   7       8       9       7       8       9
3   b   10      11      12      7       8       9

8 Comments

this one should be fast :-)
thanks for all response. I was wondering if I need to duplicate the second row, can I use this solution again? why drop_duplicate remove the second column? is there any way to remove the selected rows by drop_duplicate! @Tai
@AlterNative here I passed in id to specify that I want to detect duplicates by this label. By default only the first row will be kept by drop_duplicates
@AlterNative There are only a few choices that you can choose from. See Here for the keep parameter.
@AlterNative what potential selections you want to do? If too complicated, I think you can just create a new df yourself with suitable mapping between id and A, B, C and just perform merge later with that df.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.