Add row in dataframe with same value as in specific column

Question

I have this dataframe:

    0       1       2         3
0   Frank   48.2    test_1    file_1
1   John    46.7    test_1    file_1
2   Alice   39.3    test_2    file_2
3   Kim     35.6    test_2    file_2
4   Sasha   25.5    test_3    file_3
.... 
2306 rows × 4 columns

For every distinct value in column 2 (there are 140 of them), I want to insert a new row before the first row with that value. The new row should keep the file_number value in column 3 (I need that column later to save the dataframe split into separate files according to its value), like this:

    0        1       2       3
0   test_1                   file_1
1   Frank    48.2    test_1  file_1
2   John     46.7    test_1  file_1
3   test_2                   file_2
4   Alice    39.3    test_2  file_2
5   Kim      35.6    test_2  file_2
6   test_3                   file_3
7   Sasha    25.5    test_3  file_3
....

What is the simplest way to achieve this? Thank you for your time!

BENY · Accepted Answer · 2021-10-11 13:53:20Z

2

You can build the header rows with drop_duplicates, then concat them back onto the original DataFrame and sort by index:

s = df.drop_duplicates(['2','3']).drop(['0','1'],axis=1).rename({'2':'0'},axis=1)
out = pd.concat([s,df]).sort_index().reindex(columns=df.columns)
out
Out[15]: 
        0     1       2       3
0  test_1   NaN     NaN  file_1
0   Frank  48.2  test_1  file_1
1    John  46.7  test_1  file_1
2  test_2   NaN     NaN  file_2
2   Alice  39.3  test_2  file_2
3     Kim  35.6  test_2  file_2
4  test_3   NaN     NaN  file_3
4   Sasha  25.5  test_3  file_3
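
For readers who want to try this end to end, here is a minimal self-contained sketch of the same drop_duplicates + concat idea. The sample data is copied from the question, and the string column labels "0" to "3" are an assumption based on how the answer indexes them. The name `headers`, the stable sort, and the final reset_index are illustrative additions: the stable sort keeps each header row above the first row of its group when index labels tie, and reset_index renumbers the rows as in the desired output.

```python
import pandas as pd

# Sample data copied from the question; string column labels are an assumption.
df = pd.DataFrame(
    [
        ["Frank", 48.2, "test_1", "file_1"],
        ["John", 46.7, "test_1", "file_1"],
        ["Alice", 39.3, "test_2", "file_2"],
        ["Kim", 35.6, "test_2", "file_2"],
        ["Sasha", 25.5, "test_3", "file_3"],
    ],
    columns=["0", "1", "2", "3"],
)

# One header row per distinct (test, file) pair. Each keeps the index label
# of the first row of its group, so sorting by index places it next to that row.
headers = (
    df.drop_duplicates(["2", "3"])
    .drop(columns=["0", "1"])
    .rename(columns={"2": "0"})
)

# Headers come first in the concat; a stable sort preserves that order among
# equal index labels, so each header ends up above its group's first row.
out = (
    pd.concat([headers, df])
    .sort_index(kind="stable")
    .reindex(columns=df.columns)
    .reset_index(drop=True)
)
print(out)
```

The header rows hold NaN in columns 1 and 2; calling `out.fillna("")` afterwards would blank them out if empty cells are preferred.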


2 Comments

Radix Over a year ago

This solution seems to work! But sometimes it puts the new row after the first row with that value instead of above it.

BENY Over a year ago

@Radix pd.concat([s,df]).sort_index(kind='stable').reindex(columns=df.columns)

trolloldem · Accepted Answer · 2021-10-11 13:55:47Z

1

You can loop over the distinct values of column 2, select the rows for each value, prepend the header row you want, and then concatenate the resulting DataFrames into one. For example:

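import pandas as pd

df = ...  # <READ_YOUR_DF>: load your DataFrame here
all_df = []
for i in df["2"].unique():
    # Rows belonging to the current value of column 2
    filter_df = df[df["2"] == i]
    # Header row: the value goes in column 0, the group's file name stays in column 3
    new_df = pd.DataFrame(data={"0": [i], "1": [""], "2": [""], "3": [filter_df["3"].iloc[0]]})
    to_add = pd.concat([new_df, filter_df], ignore_index=True)
    all_df.append(to_add)

# Stack all groups back into a single DataFrame
result_df = pd.concat(all_df, ignore_index=True)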

If you want to avoid listing all the column names when creating new_df, you can build the dictionary with a comprehension that iterates over df.columns.
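
The dictionary comprehension mentioned above could look like the following sketch. The sample data is copied from the question with string column labels assumed, and the names `header`, `group`, and `pieces` are hypothetical. It also copies the group's file name into column 3, as the question asks.

```python
import pandas as pd

# Sample data copied from the question; string column labels are an assumption.
df = pd.DataFrame(
    [
        ["Frank", 48.2, "test_1", "file_1"],
        ["John", 46.7, "test_1", "file_1"],
        ["Alice", 39.3, "test_2", "file_2"],
        ["Kim", 35.6, "test_2", "file_2"],
        ["Sasha", 25.5, "test_3", "file_3"],
    ],
    columns=["0", "1", "2", "3"],
)

pieces = []
for value in df["2"].unique():
    group = df[df["2"] == value]
    # Start with an empty string in every column, without naming the columns.
    header = {col: [""] for col in df.columns}
    header["0"] = [value]               # group label in the first column
    header["3"] = [group["3"].iloc[0]]  # carry over the group's file name
    pieces.append(pd.concat([pd.DataFrame(header), group], ignore_index=True))

result_df = pd.concat(pieces, ignore_index=True)
print(result_df)
```

Because the header rows use empty strings rather than NaN, column 1 ends up with object dtype (a mix of strings and floats), which is worth keeping in mind if numeric operations are needed later.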

