I have this dataframe:

    0       1       2         3
0   Frank   48.2    test_1    file_1
1   John    46.7    test_1    file_1
2   Alice   39.3    test_2    file_2
3   Kim     35.6    test_2    file_2
4   Sasha   25.5    test_3    file_3
.... 
2306 rows × 4 columns   

For every distinct value in column 2 (there are 140 of them), I want to add a row to my dataframe before the first row with that value, keeping the file_number value in column 3 (I will need that column to save the dataframe split into different files depending on its value), like this:

    0        1       2       3
0   test_1                   file_1
1   Frank    48.2    test_1  file_1
2   John     46.7    test_1  file_1
3   test_2                   file_2
4   Alice    39.3    test_2  file_2
5   Kim      35.6    test_2  file_2
6   test_3                   file_3
7   Sasha    25.5    test_3  file_3
....

What is the simplest way to achieve this? Thank you for your time!

2 Answers

2

You can use drop_duplicates to get one row per distinct value, then concat those rows back onto the original dataframe:

s = df.drop_duplicates(['2','3']).drop(['0','1'],axis=1).rename({'2':'0'},axis=1)
out = pd.concat([s,df]).sort_index().reindex(columns=df.columns)
out
Out[15]: 
        0     1       2       3
0  test_1   NaN     NaN  file_1
0   Frank  48.2  test_1  file_1
1    John  46.7  test_1  file_1
2  test_2   NaN     NaN  file_2
2   Alice  39.3  test_2  file_2
3     Kim  35.6  test_2  file_2
4  test_3   NaN     NaN  file_3
4   Sasha  25.5  test_3  file_3

2 Comments

This solution seems to work! But sometimes it puts the new row after the first row with that value instead of before it.
@Radix pd.concat([s,df]).sort_index(kind='stable').reindex(columns=df.columns)
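
For anyone who wants to try the stable-sort variant from the comment above, here is a minimal self-contained sketch. It assumes the column labels are the strings "0" to "3" and uses only the five sample rows from the question. The final reset_index is an addition to get a clean 0..n index and is not part of the original answer. The idea is that the header rows come first in the concat, so a stable sort leaves each of them above the data row that shares its index label, whereas the default quicksort makes no such guarantee.

```python
import pandas as pd

# Sample rows copied from the question; string column labels are an assumption.
df = pd.DataFrame(
    [
        ["Frank", 48.2, "test_1", "file_1"],
        ["John", 46.7, "test_1", "file_1"],
        ["Alice", 39.3, "test_2", "file_2"],
        ["Kim", 35.6, "test_2", "file_2"],
        ["Sasha", 25.5, "test_3", "file_3"],
    ],
    columns=["0", "1", "2", "3"],
)

# One header row per (test, file) pair: keep the first occurrence,
# drop the name/score columns, and move the test label into column "0".
s = df.drop_duplicates(["2", "3"]).drop(["0", "1"], axis=1).rename({"2": "0"}, axis=1)

# Header rows are listed first, so a stable sort on the index keeps each
# header above the data row that carries the same index label.
out = (
    pd.concat([s, df])
    .sort_index(kind="stable")
    .reindex(columns=df.columns)
    .reset_index(drop=True)  # optional: renumber rows 0..n-1
)
print(out)
```

The header rows hold NaN in columns 1 and 2; chaining .fillna("") would blank them out if empty strings are preferred.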
1

You can filter the rows with the correct value of column 2, add to that DataFrame the row you want, and concatenate all the DataFrames obtained into one. An example is the following code:

import pandas as pd

df = <READ_YOUR_DF>
all_df = []
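for i in df["2"].unique():
    filter_df = df[df["2"] == i]
    # Header row: test name in column 0, file name kept in column 3 as the question asks
    new_df = pd.DataFrame(data={"0": [i], "1": [""], "2": [""], "3": [filter_df["3"].iloc[0]]})
    to_add = pd.concat([new_df, filter_df], ignore_index=True)
    all_df.append(to_add)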

result_df=pd.concat(all_df, ignore_index=True)

If you want to avoid listing all the column names when creating new_df, you can build the dictionary with a comprehension that iterates over df.columns.
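
A rough sketch of that dictionary-comprehension variant, run on the five sample rows from the question. It assumes string column labels "0" to "3" and that each value in column 2 maps to a single file in column 3. The names pieces, group and header are just illustrative. Unlike the loop above, it also copies the file name into the header row, since the question asks to keep it.

```python
import pandas as pd

# Sample rows copied from the question; string column labels are an assumption.
df = pd.DataFrame(
    [
        ["Frank", 48.2, "test_1", "file_1"],
        ["John", 46.7, "test_1", "file_1"],
        ["Alice", 39.3, "test_2", "file_2"],
        ["Kim", 35.6, "test_2", "file_2"],
        ["Sasha", 25.5, "test_3", "file_3"],
    ],
    columns=["0", "1", "2", "3"],
)

pieces = []
for test_name in df["2"].unique():
    group = df[df["2"] == test_name]
    # Blank header row built from df.columns, so the full column list
    # never has to be typed out by hand.
    header = {col: [""] for col in df.columns}
    header["0"] = [test_name]
    header["3"] = [group["3"].iloc[0]]  # assumes one file per test value
    pieces.append(pd.concat([pd.DataFrame(header), group], ignore_index=True))

result_df = pd.concat(pieces, ignore_index=True)
print(result_df)
```

Because the header rows hold empty strings, column 1 ends up with object dtype rather than float, which is usually fine if the frame is only being written out to files.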
