I'm working with a huge DataFrame in Python, and sometimes I need to insert one or more empty rows at a specific position. For this question I created a small DataFrame, df, to show what I want to achieve.

import pandas as pd

cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
        'Price': [22000, 25000, 27000, 35000]
        }

df = pd.DataFrame(cars, columns=['Brand', 'Price'])

If a row's Price is 27000, I want to add an empty row before it. I can insert a row after it with concat, but I can't think of a way to add one before it.

4 Answers

3

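You can build a helper cumsum key for groupby, append a blank row to the first group only, and then concat the groups. Note that DataFrame.append was removed in pandas 2.0, so this runs as written only on earlier versions: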

out = pd.concat((g.append(pd.Series(),ignore_index=True) if i==0 else g 
       for i, g in df.groupby(df['Price'].eq(27000).cumsum())))

print(out)

            Brand    Price
0     Honda Civic  22000.0
1  Toyota Corolla  25000.0
2             NaN      NaN
2      Ford Focus  27000.0
3         Audi A4  35000.0
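
On pandas 2.0 and later, where DataFrame.append is no longer available, the same groupby idea can be sketched with pd.concat alone. This is a minimal sketch rather than the answerer's code: the names blank, key and pieces are illustrative, and the result is renumbered with ignore_index=True instead of keeping the duplicated label 2.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
    'Price': [22000, 25000, 27000, 35000],
})

# One all-NaN row with the same columns as df (illustrative helper frame).
blank = pd.DataFrame(np.nan, index=[0], columns=df.columns)

# Group key: 0 for the rows before the first 27000, then 1, 2, ... per match.
key = df['Price'].eq(27000).cumsum()

pieces = []
for i, g in df.groupby(key):
    pieces.append(g)
    if i == 0:
        # The blank row follows group 0, so it sits just before the first match.
        pieces.append(blank)

out = pd.concat(pieces, ignore_index=True)
print(out)
```

As in the original answer, nothing is inserted when the very first row already matches, because group 0 does not exist in that case.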

2

Create an all-null DataFrame whose index labels are those of the rows that match your condition (this assumes df has a non-duplicated index). Then concat it with df and call sort_index; because empty comes first in the concat, each blank row ends up before the row that shares its label. Finally, reset_index removes the duplicate index labels.

import pandas as pd

empty = pd.DataFrame(columns=df.columns, index=df[df.Price.eq(27000)].index)
df = pd.concat([empty, df]).sort_index().reset_index(drop=True)
#            Brand  Price
#0     Honda Civic  22000
#1  Toyota Corolla  25000
#2             NaN    NaN
#3      Ford Focus  27000
#4         Audi A4  35000

This adds a blank row before every row whose Price is 27000:

cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4','Jeep'],
        'Price': [22000,25000,27000,35000,27000]}
df = pd.DataFrame(cars, columns = ['Brand', 'Price'])

empty = pd.DataFrame(columns=df.columns, index=df[df.Price.eq(27000)].index)
df = pd.concat([empty, df]).sort_index().reset_index(drop=True)
#            Brand  Price
#0     Honda Civic  22000
#1  Toyota Corolla  25000
#2             NaN    NaN
#3      Ford Focus  27000
#4         Audi A4  35000
#5             NaN    NaN
#6            Jeep  27000
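
The approach above can be wrapped in a small reusable function. The sketch below is an illustration, not part of the answer: insert_blank_before and mask are made-up names, df is assumed to have a unique index, and kind='mergesort' is passed to sort_index as a precaution, since the default quicksort is not documented as stable and the method depends on equal labels keeping their concat order.

```python
import pandas as pd

def insert_blank_before(df, mask):
    """Return a copy of df with an all-NaN row before each row where mask is True.

    Illustrative helper (not a pandas API); assumes df.index is unique.
    """
    empty = pd.DataFrame(columns=df.columns, index=df[mask].index)
    # `empty` is first in the concat and mergesort is stable, so for each
    # duplicated label the blank row stays ahead of the real row.
    return (pd.concat([empty, df])
              .sort_index(kind='mergesort')
              .reset_index(drop=True))

cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4', 'Jeep'],
        'Price': [22000, 25000, 27000, 35000, 27000]}
df = pd.DataFrame(cars)

out = insert_blank_before(df, df['Price'].eq(27000))
print(out)
```

Any boolean Series aligned with df can be used as the mask, for example df['Price'].gt(30000).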

3 Comments

Good one..!! visionary (: I didn't think that the price could repeat :/
@anky yeah I wasn't sure if "several rows" meant several rows in one position or several single rows in many positions. ¯_(ツ)_/¯
Practically speaking, this makes more sense than my answer :)
2

Let us try cummax with append (DataFrame.append was removed in pandas 2.0, so this needs an earlier version):

m = df['Price'].eq(27000).cummax()
df[~m].append(pd.Series(), ignore_index=True).append(df[m])

            Brand    Price
0     Honda Civic  22000.0
1  Toyota Corolla  25000.0
2             NaN      NaN
2      Ford Focus  27000.0
3         Audi A4  35000.0
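
For pandas 2.0 and later, a minimal sketch of the same cummax split using pd.concat in place of append. The blank frame is an illustrative helper, and ignore_index=True renumbers the result 0 to 4 rather than repeating label 2.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
    'Price': [22000, 25000, 27000, 35000],
})

# True from the first 27000 row onwards, False for everything before it.
m = df['Price'].eq(27000).cummax()

# One all-NaN row with the same columns as df (illustrative helper frame).
blank = pd.DataFrame(np.nan, index=[0], columns=df.columns)

# Rows before the match, then the blank row, then the match and the rest.
out = pd.concat([df[~m], blank, df[m]], ignore_index=True)
print(out)
```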


2

You can also do this with concat() and apply():

import numpy as np
result = pd.concat((df.apply(lambda x: np.nan if x['Price'] == 27000 else x, axis=1), df))

Finally, chain sort_index(), drop_duplicates() and reset_index():

result=result.sort_index(na_position='first').drop_duplicates().reset_index(drop=True)

Printing result now gives the desired output:

    Brand           Price
0   Honda Civic     22000.0
1   Toyota Corolla  25000.0
2   NaN             NaN
3   Ford Focus      27000.0
4   Audi A4         35000.0

This adds a blank row before every row where Price == 27000 (it also relies on DataFrame.append, so it needs pandas earlier than 2.0):

result = pd.concat((df.apply(lambda x: np.nan if x['Price'] == 27000 else x, axis=1), df))

result=result.drop_duplicates().append(result[result.isna().all(1)].iloc[1:]).sort_index(na_position='first').reset_index(drop=True)
