
I'm working with a huge dataframe in Python, and sometimes I need to insert one or more empty rows at a specific position. For this question I created a small dataframe df to show what I want to achieve.

>  df = pd.DataFrame(np.random.randint(10, size=(3, 3)), columns=['A', 'B', 'C'])
>        A  B  C
>     0  4  5  2
>     1  6  7  0
>     2  8  1  9

Let's say I need to add an empty row after every row that has a zero in column 'C'. Here the empty row should be added after the second row, so in the end I want a new dataframe like this:

>new_df
>        A    B    C
>     0  4    5    2
>     1  6    7    0
>     2  nan  nan  nan
>     3  8    1    9

I tried concat and append, but I didn't get the result I wanted. Could you help me, please?

4 Answers

4

You can try it this way:

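import numpy as np
import pandas as pd
# row positions where C == 0 (assumes the default 0..n-1 index)
l = df[df['C'] == 0].index.tolist()
for c, i in enumerate(l):
    pos = i + 1 + c  # every earlier insert shifts later rows down by one
    # np.nan replaces np.NaN (removed in NumPy 2.0); iloc replaces np.split on a DataFrame
    empty = pd.DataFrame([[np.nan, np.nan, np.nan]], columns=df.columns)
    df = pd.concat([df.iloc[:pos], empty, df.iloc[pos:]], ignore_index=True)
print(df)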

Input:

   A  B  C
0  4  3  0
1  4  0  4
2  4  4  2
3  3  2  1
4  3  1  2
5  4  1  4
6  1  0  4
7  0  2  0
8  2  0  3
9  4  1  3

Output:

    A    B    C
0   4.0  3.0  0.0
1   NaN  NaN  NaN
2   4.0  0.0  4.0
3   4.0  4.0  2.0
4   3.0  2.0  1.0
5   3.0  1.0  2.0
6   4.0  1.0  4.0
7   1.0  0.0  4.0
8   0.0  2.0  0.0
9   NaN  NaN  NaN
10  2.0  0.0  3.0
11  4.0  1.0  3.0

Last thing: it can happen that the last row has 0 in 'C', so you can add:

if df["C"].iloc[-1] == 0:
    df.loc[len(df)] = [np.nan, np.nan, np.nan]

1 Comment

Thank you very much. It works also with my real dataframe
2

Try using slicing.

First, you need to find the rows where C == 0. So let's create a boolean mask (a Series of True/False values) for this. I'll just name it 'a':

a = (df['C'] == 0)

So, whenever C == 0, a == True.
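To make the mask concrete, here is a minimal sketch. It assumes fixed values matching the small frame from the question (the original used random integers), so the exact numbers are illustrative only:

```python
import pandas as pd

# Assumed fixed values, copied from the example frame in the question
df = pd.DataFrame({'A': [4, 6, 8], 'B': [5, 7, 1], 'C': [2, 0, 9]})

a = (df['C'] == 0)                # boolean Series, one entry per row
print(a.tolist())                 # [False, True, False]
print(df.loc[a].index.tolist())   # [1]  (labels of the rows where C == 0)
```

The labels returned by `df.loc[a].index` are what the loop further down iterates over.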

Now we need to find the index of each row where C == 0, create an empty row and add it to the df:

df2 = df.copy()  # work on a copy so the original df is left untouched
for i in df.loc[a].index:
    # one all-NaN row, labelled i so it sits right after row i
    empty_row = pd.DataFrame(np.nan, index=[i], columns=df.columns)
    j = i + 1  # label of the first row after the insert point
    # .loc slices by label, both ends included (.ix was removed in pandas 1.0)
    df2 = pd.concat([df2.loc[:i], empty_row, df2.loc[j:]])

df2 = df2.reset_index(drop=True)  # renumber the rows 0..n-1

I must say, I don't know how big your df is or whether this will be fast enough, but give it a try.

1 Comment

Thank you very much for the code and for your detailed explanation. It works!
1

In case you know the index where you want to insert a new row, concat can be a solution.

Example dataframe:

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
#    A  B  C
# 0  1  4  7
# 1  2  5  8
# 2  3  6  9

Your new row as a dataframe with index 1:

new_row = pd.DataFrame({'A': np.nan, 'B': np.nan,'C': np.nan}, index=[1])

Inserting your new row after the second row:

new_df = pd.concat([df.loc[:1], new_row, df.loc[2:]]).reset_index(drop=True)
#      A    B    C
# 0  1.0  4.0  7.0
# 1  2.0  5.0  8.0
# 2  NaN  NaN  NaN
# 3  3.0  6.0  9.0
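If you need this more than once, the same concat idea could be wrapped in a small function. This is only a sketch: `insert_empty_row` is a hypothetical name, not a pandas function, and it slices by position with `iloc`, so it does not depend on the index labels:

```python
import numpy as np
import pandas as pd

def insert_empty_row(df, pos):
    """Return a new frame with an all-NaN row inserted before position pos.

    Hypothetical helper for illustration; not part of pandas.
    """
    empty = pd.DataFrame(np.nan, index=[0], columns=df.columns)
    # iloc slices by position: rows [0, pos) come first, rows [pos, end) last
    return pd.concat([df.iloc[:pos], empty, df.iloc[pos:]], ignore_index=True)

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
new_df = insert_empty_row(df, 2)  # the NaN row lands after the second row
print(new_df)
```

Note that the integer columns become float, because NaN cannot be stored in a plain integer column.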


0

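Something like this should work for you:

# df.loc[key + 1] = ... would overwrite the existing row key + 1,
# so give the new row a fractional label and sort it into place instead.
for key in df.index[df['C'] == 0]:
    df.loc[key + 0.5] = np.nan  # all-NaN row labelled between key and key + 1
df = df.sort_index().reset_index(drop=True)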

1 Comment

Using a for loop may not be a good choice, since the question mentions that the data is huge.
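For what it's worth, the loop can probably be avoided altogether. Below is a rough sketch that builds all the blank rows in one go, gives each a fractional label so it sorts right after its matching row, and does a single concat. `insert_empty_rows_after` is a made-up name for illustration, and I have not timed this on a large frame:

```python
import numpy as np
import pandas as pd

def insert_empty_rows_after(df, mask):
    """Return a copy of df with one all-NaN row after every row where mask is True.

    Hypothetical helper for illustration; not part of pandas.
    """
    mask = np.asarray(mask, dtype=bool)
    base = df.reset_index(drop=True)  # use positions 0..n-1 as labels
    # label each blank row k + 0.5 so it sorts between row k and row k + 1
    blanks = pd.DataFrame(np.nan, index=base.index[mask] + 0.5, columns=base.columns)
    return pd.concat([base, blanks]).sort_index().reset_index(drop=True)

# Assumed fixed values, copied from the example frame in the question
df = pd.DataFrame({'A': [4, 6, 8], 'B': [5, 7, 1], 'C': [2, 0, 9]})
new_df = insert_empty_rows_after(df, df['C'] == 0)
print(new_df)
```

This also covers the case where the last row has 0 in 'C', since its blank row simply sorts to the end.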
