5

I have a DataFrame:

import pandas as pd

df = pd.DataFrame({'B':[2,1,2],'C':['a','b','a']})
  B C
0 2 'a'
1 1 'b'
2 2 'a'

I want to insert a row above any occurrence of 'b', that is a duplicate of that row but with 'b' changed to 'c', so I end up with this:

  B C
0 2 'a'
1 1 'b'
1 1 'c'
2 2 'a'

For the life of me, I can't figure out how to do this.

6
  • 2
    You said above, but in the output it's below; your first df produces 'a' instead of 'c' in the third row. Commented Aug 31, 2016 at 17:09
  • 2
    What if there are two consecutive rows with b? Commented Aug 31, 2016 at 17:14
  • @shivsn sorry, typo Commented Aug 31, 2016 at 17:18
  • @Divakar, that's unlikely to happen, but if it did, then I would just insert a 'c' row above each one of the 'b' rows Commented Aug 31, 2016 at 17:18
  • Are you happy with a loop? Commented Aug 31, 2016 at 17:32

2 Answers

5

Here's one way of doing it:

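# Copy the rows where C == 'b' and relabel the copies as 'c'
duplicates = df[df['C'] == 'b'].copy()
duplicates['C'] = 'c'
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
# A stable sort keeps each 'c' row directly after its 'b' row.
pd.concat([df, duplicates]).sort_index(kind='mergesort')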
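
Since the question says "above" while the expected output shows the new row below, here is a minimal sketch that supports either placement. The function name `duplicate_rows` and its `above` flag are hypothetical, introduced only for illustration. It assumes the original index is unique and already sorted, because the result is ordered by index label.

```python
import pandas as pd

def duplicate_rows(df, column, old, new, above=False):
    """Return a new frame with a relabeled copy next to each matching row."""
    duplicates = df[df[column] == old].copy()
    duplicates[column] = new
    # A stable sort preserves concatenation order among equal index labels,
    # so the order of the pieces decides which of the two rows comes first.
    pieces = [duplicates, df] if above else [df, duplicates]
    return pd.concat(pieces).sort_index(kind="mergesort")

df = pd.DataFrame({'B': [2, 1, 2], 'C': ['a', 'b', 'a']})
below = duplicate_rows(df, 'C', 'b', 'c')              # 'c' row after each 'b' row
above = duplicate_rows(df, 'C', 'b', 'c', above=True)  # 'c' row before each 'b' row
print(below)
print(above)
```

Both results keep the duplicated index label (1 appears twice); call `.reset_index(drop=True)` on the result if a fresh 0..n-1 index is preferred.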

1 Comment

Beautiful, didn't realize there was a copy function!
1

Working at NumPy level, here's a vectorized approach -

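import numpy as np

arr = df.values                                 # object array when dtypes are mixed
idx = np.flatnonzero(df.C == 'b')               # positions of the 'b' rows
newvals = arr[idx]                              # fancy indexing returns a copy
newvals[:, df.columns.get_loc("C")] = 'c'       # relabel the copies
out = np.insert(arr, idx + 1, newvals, axis=0)  # place each copy right after its source row
df_index = np.insert(np.arange(arr.shape[0]), idx + 1, idx, axis=0)  # repeat the source labels
df_out = pd.DataFrame(out, index=df_index)      # column names are not carried over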

Sample run -

In [149]: df
Out[149]: 
   B  C
0  2  a
1  1  b
2  2  d
3  4  d
4  3  b
5  8  a
6  4  a
7  2  b

In [150]: df_out
Out[150]: 
   0  1
0  2  a
1  1  b
1  1  c
2  2  d
3  4  d
4  3  b
4  3  c
5  8  a
6  4  a
7  2  b
7  2  c
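
The columns in `df_out` are labeled 0 and 1 because the frame is rebuilt from a bare array. A sketch of the same NumPy approach that also restores the column names and dtypes might look like the following; it assumes the default 0..n-1 index, as in the sample run, and the `astype` step is an addition not present in the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'B': [2, 1, 2, 4, 3, 8, 4, 2],
                   'C': ['a', 'b', 'd', 'd', 'b', 'a', 'a', 'b']})

arr = df.to_numpy()                                # mixed dtypes give an object array
idx = np.flatnonzero((df['C'] == 'b').to_numpy())  # positions of the 'b' rows
newvals = arr[idx]                                 # fancy indexing returns a copy
newvals[:, df.columns.get_loc('C')] = 'c'          # relabel the copies
out = np.insert(arr, idx + 1, newvals, axis=0)     # each copy goes right after its source row
df_index = np.insert(np.arange(len(df)), idx + 1, idx)

# Pass the original column names, then cast back to the original dtypes,
# which are lost when the data passes through an object array.
df_out = pd.DataFrame(out, index=df_index, columns=df.columns)
df_out = df_out.astype(df.dtypes.to_dict())
print(df_out)
```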
