I have a data frame like this:

df:

col1    col2        col3
1       cat         4
nan     dog         nan
3       tiger       3
2       lion        9
nan     frog        nan
nan     elephant    nan

I want to create a new data frame from it such that, wherever col1 is NaN, the col2 value of that row is appended to the col2 value of the previous row.

For example, the desired output data frame is:

col1    col2                col3
1       catdog              4
3       tiger               3
2       lionfrogelephant    9

How can I do this using pandas?

  • Does my solution work? Commented Jan 9, 2019 at 10:22
  • Yes, thanks, it works. Commented Jan 9, 2019 at 10:28
  • Thank you for accepting! Commented Jan 9, 2019 at 10:29

1 Answer

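Forward fill the missing values in col1 and col3, then group by those columns and join the col2 strings:

cols = ['col1','col3']
# propagate the last valid col1/col3 values down into the NaN rows
df[cols] = df[cols].ffill()
# rows sharing the same (col1, col3) pair are merged; groupby sorts by the keys by default
df = df.groupby(cols)['col2'].apply(''.join).reset_index()
print(df)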
   col1  col3              col2
0   1.0   4.0            catdog
1   2.0   9.0  lionfrogelephant
2   3.0   3.0             tiger

Or, if necessary, forward fill the missing values in all columns:

df = df.ffill().groupby(['col1','col3'])['col2'].apply(''.join).reset_index()
print(df)
   col1  col3              col2
0   1.0   4.0            catdog
1   2.0   9.0  lionfrogelephant
2   3.0   3.0             tiger
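
Note that both outputs above come back sorted by col1 and col3, and that grouping by the filled values would also merge two separate blocks that happen to share the same col1 and col3. If the original row order and block boundaries matter, one possible variation (a sketch, not part of the answer above) is to group on a running block counter instead. The `block` key and the `out` name are hypothetical helpers introduced for illustration, and the sketch assumes that a non-missing col1 always marks the start of a new record:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": [1, np.nan, 3, 2, np.nan, np.nan],
    "col2": ["cat", "dog", "tiger", "lion", "frog", "elephant"],
    "col3": [4, np.nan, 3, 9, np.nan, np.nan],
})

# Assumption: every row with a non-missing col1 starts a new record,
# so a cumulative count of such rows labels each block of rows.
block = df["col1"].notna().cumsum().rename("block")

out = (
    df.groupby(block)
      .agg(
          col1=("col1", "first"),   # first non-missing col1 in the block
          col2=("col2", "".join),   # concatenate the strings in row order
          col3=("col3", "first"),   # first non-missing col3 in the block
      )
      .reset_index(drop=True)
)
print(out)
```

With the sample data this keeps the records in their original order (col1 values 1, 3, 2) and the columns in the order col1, col2, col3, which matches the desired output in the question apart from the float dtype of col1 and col3.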

1 Comment

I have the same problem, but my data is not always missing in col1 or col3; sometimes it is missing in col2 as well. When data is missing in col2, it is present in col1, col3, or both. How should this scenario be handled? How can the data present in a row be attached to the previous row's data?
