I have a DataFrame like this:
import numpy as np
import pandas as pd

data = {'col1': ['A', 'B', 'B', 'A', 'B', 'C', 'B', 'B', 'B',
                 'A', 'C', 'A', 'B', 'C'],
        'col2': [np.nan, 'comment1', 'comment2', np.nan, 'comment3', np.nan,
                 'comment4', 'comment5', 'comment6',
                 np.nan, np.nan, np.nan, 'comment7', np.nan]}
frame = pd.DataFrame(data)
frame
col1 col2
A NaN
B comment1
B comment2
A NaN
B comment3
C NaN
B comment4
B comment5
B comment6
A NaN
C NaN
A NaN
B comment7
C NaN
Each row with col1 == 'B' has a comment which will be a string. I need to aggregate the comments and fill the preceding row (where col1 != 'B') with the resulting aggregated string.
Any given row where col1 != 'B' could be followed by zero, one, or many comment rows (col1 == 'B'), which seems to be the crux of the problem. I can't just use fillna(method='bfill') or similar.
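To illustrate why a plain backfill falls short, here is a small sketch on a shortened version of the data (the names `small` and `filled` are made up for this example). It assumes the missing comments are real NaN values rather than the string 'NaN'.

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical version of the frame above.
small = pd.DataFrame({
    'col1': ['A', 'B', 'B', 'A', 'C', 'A', 'B'],
    'col2': [np.nan, 'comment1', 'comment2', np.nan, np.nan, np.nan, 'comment7'],
})

# Backfill copies the next non-missing value upward into every gap.
filled = small['col2'].bfill()
print(filled.tolist())
```

The first row only picks up comment1 instead of both comments, and the A and C rows that have no comments of their own inherit comment7 from further down.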
I have looked into iterrows(), groupby(), and while loops, and tried to build my own function, but I don't think I fully understand how they all work.
The finished product should look like this:
col1 col2
A comment1 + comment2
B comment1
B comment2
A comment3
B comment3
C comment4 + comment5 + comment6
B comment4
B comment5
B comment6
A NaN
C NaN
A comment7
B comment7
C NaN
Eventually I will be dropping all rows where col1 == 'B', but for now I'd like to keep them for verification.
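For reference, one possible way to get there is sketched below. It is only one approach under a few assumptions: missing comments are real NaN values, the comments are joined with ' + ' as in the desired output, and the helper names (`is_comment`, `block`, `joined`, `result`, `cleaned`) are made up for illustration. The idea is to give every non-'B' row and the 'B' rows that follow it a shared block id via cumsum(), join the comments per block, and map the joined string back onto the non-'B' row.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    'col1': list('ABBABCBBBACABC'),
    'col2': [np.nan, 'comment1', 'comment2', np.nan, 'comment3', np.nan,
             'comment4', 'comment5', 'comment6',
             np.nan, np.nan, np.nan, 'comment7', np.nan],
})

is_comment = frame['col1'].eq('B')

# Every non-'B' row starts a new block; the 'B' rows after it share its id.
block = (~is_comment).cumsum()

# Join the comments within each block. Blocks without 'B' rows do not appear.
joined = frame.loc[is_comment, 'col2'].groupby(block[is_comment]).agg(' + '.join)

result = frame.copy()
# Look up each non-'B' row's block id; ids with no comments map to NaN.
result.loc[~is_comment, 'col2'] = block[~is_comment].map(joined)

# Later on, the 'B' rows can be dropped once the result has been verified.
cleaned = result[~is_comment].reset_index(drop=True)
print(result)
```

Keeping `result` and `cleaned` separate leaves the 'B' rows available for verification until they are no longer needed.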