I am looking for the correct logic to combine two columns of related data from an .xlsx file using pandas in Python. It is similar to the post "Merge 2 columns in pandas into one columns that have data in python", except that I also want to transform the data as I combine the columns, so it is not a true merge of the two columns. I want to be able to say: if column wbc_na has the value "checked" in row x, place "Not available" in row x under column wbc. Once combined, I want to drop the column wbc_na, since wbc then contains all the information I need. For example:

input:  
ID,wbc,wbc_na  
1,9.0,-  
2,NaN,checked  
3,10.2,-  
4,8.8,-  
5,0,checked  

output:

ID,wbc  
1,9.0  
2,Not available  
3,10.2  
4,8.8  
5,Not available  

Thanks for your suggestions.

2 Answers

You can use loc to select the rows where column 'wbc_na' is 'checked' and assign 'Not available' to column 'wbc' in those rows:

In [18]:
df.loc[df['wbc_na'] == 'checked', 'wbc'] = 'Not available'
df
Out[18]:
   ID            wbc   wbc_na
0   1              9      -  
1   2  Not available  checked
2   3           10.2      -  
3   4            8.8      -  
4   5  Not available  checked

[5 rows x 3 columns]
In [19]:
# now drop the extra column
df.drop(labels='wbc_na', axis=1, inplace=True)
df
Out[19]:
   ID            wbc
0   1              9
1   2  Not available
2   3           10.2
3   4            8.8
4   5  Not available

[5 rows x 2 columns]
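
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the question's sample rows, which is an assumption about how the .xlsx file loads. The cast to object is also an addition that is not in the session above: recent pandas versions may warn when a string is written into a float column, and casting first sidesteps that.

```python
import numpy as np
import pandas as pd

# Sample rows rebuilt from the question (assumed shape of the loaded .xlsx data)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "wbc": [9.0, np.nan, 10.2, 8.8, 0.0],
    "wbc_na": ["-", "checked", "-", "-", "checked"],
})

# Cast to object so the column can hold both numbers and strings
df["wbc"] = df["wbc"].astype(object)

# Overwrite 'wbc' only in the rows flagged as 'checked'
df.loc[df["wbc_na"] == "checked", "wbc"] = "Not available"

# The flag column is no longer needed
df = df.drop(columns="wbc_na")
```

Note that after this step 'wbc' is an object column holding mixed types, so numeric operations such as mean() would need the placeholder strings filtered out first.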

3 Comments

Thank you, this answer seems to give the result I was looking for. So the loc method is basically returning the row # correct?
loc performs label-based selection; the online docs are worth looking at to understand the semantic differences between loc, ix and iloc. By the way, you should have enough reputation to upvote now ;)
I was looking at (pandas.pydata.org/pandas-docs/stable/10min.html) but it wasn't giving me enough info about the methods. How exciting that I can upvote things now, I was just waiting to be able to do that. Thanks for your help!
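
To illustrate the label-versus-position distinction raised in the comments, here is a small sketch. The frame and its index labels (10, 20, 30) are hypothetical, chosen only so that labels and positions differ.

```python
import pandas as pd

# Hypothetical frame with non-default index labels to make the difference visible
df = pd.DataFrame({"wbc": [9.0, 10.2, 8.8]}, index=[10, 20, 30])

by_label = df.loc[20, "wbc"]    # the row whose index label is 20
by_position = df.iloc[1, 0]     # second row, first column, counted by position

# A boolean mask passed to loc keeps the rows where the mask is True
mask = df["wbc"] > 9.0
selected = df.loc[mask, "wbc"]
```

So loc does not return a row number; with a boolean mask it selects every row where the condition holds, and an assignment then applies to those rows.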

You could also use a list comprehension to reassign the values in column wbc:

import numpy as np
import pandas as pd
data = pd.DataFrame({'ID': [1,2,3,4,5], 'wbc': [9, np.nan, 10, 8, 0], 'wbc_na': ['-', 'checked', '-', '-', 'checked']})
# Keep the original value unless the flag column says 'checked'
data['wbc'] = [item if flag != 'checked' else 'Not available' for item, flag in zip(data['wbc'], data['wbc_na'])]
data = data.drop('wbc_na', axis=1)
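
A vectorized variant of the same idea can be sketched with Series.mask, which replaces values wherever a condition is True. This is an alternative and not part of the answer above; the sample data mirrors the answer's, with the flag column named wbc_na as in the question.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "wbc": [9, np.nan, 10, 8, 0],
    "wbc_na": ["-", "checked", "-", "-", "checked"],
})

# True for the rows whose value should be replaced
checked = data["wbc_na"] == "checked"

# Cast to object so numbers and strings can share the column,
# then replace the flagged rows in one vectorized step
data["wbc"] = data["wbc"].astype(object).mask(checked, "Not available")

data = data.drop(columns="wbc_na")
```

The mask approach avoids the Python-level loop, which may matter on large frames; for a handful of rows the list comprehension is just as serviceable.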

1 Comment

Tried this out, it also works with my real data. Thank you for your solution.
