Python pandas: updating dataframe series values based on values contained in another dataframe

Question

I am using pandas with Python and I have a dataframe data. I have another dataframe missing_vals, which contains a field column and a key column. The field column contains elements that correspond to the names of the columns of data, i.e. data.columns ~= missing_vals['field']. The mapping, however, is not one-to-one: some entries in missing_vals['field'] do not exist in data.columns. I did a set intersection to take care of that and got an output array result containing all the values that are in both missing_vals['field'] and data.columns.

Now I want to index into data using each element of result, check whether that column contains the value given by the corresponding element of missing_vals['key'], and replace that value with NaN. I tried using for-loops, but I know this is not the ideal way to do it. Is there a way to do it with vector/lambda operations, or perhaps with other dataframe functions? I am new to pandas so I would really appreciate some help.

Here is my code so far:

for i in range(len(result)):
    field = missing_vals['field'][i]
    for j in range(data[field].size):
        if (data[field][j] == missing_vals['key'][i]):
            data.replace(data[field][j], np.nan)

Thanks

Please provide some sample data. For example, data.head(10) and missing_vals.head(10).
– Alexander, commented Jun 14, 2015 at 5:56

JoeCondron · Accepted Answer · 2015-06-13 19:19:51Z


You should really post sample input/output; these things are difficult to explain verbally. Anyway, I think the second loop can be done away with entirely. Inside the outer loop, you really just have to do:

field = missing_vals['field'][i]
# replace returns a new Series by default, so assign the result back to the column
data[field] = data[field].replace(missing_vals['key'][i], np.nan)

The replace method replaces all occurrences with the replacement value, and if there are none it does nothing. It's unnecessary to loop through the column yourself to check whether the value to be replaced is there. If you post representative examples of the data frames in question I can probably help you more.
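Since no sample data was posted, here is a minimal sketch of the whole thing on made-up frames. The column names and values (age, income, score, height, and the sentinel keys) are assumptions for illustration only, and filtering with isin stands in for the set intersection described in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the question did not include any.
data = pd.DataFrame({
    "age": [25, -1, 40, -1],
    "income": [50000, 999, 62000, 58000],
    "score": [1.5, 2.0, 3.5, 4.0],
})
missing_vals = pd.DataFrame({
    "field": ["age", "income", "height"],  # "height" is not a column of data
    "key": [-1, 999, 0],
})

# Keep only the rows whose field is actually a column of data.
# Filtering rows (rather than intersecting names) keeps each field
# paired with its own key.
rules = missing_vals[missing_vals["field"].isin(data.columns)]

# One replace call per rule, with no inner loop over the rows of data.
# Series.replace returns a new Series, so the result is assigned back.
for field, key in zip(rules["field"], rules["key"]):
    data[field] = data[field].replace(key, np.nan)

print(data)
```

Columns that are not named in missing_vals (score here) are left untouched, and rules for fields that do not exist in data (height here) are skipped.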


2 Comments

cavs Over a year ago

Sorry, I'm sort of new to this. Next time I'll post some screenshots, thanks for the tip. Anyways, your suggestion worked! I appreciate the help.

JoeCondron Over a year ago

Well, you can edit the current question and then I can edit the answer. You can do it by screenshot or just by copying and pasting input/output from your interpreter. You could also vote me up if the answer helps :)
