I have a table with 150,000 rows and 15 columns. The important columns for this example are COUNTRY, COSTCENTER and EXTENSION. I am reading the data from a CSV file into a pandas DataFrame. All columns are of type object.
What I want to do is:
- Search for a certain COUNTRY (e.g. "China")
- Within those rows, keep only the ones where COSTCENTER is either 1000 or 2000, or where EXTENSION starts with "862"
- Once all filters have been applied, change the value in COUNTRY to a new name for the matching rows.
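To make the intended result concrete, here is a minimal sketch of the three steps on a small made-up frame. The sample rows are hypothetical, and the use of `.loc` with a combined boolean mask is an assumption about how this could be done, not something I have confirmed on the real data.

```python
import pandas as pd

# Hypothetical sample data; every column holds strings (dtype object),
# as in the CSV described above.
df = pd.DataFrame({
    "COUNTRY": ["China", "China", "China", "Germany", "China"],
    "COSTCENTER": ["1000", "3000", "3000", "1000", "2000"],
    "EXTENSION": ["8621234", "8629999", "1234567", "8620000", "5550000"],
})

# Step 1: rows for the country of interest.
is_china = df["COUNTRY"] == "China"

# Step 2: cost center is 1000 or 2000, or the extension starts with "862".
cc_match = df["COSTCENTER"].isin(["1000", "2000"])
ext_match = df["EXTENSION"].str.startswith("862")

# Combine the conditions: country AND (cost center OR extension).
mask = is_china & (cc_match | ext_match)

# Step 3: assign the new name in one indexing operation,
# selecting rows by the mask and the column by its label.
df.loc[mask, "COUNTRY"] = "China_new_name"

print(df)
```

With this sample, the first, second and fifth rows should be renamed, while the third row (China, but no matching cost center or extension) and the Germany row stay unchanged.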
I had a solution, but it always triggered the chained-assignment warning:
    df.COUNTRY[df.COUNTRY.str.match("China") &
               (df.COSTCENTER.str.match("1000") |
                df.COSTCENTER.str.match("2000"))] = 'China_new_name'
I do not fully understand why this could cause problems, but I looked for an alternative anyway. I tried lambda and apply, but kept getting all sorts of errors.
My latest approach was:
    filter_China = df.ix[(df["COUNTRY"]=="China") &
                         ((df["COSTCENTER"]=="1000") | (df["COSTCENTER"]=="2000"))]
This seems to filter for what I am looking for (I have not included the search on EXTENSION yet, as I first wanted this part to work).
But when I try to change a value based on my search criteria, I run into trouble:
    df.ix[(df["COUNTRY"]=="China") & ((df["COSTCENTER"]=="1000") |
          (df["COSTCENTER"]=="2000")), df["COUNTRY"]] = "China_new_name"
I am getting this error: raise KeyError('%s not in index' % objarr[mask])
What am I missing here? Is this the right approach, or do I need to take a totally different route?
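One thing that might explain the KeyError: in the failing statement the column part of the indexer is `df["COUNTRY"]`, which is a Series of values such as "China", rather than the label `"COUNTRY"`. Those values would then be looked up as column names. The sketch below checks that idea on hypothetical sample data and uses `.loc` in place of `.ix`, since `.ix` was deprecated and later removed from pandas; treat it as an illustration of the suspected cause, not a confirmed diagnosis.

```python
import pandas as pd

# Hypothetical sample data with string columns.
df = pd.DataFrame({
    "COUNTRY": ["China", "China", "Germany"],
    "COSTCENTER": ["1000", "3000", "1000"],
})

# The column indexer used in the failing statement: a Series of values.
column_indexer = df["COUNTRY"]

# None of these values ("China", "Germany") is a column label,
# which would be consistent with a "not in index" KeyError.
values_are_labels = column_indexer.isin(df.columns).any()
print(values_are_labels)

# Same row condition as before, but the column is given by its label.
mask = (df["COUNTRY"] == "China") & df["COSTCENTER"].isin(["1000", "2000"])
df.loc[mask, "COUNTRY"] = "China_new_name"

print(df)
```

Writing the row mask and the column label inside a single `.loc[...]` call is also a single assignment on `df` itself, so it avoids the two-step indexing of the first attempt.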