KeyError: Not in index, using keys generated from a Pandas DataFrame on itself

Question

I have two columns in a Pandas DataFrame that has a datetime index. The two columns contain data measuring the same parameter, but neither column is complete: some rows have no data at all, some rows have data in both columns, and others have data only in column 'a' or only in column 'b'.
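To make the setup concrete, here is a small hypothetical stand-in for the DataFrame described above. The values and the variable names (`df`, `only_doc`, `only_toc`, and so on) are invented for illustration; only the column names are taken from the code below. It builds one boolean mask per case using `.isna()`, which reads more directly than comparing `.isnull()` to `True`/`False`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's DataFrame: a DatetimeIndex and two
# partially filled columns that measure the same parameter.
idx = pd.date_range("2014-01-01", periods=5, freq="D")
df = pd.DataFrame(
    {
        "DOC_mg/L": [1.0, np.nan, 3.0, np.nan, 5.0],
        "TOC_mg/L": [1.5, 2.0, np.nan, np.nan, 5.5],
    },
    index=idx,
)

doc_missing = df["DOC_mg/L"].isna()
toc_missing = df["TOC_mg/L"].isna()

# One boolean mask per case described in the question.
both_present = ~doc_missing & ~toc_missing
only_doc = ~doc_missing & toc_missing   # TOC must be filled from DOC
only_toc = doc_missing & ~toc_missing   # DOC must be filled from TOC
neither = doc_missing & toc_missing     # nothing to fill from

# A boolean Series passed to df[...] selects rows; .index gives their dates.
fill_toc_dates = df[only_doc].index
```

Note that `df[mask]` with a boolean mask selects rows; that is different from passing `df[...]` a list of date labels.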

I've written the following code to find gaps in the columns, generate a list of the dates (index values) where these gaps appear, and use this list to find and replace the missing data. However, I get KeyError: Not in index on line 3 (df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']), which I don't understand, because the keys I'm using to index came from the DataFrame itself. Could somebody explain why this is happening and what I can do to fix it? Here's the code:

def merge_func(df):
    null_index = df[(df['DOC_mg/L'].isnull() == False) & (df['TOC_mg/L'].isnull() == True)].index
    df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']
    notnull_index = df[(df['DOC_mg/L'].isnull() == True) & (df['TOC_mg/L'].isnull() == False)].index
    df['DOC_mg/L'][notnull_index] = df[notnull_index]['TOC_mg/L']

    df.insert(len(df.columns), 'Mean_mg/L', 0.0)
    df['Mean_mg/L'] = (df['DOC_mg/L'] + df['TOC_mg/L']) / 2
    return df

merge_func(sve)

Does it work if you do: df.loc[null_index, 'TOC_mg/L'] = df['DOC_mg/L'] – EdChum, Commented Jun 11, 2014 at 10:28

EdChum · Accepted Answer · 2014-06-11 12:16:39Z


Whenever you are performing assignment, you should use .loc:

df.loc[null_index, 'TOC_mg/L'] = df['DOC_mg/L']

The error in your original code comes from the order of the subscripts in the lookup:

df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']

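will raise an error; on a toy dataset I get IndexError: indices are out-of-bounds. The problem is the right-hand side: df[null_index] passes a collection of labels straight to the DataFrame, and df[...] with a list of labels selects columns, not rows. Pandas therefore looks for columns named after those dates, which is where "not in index" comes from.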
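A minimal reproduction of the column-versus-row behaviour, on a hypothetical three-row frame: `df[null_index]` is treated as a column lookup and fails, while `.loc` takes row labels first and column labels second. The exact exception type and message have varied between pandas versions; on recent versions this is a `KeyError`, matching the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a DatetimeIndex.
idx = pd.date_range("2014-01-01", periods=3, freq="D")
df = pd.DataFrame(
    {"DOC_mg/L": [1.0, 2.0, np.nan], "TOC_mg/L": [np.nan, 2.5, 3.0]},
    index=idx,
)

# Dates where DOC is present but TOC is missing (only the first row here).
null_index = df[df["DOC_mg/L"].notna() & df["TOC_mg/L"].isna()].index

# df[...] with a collection of labels selects *columns*, so the dates are
# looked up among the column names and the lookup fails.
try:
    df[null_index]
    raised = False
except KeyError:
    raised = True

# .loc takes row labels first, then the column label, so this works.
rows = df.loc[null_index, "DOC_mg/L"]
```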

If you changed the order to this it would probably work:

df['TOC_mg/L'][null_index] = df['DOC_mg/L'][null_index]
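Reading values with the swapped order is fine: `df['DOC_mg/L']` is a Series, and indexing a Series with date labels selects rows. This sketch, on hypothetical toy data, checks that it returns the same values as a single `.loc` call; the concern with the line above is only the assignment on its left-hand side.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a DatetimeIndex.
idx = pd.date_range("2014-01-01", periods=4, freq="D")
df = pd.DataFrame(
    {"DOC_mg/L": [1.0, 2.0, 3.0, np.nan], "TOC_mg/L": [np.nan, 2.5, np.nan, 4.0]},
    index=idx,
)
null_index = df[df["DOC_mg/L"].notna() & df["TOC_mg/L"].isna()].index

# Column first, then rows: a Series indexed with date labels selects those rows.
chained = df["DOC_mg/L"][null_index]

# The same selection in one .loc call (rows first, then the column).
single = df.loc[null_index, "DOC_mg/L"]

same = chained.equals(single)
```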

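However, this is chained assignment and should be avoided: the assignment may be applied to a temporary copy rather than to df itself, so df can be left unchanged. See the online docs on chained assignment.

So you should use .loc: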

df.loc[null_index, 'TOC_mg/L'] = df['DOC_mg/L']
df.loc[notnull_index, 'DOC_mg/L'] = df['TOC_mg/L']
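Putting the two `.loc` lines back into the question's function gives something like the sketch below. It assumes the same column names as the question and, like the original, modifies the passed DataFrame in place and returns it. The `df.insert(...)` call is dropped because assigning to a new column name already creates the column, and the masks use `.isna()`/`.notna()` instead of comparing with `True`/`False`. This is an adaptation, not code from the original post.

```python
import numpy as np
import pandas as pd


def merge_func(df):
    """Fill each column's gaps from the other column, then add their mean."""
    doc_missing = df["DOC_mg/L"].isna()
    toc_missing = df["TOC_mg/L"].isna()

    # Rows where only TOC is missing: copy DOC across.
    null_index = df[~doc_missing & toc_missing].index
    df.loc[null_index, "TOC_mg/L"] = df["DOC_mg/L"]

    # Rows where only DOC is missing: copy TOC across.
    notnull_index = df[doc_missing & ~toc_missing].index
    df.loc[notnull_index, "DOC_mg/L"] = df["TOC_mg/L"]

    # Rows missing both values stay NaN, so their mean is NaN as well.
    df["Mean_mg/L"] = (df["DOC_mg/L"] + df["TOC_mg/L"]) / 2
    return df


# Hypothetical data standing in for the asker's `sve`.
idx = pd.date_range("2014-01-01", periods=4, freq="D")
sve = pd.DataFrame(
    {"DOC_mg/L": [1.0, np.nan, 3.0, np.nan], "TOC_mg/L": [2.0, 4.0, np.nan, np.nan]},
    index=idx,
)
out = merge_func(sve)
```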

Note that it is not necessary to index the right-hand side with the same index: pandas aligns it on the index labels, so each selected row receives the value from the matching date.
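A small sketch of that alignment on hypothetical data: the right-hand side is the whole DOC column in reversed order, yet each selected row still gets its own date's value, and rows outside `null_index` are not touched.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a DatetimeIndex.
idx = pd.date_range("2014-01-01", periods=3, freq="D")
df = pd.DataFrame(
    {"DOC_mg/L": [1.0, 2.0, 3.0], "TOC_mg/L": [np.nan, 2.5, np.nan]},
    index=idx,
)
null_index = df[df["TOC_mg/L"].isna()].index  # first and third dates

# Full DOC column, deliberately reversed. .loc aligns it on the date labels,
# so positions do not matter, only labels do.
rhs = df["DOC_mg/L"].iloc[::-1]
df.loc[null_index, "TOC_mg/L"] = rhs

print(df["TOC_mg/L"].tolist())  # [1.0, 2.5, 3.0]
```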

answered Jun 11, 2014 at 12:16

EdChum
