I have two columns in a Pandas DataFrame that has datetime as its index. The two columns contain data measuring the same parameter, but neither column is complete (some rows have no data at all, some rows have data in both columns, and others have data in only column 'a' or 'b').

I've written the following code to find gaps in the columns, generate a list of the date indices where these gaps appear, and use this list to find and replace the missing data. However, I get a KeyError: Not in index on line 3, which I don't understand because the keys I'm using to index came from the DataFrame itself. Could somebody explain why this is happening and what I can do to fix it? Here's the code:

def merge_func(df):
    null_index = df[(df['DOC_mg/L'].isnull() == False) & (df['TOC_mg/L'].isnull() == True)].index
    df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']
    notnull_index = df[(df['DOC_mg/L'].isnull() == True) & (df['TOC_mg/L'].isnull() == False)].index
    df['DOC_mg/L'][notnull_index] = df[notnull_index]['TOC_mg/L']

    df.insert(len(df.columns), 'Mean_mg/L', 0.0)
    df['Mean_mg/L'] = (df['DOC_mg/L'] + df['TOC_mg/L']) / 2
    return df

merge_func(sve)

  • Does it work if you do: df.loc[null_index,'TOC_mg/L']=df['DOC_mg/L']? Commented Jun 11, 2014 at 10:28
  • Do you want me to post as an answer? Commented Jun 11, 2014 at 12:05
  • Yes please, if you don't mind! Commented Jun 11, 2014 at 12:07

1 Answer

Whenever you are performing assignment, you should use .loc:

df.loc[null_index, 'TOC_mg/L'] = df['DOC_mg/L']

The error in your original code is the ordering of the subscript values for the index lookup:

df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']

raises an error because df[null_index] looks up the values in null_index as column labels rather than row labels. On a toy dataset I get: IndexError: indices are out-of-bounds
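To see the failure in isolation, here is a minimal sketch on a made-up two-row frame (the name toy and its values are assumptions for illustration, not the asker's data). The exact exception type seems to depend on the pandas version, so the sketch accepts either KeyError or IndexError:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row has only DOC, the other only TOC.
toy = pd.DataFrame(
    {"DOC_mg/L": [1.0, np.nan], "TOC_mg/L": [np.nan, 2.0]},
    index=pd.date_range("2014-06-01", periods=2, freq="D"),
)

# Row labels (dates) where DOC is present but TOC is missing.
null_index = toy[toy["DOC_mg/L"].notnull() & toy["TOC_mg/L"].isnull()].index

# toy[null_index] treats the dates as *column* labels, so the lookup fails.
try:
    toy[null_index]["DOC_mg/L"]
    raised = None
except (KeyError, IndexError) as exc:
    raised = type(exc).__name__

# Selecting rows by label and a column by name in one step works.
rows = toy.loc[null_index, "DOC_mg/L"]
print(raised, rows.tolist())
```
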

If you change the order to this, it will probably work:

df['TOC_mg/L'][null_index] = df['DOC_mg/L'][null_index]

However, this is chained assignment, which may write to a temporary copy instead of the original DataFrame, and should be avoided; see the online docs.

So you should use .loc:

df.loc[null_index, 'TOC_mg/L'] = df['DOC_mg/L']
df.loc[notnull_index, 'DOC_mg/L'] = df['TOC_mg/L']

Note that it is not necessary to select the same rows on the right-hand side, because pandas aligns the assigned Series on the index.
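Putting the pieces together, the following is a rough sketch of the whole merge on a made-up four-row frame (the data values and the name merge_func_fixed are assumptions for illustration; the column names come from the question). It uses the .loc assignments above and relies on index alignment for the right-hand side:

```python
import numpy as np
import pandas as pd

def merge_func_fixed(df):
    """Fill each column's gaps from the other column, then average them."""
    # Dates where DOC has a value but TOC is missing, and vice versa.
    null_index = df[df["DOC_mg/L"].notnull() & df["TOC_mg/L"].isnull()].index
    notnull_index = df[df["DOC_mg/L"].isnull() & df["TOC_mg/L"].notnull()].index

    # The full Series on the right is aligned to the selected row labels.
    df.loc[null_index, "TOC_mg/L"] = df["DOC_mg/L"]
    df.loc[notnull_index, "DOC_mg/L"] = df["TOC_mg/L"]

    df["Mean_mg/L"] = (df["DOC_mg/L"] + df["TOC_mg/L"]) / 2
    return df

# Hypothetical sample: DOC only, TOC only, both present, both missing.
sve = pd.DataFrame(
    {"DOC_mg/L": [1.0, np.nan, 3.0, np.nan],
     "TOC_mg/L": [np.nan, 2.0, 5.0, np.nan]},
    index=pd.date_range("2014-06-01", periods=4, freq="D"),
)
result = merge_func_fixed(sve)
print(result)
```

Rows where both columns are missing stay NaN in every column, including the mean.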
