I have an Excel report with several tables arranged in the sheet and I'm parsing it with Pandas. The key/value pairs I'm scraping out of the report are always in the same columns. So I separated my lookups into groups where the key and value columns are the same, and I use iloc to find the correct row:

df[df.iloc[:, key_column] == 'apple'][value_column].values[0]

Many keys are present in every file, but occasionally one is missing. In the rare event that a normally present key is absent, the whole block fails (index 0 is out of bounds for axis 0 with size 0):

try:
  parsed_xls['fruit'] = df[df.iloc[:, key_column] == 'apple'][value_column].values[0]
  parsed_xls['vegetable'] = df[df.iloc[:, key_column] == 'onion'][value_column].values[0]
  parsed_xls['stationary'] = df[df.iloc[:, key_column] == 'stapler'][value_column].values[0]
except:
  pass  # error reporting goes here

Short of putting each key/value pair in its own try...except, or writing a helper function that supplies a zero value when the key search fails, is there a more Pandas-like way to handle iloc lookups that raise this exception (and still catch errors)?

  • Just to clarify, is it key_column that may not be present or value_column? Or is it just possible that there may not be any such key present in the key column? Which is it? Commented Mar 10, 2018 at 21:55
  • The key may not be present. E.g., if only food is present when the report is generated and there are no 'staplers' to report, then the 'stapler' key is not present. Commented Mar 10, 2018 at 23:19

1 Answer

The short answer is "No" - and I see no reason why such functionality should exist when you can wrap your logic in a helper function.

If, as you mention, you only occasionally see IndexError, try / except is preferred to if / else.

import numpy as np
import pandas as pd

# Values are drawn from 0..8, so the key 99 never occurs in any column
df = pd.DataFrame(np.random.randint(0, 9, (1000, 10)))

res = df.loc[df.iloc[:, 2] == 99, 5].values[0]
# IndexError: index 0 is out of bounds for axis 0 with size 0

def lookup_fn(df, key_col, key_val, val_col, idx=0):
    # Return the idx-th matching value, or 0 when the lookup finds nothing
    try:
        return df[df.iloc[:, key_col] == key_val][val_col].values[idx]
    except IndexError:
        return 0

res = lookup_fn(df, 2, 99, 5)
# 0
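
To show how a helper like this could replace the single try block in the question, here is a minimal sketch. The small `df`, the `wanted` mapping, and the `default` parameter are hypothetical stand-ins for illustration; they are not taken from the actual report.

```python
import pandas as pd

def lookup_fn(df, key_col, key_val, val_col, default=0):
    """Return the first value in val_col where the column at position
    key_col equals key_val, or `default` when nothing matches."""
    try:
        return df[df.iloc[:, key_col] == key_val][val_col].values[0]
    except IndexError:
        return default

# Hypothetical stand-in for one table of the report:
# column 0 holds the keys, column 1 holds the values.
df = pd.DataFrame({0: ['apple', 'onion'], 1: [3, 7]})

# Hypothetical mapping of output field -> key to search for.
wanted = {'fruit': 'apple', 'vegetable': 'onion', 'stationary': 'stapler'}

# One lookup per field; a missing key falls back to the default
# instead of aborting the whole block.
parsed_xls = {field: lookup_fn(df, 0, key, 1) for field, key in wanted.items()}
```

Passing `default=None` instead of 0 would let the caller tell a missing key apart from a genuine zero and report it separately.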

2 Comments

I'm just trying to get my head around writing Pandas that isn't overly complex just for these outliers. This daily report has both extremes: the infrequent missing key and the infrequent addition. Over 2 years of working with the data I've seen fewer than 10 additions per year. I have never seen some values missing, but I have to admit it's possible. I honestly didn't know if there was a common way to catch this exception. I like the helper solution.
I advise you to use try / except unless you see a performance drop (usually too many exceptions). If this occurs, you can easily move to an if / else statement. Something like: if key_col in range(len(df.columns)):...
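
For reference, an if / else variant might look roughly like the sketch below. The name `lookup_if_else` and the `default` parameter are made up for illustration; the function checks both that the column position exists and that the key matched at least one row before indexing.

```python
import pandas as pd

def lookup_if_else(df, key_col, key_val, val_col, default=0):
    # Guard 1: the positional key column must exist.
    if key_col not in range(len(df.columns)):
        return default
    # Select the value column for rows whose key matches.
    matches = df.loc[df.iloc[:, key_col] == key_val, val_col]
    # Guard 2: the key must match at least one row.
    if matches.empty:
        return default
    return matches.iloc[0]

# Hypothetical sample data: column 0 holds keys, column 1 holds values.
df = pd.DataFrame({0: ['apple', 'onion'], 1: [3, 7]})

found = lookup_if_else(df, 0, 'apple', 1)
missing_key = lookup_if_else(df, 0, 'stapler', 1)
missing_col = lookup_if_else(df, 20, 'apple', 1)
```

This avoids raising exceptions at all, at the cost of running the two checks on every lookup, including the common case where the key is present.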
