I'm trying to work with a script a colleague wrote.

This part of the script is working fine:

xl = pd.ExcelFile(path + WQ_file)
sheet_names = xl.sheet_names

df = pd.read_excel(path + WQ_file, sheetname = 'Chemistry Output Table', skiprows = [0,1,2,4,5,6,7], 
               index_col = [0,1], na_values = ['', 'na', '-'])
df.index.names = ['Field_ID', 'Date_Time']

header = pd.read_excel(path + WQ_file, sheetname = 'header data',  
               index_col = [0], na_values = ['', 'na', ' - '])
header_dict = {ah: header['name_short'].loc[ah] for ah in header.index}

analytes_excel = pd.read_excel(path + WQ_file, sheetname = 'analytes', columns = 'name')
analytes_list = [item for sublist in analytes_excel.values.tolist() for item in sublist]
analytes = [header['name_short'].loc[x] for x in analytes_list]    

But this part isn't:

# Clean up the data and report "less than" as half of the LOR
df2 = df.copy()
for col in df2.columns:
    x = []
    for (a, b) in df2[col].items():
        if b == " - ":
            b = np.nan
        try:
            b = float(b)
        except:
            b = float(b.strip('< '))/2
        x.append(b)
    df2[col] = x

I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-80ad8c096fc0> in <module>()
  4 for col in df2.columns:
  5     x = []
 ----> 6     for (a, b) in df2[col].items():
  7         if b == " - ":
  8             b = np.nan

 C:\Users\SardellaC\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\generic.pyc in __getattr__(self, name)
 1938 
 1939         if name in self._internal_names_set:
-> 1940             return object.__getattribute__(self, name)
 1941         elif name in self._metadata:
 1942             return object.__getattribute__(self, name)

 AttributeError: 'Series' object has no attribute 'items'

It might have something to do with the different versions of Python being used. I'm not at all familiar with Python and would appreciate it if someone could point me in the right direction.

2 Comments

  • df2[col] looks like this: Commented Jul 16, 2015 at 3:23
  • Field_ID  Date_Time
    AST2      2014-12-29 00:00:00    2.3
              2014-12-29 12:00:00    NaN
              2015-01-12 00:00:00    3.2
              2015-01-12 15:00:00    NaN
              2015-01-28 00:00:00    2.8
              2015-01-28 12:15:00    NaN
              2015-01-28 12:30:00    NaN
              2015-02-02 00:00:00    2.7
              2015-02-02 11:30:00    NaN
              2015-02-03 00:00:00    2.7
    Commented Jul 16, 2015 at 3:23

1 Answer

Use iteritems() instead of items() when iterating over a pandas Series:

x = []
for (a, b) in df2[col].iteritems():
    ....

However, iterating row by row is very slow on large data sets. You can simplify that part of the code by using the .apply() function. Let me know if you need help simplifying the code.
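As a rough sketch of what the .apply() approach might look like, the per-value logic from the question's loop can be pulled out into a small function. The name half_of_lor and the sample values are made up for illustration, and the handling of blank strings and None is an assumption beyond what the original loop does.

```python
import math


def half_of_lor(value):
    """Convert one cell to a float (hypothetical helper, not from the original script).

    Strings such as '<0.5' are reported as half of the limit of reporting,
    and ' - ', blank strings, and None become NaN.
    """
    if value is None:
        return math.nan
    if isinstance(value, str):
        text = value.strip()
        if text in ("", "-"):
            return math.nan
        if text.startswith("<"):
            # "Less than" result: strip the marker and halve the limit
            return float(text.lstrip("< ")) / 2
        return float(text)
    # Numbers (including NaN) pass through unchanged as floats
    return float(value)


# Made-up sample values resembling a single column of the sheet
raw = [2.3, "<0.5", " - ", "3.2", None]
cleaned = [half_of_lor(v) for v in raw]
```

With pandas, the same function could then replace the inner loop, for example `df2[col] = df2[col].apply(half_of_lor)` inside the `for col in df2.columns:` loop.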

1 Comment

That's working fine @Kathirmani Sukumar. Thanks for your help!
