0

So I've managed to write a code in pandas to do the data analysis I need and export to a new .xlsx file. Which is awesome, except it's for one file and I typically have 40+ files I want to run this on.

Through research I managed to at least get it to read the file names in the folder, but I am at a loss on how to implement into my existing code.

Goal: To run code over each .xlsx file in folder and spit out the analyzed data as new .xlsx files.

For now here is the code I came up with to read the folder:

import os
import glob

os.chdir('C:/Users/PCTR261010/Desktop/PartReviewExport')
FileList = glob.glob('*.xlsx')
print(FileList)

Here is a snippet of the import section of my larger code file:

import os
import glob
import pandas as pd

# Prints header information in Part Scorecard
df = pd.read_excel('GAT_US_PartReview_2017-06-23.xlsx', header=None, 
skipinitialspace=True, skiprows=1)
header = df.head(5).filter([0,2], axis=1)

# Begins Data Analysis of Part Scorecard
fields = ['Appl Req', 'Appl Count ', 'Intr Req', 'Intr Count ', 'OE Intr 
Req', 'Has OE Intr', 'Has Attr Editor',
      'Part IMG Req', 'Has Part IMG', 'Has MPCC', 'Warr Req', 'Has Warr 
TXT', 'Has Warr PDF', 'MSDS Req',
      'Has MSDS', 'UPC Req', 'Has UPC', 'Has UNSPSC', 'Valid Part']

df = pd.read_excel('GAT_US_PartReview_2017-06-23.xlsx', 
skipinitialspace=True, skiprows=7, usecols=fields,
               dtype=str)

Any help is appreciated!!

2 Answers 2

1

You can iterate over each of the file names, passing them to pandas

import os
import glob
import pandas as pd

os.chdir('C:/Users/PCTR261010/Desktop/PartReviewExport')
FileList = glob.glob('*.xlsx')
print(FileList)


for fname in FileList:  
    # Prints header information in Part Scorecard
    df = pd.read_excel(fname, header=None, 
    skipinitialspace=True, skiprows=1)
    header = df.head(5).filter([0,2], axis=1)

    # Begins Data Analysis of Part Scorecard
    fields = ['Appl Req', 'Appl Count ', 'Intr Req', 'Intr Count ', 'OE Intr 
    Req', 'Has OE Intr', 'Has Attr Editor',
          'Part IMG Req', 'Has Part IMG', 'Has MPCC', 'Warr Req', 'Has Warr 
    TXT', 'Has Warr PDF', 'MSDS Req',
          'Has MSDS', 'UPC Req', 'Has UPC', 'Has UNSPSC', 'Valid Part']

    df = pd.read_excel(fname, skipinitialspace=True, 
                       skiprows=7, usecols=fields, dtype=str)

    # ... analysis here ...
    df.to_excel('out_' + fname)
Sign up to request clarification or add additional context in comments.

3 Comments

Does this same concept need to be applied under #Prints header information in Part Scorecard? Because I am calling a specific file name there
only if you want to print the header for each file.
Thanks James, not I just have to figure out why I keep getting 'ValueError: 'Intr Req' is not in list. Which returns a different field name not in list each time I try to run it.
0

OMG! When you spend all day staring at this and finally realize it's an indention issue. FML. Thanks guys!

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.