Running code through a folder of files using Pandas

Question

I've managed to write some pandas code that does the data analysis I need and exports the result to a new .xlsx file. Which is awesome, except it only handles one file, and I typically have 40+ files I want to run it on.

Through research I managed to at least get it to read the file names in the folder, but I'm at a loss on how to work that into my existing code.

Goal: To run code over each .xlsx file in folder and spit out the analyzed data as new .xlsx files.

For now here is the code I came up with to read the folder:

import os
import glob

os.chdir('C:/Users/PCTR261010/Desktop/PartReviewExport')
FileList = glob.glob('*.xlsx')
print(FileList)

Here is a snippet of the import section of my larger code file:

import os
import glob
import pandas as pd

# Prints header information in Part Scorecard
df = pd.read_excel('GAT_US_PartReview_2017-06-23.xlsx', header=None,
                   skipinitialspace=True, skiprows=1)
header = df.head(5).filter([0,2], axis=1)

# Begins Data Analysis of Part Scorecard
fields = ['Appl Req', 'Appl Count ', 'Intr Req', 'Intr Count ', 'OE Intr Req',
          'Has OE Intr', 'Has Attr Editor', 'Part IMG Req', 'Has Part IMG',
          'Has MPCC', 'Warr Req', 'Has Warr TXT', 'Has Warr PDF', 'MSDS Req',
          'Has MSDS', 'UPC Req', 'Has UPC', 'Has UNSPSC', 'Valid Part']

df = pd.read_excel('GAT_US_PartReview_2017-06-23.xlsx',
                   skipinitialspace=True, skiprows=7, usecols=fields,
                   dtype=str)

Any help is appreciated!!

James · Accepted Answer · 2017-07-26 19:29:53Z

1

You can iterate over the file names, passing each one to pandas in turn:

import os
import glob
import pandas as pd

os.chdir('C:/Users/PCTR261010/Desktop/PartReviewExport')
FileList = glob.glob('*.xlsx')
print(FileList)


# The column names are the same for every file, so define them once.
fields = ['Appl Req', 'Appl Count ', 'Intr Req', 'Intr Count ', 'OE Intr Req',
          'Has OE Intr', 'Has Attr Editor', 'Part IMG Req', 'Has Part IMG',
          'Has MPCC', 'Warr Req', 'Has Warr TXT', 'Has Warr PDF', 'MSDS Req',
          'Has MSDS', 'UPC Req', 'Has UPC', 'Has UNSPSC', 'Valid Part']

for fname in FileList:
    # Prints header information in Part Scorecard
    # (skipinitialspace from the question is omitted here: it is a read_csv
    # option, not a documented read_excel parameter)
    df = pd.read_excel(fname, header=None, skiprows=1)
    header = df.head(5).filter([0, 2], axis=1)

    # Begins Data Analysis of Part Scorecard
    df = pd.read_excel(fname, skiprows=7, usecols=fields, dtype=str)

    # ... analysis here ...

    # Write the result next to the input, prefixed with 'out_'
    df.to_excel('out_' + fname)
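
One thing to watch with the loop above: because the results are written as 'out_' + fname into the same folder, a second run of glob.glob('*.xlsx') would pick those result files up as inputs. Below is a rough sketch of one way around that, using only the standard library. The function name plan_outputs, the prefix argument, and the idea of a separate results folder are my own assumptions for illustration, not something from the question.

```python
from pathlib import Path


def plan_outputs(in_dir, out_dir, prefix="out_"):
    """Pair each input workbook with the path its result should be written to.

    Hypothetical helper: skips files that look like earlier results
    (names starting with `prefix`) and Excel lock files ("~$name.xlsx").
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    pairs = []
    for src in sorted(in_dir.glob("*.xlsx")):
        if src.name.startswith((prefix, "~$")):
            continue
        pairs.append((src, out_dir / (prefix + src.name)))
    return pairs
```

Each (src, dst) pair could then replace fname in the loop, reading with pd.read_excel(src, ...) and writing with df.to_excel(dst). The results folder would need to exist before writing to it.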

edited Jul 26, 2017 at 19:29

answered Jul 26, 2017 at 11:58

James


3 Comments

Sam Russo Over a year ago

Does this same concept need to be applied under #Prints header information in Part Scorecard? Because I am calling a specific file name there

James Over a year ago

Only if you want to print the header for each file.

Sam Russo Over a year ago

Thanks James, now I just have to figure out why I keep getting "ValueError: 'Intr Req' is not in list". It reports a different field name as not in the list each time I try to run it.

Sam Russo · Accepted Answer · 2017-07-27 19:51:00Z

0

OMG! When you spend all day staring at this and finally realize it's an indentation issue. FML. Thanks guys!
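
The mis-indented code isn't shown in this thread, so the following is not the fix that was applied here. For anyone else hitting a "... is not in list" style error from a usecols list, though, one generic way to narrow it down is to compare the names you asked for against the column labels the file actually has. Several of the field names in the question end in a space ('Appl Count ', 'Intr Count '), so the sketch also flags names that would match if surrounding whitespace were ignored. Both helper names are made up for this example.

```python
def missing_fields(columns, fields):
    """Return the requested fields that are absent from the actual column labels."""
    present = set(columns)
    return [f for f in fields if f not in present]


def near_misses(columns, fields):
    """Map each missing field to columns that match once whitespace is stripped."""
    stripped = {}
    for col in columns:
        stripped.setdefault(str(col).strip(), []).append(col)
    return {
        f: stripped[f.strip()]
        for f in missing_fields(columns, fields)
        if f.strip() in stripped
    }
```

The columns argument could come from reading just the header row of one file, for example pd.read_excel(fname, skiprows=7, nrows=0).columns, assuming the header sits on the same row as in the question.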

answered Jul 27, 2017 at 19:51

Sam Russo
