Loop current python code for entire directory

Question

I would like to take the code I have built (or something similar) and run it on an entire directory of CSV files instead of one file at a time. I have 50 files, so it would be much simpler if I could point the script at a directory and have it run on every file inside.

Thanks in advance

import pandas as pd 
df=pd.read_csv(r"C:\Users\Kris\OneDrive - kris\SW\12-21.csv",)
df=df.rename(columns = {'Segmentation/Pool Code':'Code'})

df_Auto = df.loc[df['Code'].isin(['21', '94', '103', '105', '22', '82', '97', '104', '1', '71', '100', '2', '35', '62', '72', '101'])]

df_Mortgage = df.loc[df['Code'].isin(["M000","M001", "M003", "M004", "M005", "M006", "M007", "M008","M010", "M011", "M013", "M014", "M015", "M016", "M024", "M025", "M027", "M028", "M029", "M031", "M033", "M035","M036","M037","M038","M039",'M040','M041','M042','M043','M044','M020','M021','M022','M023','M026','M032','M034', '18', '28', '34', '87'])]

df_HELOC = df.loc[df['Code'].isin(["17","83","88","19","31","84","85",])]

df_CC = df.loc[df['Code'].isin(["116","118","119","120","121","122","123","125",])]

df_Other = df.loc[df['Code'].isin(["33","41","51","52", "56","57","58","59","75","76","130","131","132","133","134","135","136","140","54", "55","60","77", "78","79","115","4","5","6","7","13","14","16", "32","44","45","46","47","67","106","107","109","110","160","3","10","11","12","25","69","95","102",])]

#Save Files

df_Auto.to_csv(r"C:\Users\Kris\OneDrive - kris\SW\12-21_auto.csv")
df_Mortgage.to_csv(r"C:\Users\Kris\OneDrive - kris\SW\12-21_mortgage.csv")
df_HELOC.to_csv(r"C:\Users\Kris\OneDrive - kris\SW\12-21_HELOC.csv")
df_CC.to_csv(r"C:\Users\Kris\OneDrive - kris\SW\12-21_CC.csv")
df_Other.to_csv(r"C:\Users\Kris\OneDrive - kris\SW\12-21_Other.csv")

I was reading about what might work best and found `listOfFiles = os.listdir('.')`, but I do not know how to set it up with a loop.
– Kristopher K, Commented Mar 11, 2022 at 19:37

Brian · Accepted Answer · 2022-03-11 20:27:44Z

1

Use glob to find the files and a function to process each one:

import pandas as pd
import os
from glob import glob

# Using a dict to make things a bit more generic
code_dict = {
    'Auto': ['21', '94', '103', '105', '22', '82', '97', '104', '1', '71', '100', '2', '35', '62', '72', '101'],
    'Mortgage': ["M000","M001", "M003", "M004", "M005", "M006", "M007", "M008","M010", "M011", "M013", "M014", "M015", "M016", "M024", "M025", "M027", "M028", "M029", "M031", "M033", "M035","M036","M037","M038","M039",'M040','M041','M042','M043','M044','M020','M021','M022','M023','M026','M032','M034', '18', '28', '34', '87'],
    'HELOC': ["17","83","88","19","31","84","85",],
    'CC': ["116","118","119","120","121","122","123","125",],
    'Other': ["33","41","51","52", "56","57","58","59","75","76","130","131","132","133","134","135","136","140","54", "55","60","77", "78","79","115","4","5","6","7","13","14","16", "32","44","45","46","47","67","106","107","109","110","160","3","10","11","12","25","69","95","102",]
}

def process_csv(file):
    # Split the file name into parts for use later
    base_name = os.path.splitext(os.path.basename(file))[0]
    save_dir = os.path.split(file)[0]

    print(f"Reading {base_name}")

    df = pd.read_csv(file)
    df = df.rename(columns={"Segmentation/Pool Code": "Code"})

    # Looping over each of the items in the dictionary
    for name, codes in code_dict.items():
        sub_df = df.loc[df["Code"].isin(codes)]
        # Constructing the save file from the dictionary key
        save_file = os.path.join(save_dir, f"{base_name}_{name.lower()}.csv")
        sub_df.to_csv(save_file)

# Search for files in this directory:
search_dir = r"C:\Users\Kris\OneDrive - kris\SW"

# glob is a nice way to search in a directory
files = glob(os.path.join(search_dir, "*.csv"))
for file in files:
    # Process each file one at a time
    process_csv(file)
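
One caveat with the loop above: the output files are written into the same folder that glob searches, so a second run would also match files such as 12-21_auto.csv and split them again. Below is a minimal sketch of one way to guard against that, assuming no input file name ends in one of the category suffixes. The names `CATEGORY_SUFFIXES`, `is_generated_output` and `select_inputs` are hypothetical helpers, not part of the answer's code.

```python
import os

# Lower-cased category names, matching the suffixes that process_csv appends
CATEGORY_SUFFIXES = ("auto", "mortgage", "heloc", "cc", "other")

def is_generated_output(path, suffixes=CATEGORY_SUFFIXES):
    """Return True if the file name looks like '<base>_<category>.csv'."""
    base_name = os.path.splitext(os.path.basename(path))[0].lower()
    return any(base_name.endswith(f"_{suffix}") for suffix in suffixes)

def select_inputs(paths):
    """Keep only the paths that do not look like outputs of an earlier run."""
    return [path for path in paths if not is_generated_output(path)]
```

It could be used as `files = select_inputs(glob(os.path.join(search_dir, "*.csv")))`. Writing the split files into a separate folder would be another way to get the same effect.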

answered Mar 11, 2022 at 20:27

Brian


3 Comments

Kristopher K Over a year ago

Hello, thank you for the help! I am having a problem, though: I am getting a dtype error, which I can use low_memory=False for, but where would I put that? Secondly, after it runs, it generates all the correct files, but only the mortgage file has data in it. I tried changing the codes in the Auto dict entry to integers instead of strings, and that didn't seem to work either. Any help would be appreciated.

Brian Over a year ago

@KristopherK Not sure about the dtype error; what's the printout when the error hits? For the empty files, I would run the pd.read_csv(file) and df.rename(...) steps and see what's in the df["Code"] column (or look in the CSV directly) to make sure the codes are in there and to check what format they have once loaded into the dataframe.
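
A rough sketch of the kind of check suggested here: count how many values land in each category and tally the ones that match nothing. The function name `summarize_codes` is hypothetical; it takes any iterable of raw codes, so `df["Code"]` could be passed in directly.

```python
from collections import Counter

def summarize_codes(values, code_dict):
    """Count values matched per category and tally the values that match nothing.

    `values` can be any iterable of raw codes, for example df["Code"].
    """
    # Reverse lookup: code -> category name
    known = {code: name for name, codes in code_dict.items() for code in codes}
    matched = Counter()
    unmatched = Counter()
    for value in values:
        if value in known:
            matched[known[value]] += 1
        else:
            # repr() keeps text such as '21' distinguishable from a number or from '0021'
            unmatched[repr(value)] += 1
    return dict(matched), dict(unmatched)
```

If the unmatched tally is full of numbers, or of strings with padding such as '0021', the codes in the file are not in the same format as the strings in the dictionary, and `isin` will not match them.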

Kristopher K Over a year ago

I think I figured it out. I ran df.head(), and for some reason the Code column in some of the files has leading zeros. When I open the CSV in Excel they are not there. For example, the terminal shows "0021" and "0001", but Excel shows just "21" and "1" for the same CSV. I am not sure why this is happening or how to fix it, since only some of the files have these hidden leading zeros.
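
A likely explanation is that the zeros are really in those files: Excel converts numeric-looking text to numbers when it opens a CSV, which hides them, while pandas keeps the text as written. Both open points in this thread could be handled where the file is read. The sketch below is hedged: it assumes "0021" and "21" are meant to be the same code, and the helper names `read_with_text_codes` and `normalize_codes` are hypothetical. `low_memory=False` and `dtype` are keyword arguments of `pd.read_csv`, so they go in that call inside `process_csv`.

```python
import pandas as pd

CODE_COLUMN = "Segmentation/Pool Code"  # column name taken from the question

def normalize_codes(codes):
    """Strip whitespace and leading zeros from a text Series.

    '0021' becomes '21' and '0001' becomes '1'; 'M000' is left unchanged.
    """
    cleaned = codes.str.strip()
    # Drop zeros at the start only when a digit follows, so a lone '0' survives
    return cleaned.str.replace(r"^0+(\d)", r"\1", regex=True)

def read_with_text_codes(file):
    """Read a CSV, keeping the code column as text instead of letting pandas guess."""
    # dtype forces the code column to text; low_memory=False reads the file in one pass
    df = pd.read_csv(file, dtype={CODE_COLUMN: str}, low_memory=False)
    df = df.rename(columns={CODE_COLUMN: "Code"})
    df["Code"] = normalize_codes(df["Code"])
    return df
```

With the codes read as text and normalized this way, the string lists in the dictionary can stay as they are.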

Cameron Ford · Accepted Answer · 2022-03-11 19:45:30Z

1

You should define a function that processes a CSV and then loop over all files in your directory. For example:

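import os

def process_csv(path_name):
    # Placeholder: the per-file pandas logic from the question goes here
    print(f"Processing csv at {path_name}")

def loop_over_directory(directory_name):
    for file_name in os.listdir(directory_name):
        # os.listdir returns every entry, so skip anything that is not a CSV
        if file_name.lower().endswith(".csv"):
            process_csv(os.path.join(directory_name, file_name))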

answered Mar 11, 2022 at 19:45

Cameron Ford


1 Comment

Kristopher K Over a year ago

Excuse my ignorance, I am brand new to Python. How would this fit into what I have? How would I have it save each individual file into the 5 separate files?
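
One way the two pieces might fit together is sketched below. It is illustrative only and makes several assumptions that are not in the question: the code lists are shortened to two groups (the full lists from the question would go in `CODE_GROUPS`), every column is read as text, the index column is not written, and results go into a subfolder whose name, `split`, is made up here.

```python
from pathlib import Path

import pandas as pd

# Shortened, illustrative groups; the full code lists from the question belong here
CODE_GROUPS = {
    "auto": ["21", "94", "103"],
    "HELOC": ["17", "83", "88"],
}

def process_csv(path, out_dir):
    """Split one CSV into one output file per group and return the paths written."""
    df = pd.read_csv(path, dtype=str)  # assumption: read every column as text
    df = df.rename(columns={"Segmentation/Pool Code": "Code"})
    written = []
    for name, codes in CODE_GROUPS.items():
        out_path = out_dir / f"{path.stem}_{name}.csv"
        # index=False leaves out the row-number column (an assumption about the desired output)
        df[df["Code"].isin(codes)].to_csv(out_path, index=False)
        written.append(out_path)
    return written

def loop_over_directory(directory_name, out_subdir="split"):
    """Run process_csv on every CSV directly inside directory_name."""
    directory = Path(directory_name)
    out_dir = directory / out_subdir
    out_dir.mkdir(exist_ok=True)
    written = []
    # glob("*.csv") is not recursive, so files in the output subfolder are not picked up
    for path in sorted(directory.glob("*.csv")):
        written.extend(process_csv(path, out_dir))
    return written
```

Calling `loop_over_directory(r"C:\Users\Kris\OneDrive - kris\SW")` would then process each CSV in that folder in turn.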
