To frame the question: I am searching a directory for all csv files and saving the path of each csv file, along with the path split into its components, in a DataFrame. I now want to iterate over that DataFrame and read each csv file into its own DataFrame, with a name generated from the original filename. I cannot figure out how to generate these DataFrames dynamically. I started coding a few days ago, so apologies if the syntax is poor.

# Looks in a given directory and all of its subdirectories for the extension ".csv"
# Reads the path to every csv file and creates a list

import os
from glob import glob

import pandas as pd

PATH = r"Z:\Adam"  # raw string so the backslash is not read as an escape character
EXT = "*.csv"
all_csv_files = [file
                 for path, subdir, files in os.walk(PATH)
                 for file in glob(os.path.join(path, EXT))]
# The list of csv file directories is read into a DataFrame
# Dataframe is then split into columns based on the \\ found in the path

df_csv_path = pd.DataFrame(all_csv_files, columns =['Path'])
df_split_path = df_csv_path['Path'].str.split('\\', n = -1, expand = True)
df_split_path = df_split_path.rename(columns = {0:'Drive',1:'Main',2:'Project',3:'Imaging Folder', 4:'Experimental Group',5:'Experimental Rep',6:'File Name'})
df_csv_info = df_split_path.join(df_csv_path['Path'])

# Generates a Dataframe for each of the csv files found in directory
# Dataframe has a name based on the csv filename
for index in df_csv_info.index:
    filepath = ""
    filename = df_csv_info['File Name'].values[index]
    filepath = str(df_csv_info['Path'].values[index])
    filename = pd.read_csv(filepath)

1 Answer

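The best way is to create a dictionary whose keys are the filenames and whose values are the corresponding DataFrames. Creating variable names at run time is possible but fragile, whereas a dictionary key can be any string, so the filename itself can serve as the lookup name. Instead of using os.walk and glob, the modern approach is to use pathlib from the standard library.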

Assuming that you don't actually need the DataFrame containing the filenames and just want the DataFrames for each csv file, you can simply do

from pathlib import Path

import pandas as pd

PATH = Path(r"Z:\Adam")  # raw string so the backslash is not read as an escape character
EXT = "*.csv"

# dictionary holding each file's DataFrame in the format {"filename": file_DataFrame}
# note: files with the same name in different subfolders overwrite each other
files_dfs = {}

# recursive search for csv files in the PATH folder and its subfolders
for csv_file in PATH.rglob(EXT):
    filename = csv_file.name     # get the filename (including the ".csv" extension)
    df = pd.read_csv(csv_file)   # read the csv file as a DataFrame
    files_dfs[filename] = df     # add the DataFrame to the dictionary

Then, to access the DataFrame of a specific file you can do

filename_df = files_dfs["<filename>"]
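
If the same filename can appear in more than one subfolder (for example one results file per experimental group), keying on the bare filename keeps only the last file read. Below is a minimal sketch of one way around that, keying on the path relative to the search root instead. The helper name `load_csv_tree`, the folder names `control` and `treated`, and the file `results.csv` are all made up for illustration; the temporary directory only exists so the example runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv_tree(root: Path, pattern: str = "*.csv") -> dict[str, pd.DataFrame]:
    """Read every file matching `pattern` under `root` into a dict of DataFrames.

    Keys are paths relative to `root`, written with forward slashes, so two
    files with the same name in different subfolders get different keys.
    """
    frames = {}
    for csv_file in sorted(root.rglob(pattern)):
        key = csv_file.relative_to(root).as_posix()
        frames[key] = pd.read_csv(csv_file)
    return frames


# Build a small throwaway folder tree (hypothetical names) to search.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for group in ("control", "treated"):
        folder = root / group
        folder.mkdir()
        pd.DataFrame({"x": [1, 2], "group": [group, group]}).to_csv(
            folder / "results.csv", index=False
        )

    # read_csv loads the data into memory, so the frames outlive the folder
    frames = load_csv_tree(root)

print(sorted(frames))                  # the relative-path keys
print(frames["treated/results.csv"])   # look up one file's DataFrame
```

With the real data, the call would be `load_csv_tree(PATH)`. The folder parts of each key can be recovered afterwards with `Path(key).parts` if the project or group names are needed.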

2 Comments

Thank you so much for the response. It did work very well. I will have to pay more attention to the modern way of doing these things as that was a very simple solution. Thank you again!
@Adam you're welcome! I'm glad I could teach you something new ;) Don't worry, it took me a long time to get to know pathlib. The simplicity and effectiveness of the solutions is just a matter of practice! It takes time, but it comes naturally, you will see! Happy coding!
