Generate Pandas DataFrames from CSV file list

Question

To frame the question: I am searching a directory for all CSV files and saving the path of each file, along with its split path components, in a DataFrame. I now want to iterate over that DataFrame and read each CSV file into its own DataFrame, with a name generated from the original filename. I cannot figure out how to generate these DataFrames dynamically. I started coding a few days ago, so apologies if the syntax is poor.

# Looks in a given directory and all of its subdirectories for files with the extension ".csv"
# and builds a list of the paths to all CSV files found

import os
from glob import glob

import pandas as pd

PATH = r"Z:\Adam"  # raw string, so the backslash is not treated as an escape character
EXT = "*.csv"
all_csv_files = [file
                 for path, subdir, files in os.walk(PATH)
                 for file in glob(os.path.join(path, EXT))]
# The list of csv file directories is read into a DataFrame
# Dataframe is then split into columns based on the \\ found in the path

df_csv_path = pd.DataFrame(all_csv_files, columns =['Path'])
df_split_path = df_csv_path['Path'].str.split('\\', n = -1, expand = True)
df_split_path = df_split_path.rename(columns = {0:'Drive',1:'Main',2:'Project',3:'Imaging Folder', 4:'Experimental Group',5:'Experimental Rep',6:'File Name'})
df_csv_info = df_split_path.join(df_csv_path['Path'])

# Generates a Dataframe for each of the csv files found in directory
# Dataframe has a name based on the csv filename
for index in df_csv_info.index:
    filepath = ""
    filename = df_csv_info['File Name'].values[index]
    filepath = str(df_csv_info['Path'].values[index])
    filename = pd.read_csv(filepath)

Rodalm · Accepted Answer · 2021-11-11 17:34:34Z


The best way is to create a dictionary whose keys are the filenames and whose values are the corresponding DataFrames. Instead of using os.path and glob, the modern approach is to use pathlib from the standard library.

Assuming that you don't actually need the DataFrame containing the filenames and just want a DataFrame for each CSV file, you can simply do:

from pathlib import Path

import pandas as pd

PATH = Path(r"Z:\Adam")  # raw string, so the backslash is not treated as an escape character
EXT = "*.csv"

# dictionary holding every file's DataFrame, in the format {"filename": file_DataFrame}
files_dfs = {}

# recursive search for csv files in the PATH folder and its subfolders
for csv_file in PATH.rglob(EXT):
    filename = csv_file.name     # get the filename (including the ".csv" extension)
    df = pd.read_csv(csv_file)   # read the csv file as a DataFrame
    files_dfs[filename] = df     # add the DataFrame to the dictionary

Then, to access the DataFrame of a specific file you can do

filename_df = files_dfs["<filename>"]
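
One thing to watch for: because the search is recursive and the dictionary is keyed by csv_file.name, two files with the same name in different subfolders would overwrite each other. A minimal sketch of one way around that, keying by the path relative to the search root instead, is below. The function name load_csvs and the choice of forward-slash keys are assumptions made for illustration, not part of the original answer.

```python
from pathlib import Path

import pandas as pd


def load_csvs(root):
    """Read every CSV under `root` (recursively) into a dict of DataFrames.

    Keys are paths relative to `root`, written with forward slashes, so
    files that share a name but live in different subfolders stay distinct.
    (Hypothetical helper, shown only as an illustration.)
    """
    root = Path(root)
    frames = {}
    # sorted() only makes the iteration order predictable
    for csv_file in sorted(root.rglob("*.csv")):
        key = csv_file.relative_to(root).as_posix()
        frames[key] = pd.read_csv(csv_file)
    return frames
```

With this variant, a file is looked up by its location under the root, for example frames["<subfolder>/<filename>"], rather than by its bare filename.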

edited Nov 11, 2021 at 17:34

answered Nov 11, 2021 at 17:24


2 Comments

Adam Over a year ago

Thank you so much for the response. It did work very well. I will have to pay more attention to the modern way of doing these things as that was a very simple solution. Thank you again!

Rodalm Over a year ago

@Adam you're welcome! I'm glad I could teach you something new ;) Don't worry, it took me a long time to get to know pathlib. The simplicity and effectiveness of the solutions is just a matter of practice! It takes time, but it comes naturally, you will see! Happy coding!
