1

I have some code that looks like this:

import pandas as pd
import glob

root_directory = r"\\some\shared\directory"

all_files = glob.glob(f'{root_directory}\CPE_*.csv')

li = []

for filename in all_files:
    frame = pd.read_csv(filename, index_col=None, header=0, encoding='latin1')
    li.append(frame)

df = pd.concat(li, axis=0, ignore_index=True)

This code allows me to concatenate the data and create a master csv file, but I want to add a new column to each dataframe as I loop through them. The file names look something like this: CPE_02082020.csv , CPE_02092020 , etc. So the date is in the file name itself with the format of mmddyyyy. How do I add a date column to each file as I loop through and concatenate them?

2
  • each new file will add just a single column? Do the files have equal number of rows? Commented Apr 1, 2021 at 14:20
  • No all files do not have equal number of rows. Just looking to add 1 "date" column to this master dataframe. Commented Apr 1, 2021 at 14:22

2 Answers 2

3

To add a date column to each dataframe you can simply add a new column to frame while looping through all_files. This method was suggested by biobirdman on a previous post. You can then use split() to get just the date from the filename.

Try:

for filename in all_files:
    frame = pd.read_csv(filename, index_col=None, header=0, encoding='latin1')
    frame['date'] = filename.split('_')[1]
    li.append(frame)

df = pd.concat(li, axis=0, ignore_index=True)

Then to convert df['date'] to datetime add:

df['date'] = pd.to_datetime(df['date'])
Sign up to request clarification or add additional context in comments.

Comments

0

You can assign the column during a for loop, let's simplify a little using a list comprehension and Pathlib:

import pandas as pd 
from pathlib import Path 

root_dir = Path('\\some\shared\directory')

all_files = root_dir.glob('*CPE_*.csv')

dfs = [
    pd.read_csv(f, index_col=None, header=0, encoding='latin1')\
            .assign(date_col=
                pd.to_datetime(f.stem.split('_')[1],format='%m%d%Y',errors='coerce'))
    for f in all_files
]


print(pd.concat(dfs))

enter image description here

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.