Following this Q&A, I have managed to concatenate several CSV files into one time-series dataframe, adding a column that holds the name of the CSV file each record came from, like so:
import os
import glob
import pandas as pd
path = ''
all_files = glob.glob(os.path.join(path, "*.csv"))
names = [os.path.basename(x) for x in all_files]
df = pd.DataFrame()
for file_ in all_files:
    file_df = pd.read_csv(file_, sep=',', parse_dates=["capture_datetime_utc"], index_col="capture_datetime_utc")
    file_df['file_name'] = file_
    df = df.append(file_df)
df.shape
This seems to work fine and, as you can see in this Jupyter Notebook, I get a dataframe with 5 columns.
But when I downsample this time-series df from 15-minute intervals to an hourly mean, like so:
df_h = df.resample('H').mean()
df_h.shape
I get a dataframe whose shape has only 4 columns.
So it seems like the append I have performed lacks persistence, and I need to make it persist. I have tried inserting the "inplace=True" arg into the append function itself (which threw an error) and also after it (which made no difference).
If anyone can show me how to make this appended column permanent, I'd be much obliged!
The file_name column is being removed because it does not have a numerical dtype, so mean() cannot aggregate it. See here: stackoverflow.com/a/34270422/8146556
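One possible way around this, as a minimal sketch rather than a definitive fix: give the resampler a per-column aggregation, taking the mean of the numeric columns and the first value of file_name in each hour. The frame below is a made-up stand-in for the concatenated data; the column name "value" and the file names "a.csv" and "b.csv" are assumptions for illustration only. The "60min" alias is used in place of 'H' purely because it is accepted across pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for the concatenated frame: one numeric column
# ("value", an assumed name) plus the string column "file_name",
# sampled at 15-minute intervals.
idx = pd.date_range("2017-01-01 00:00", periods=8, freq="15min",
                    name="capture_datetime_utc")
df = pd.DataFrame(
    {
        "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
        "file_name": ["a.csv"] * 4 + ["b.csv"] * 4,
    },
    index=idx,
)

# Build a per-column aggregation: mean for every numeric column,
# and the first file name seen in each hourly bin.
numeric_cols = df.select_dtypes("number").columns
agg_spec = {col: "mean" for col in numeric_cols}
agg_spec["file_name"] = "first"

# Downsample to hourly bins; file_name is kept because it has its own rule.
df_h = df.resample("60min").agg(agg_spec)
print(df_h.shape)
```

Note that "first" only makes sense if every hour's records come from a single file. If hours can mix files, grouping by file_name before resampling may be a better fit.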