I am reading several .csv files. Each file is a time series with the date in column one (which I would like to index by) and the values in column two. I can read in the data, but it is all appended to the same columns in the dataframe, whereas I would like each file to have its own column, indexed by date.
For example, suppose I have 3 files (I have more than three in reality):
csv1
1/1/2016,1.1
2/1/2016,1.2
3/1/2016,1.6
csv2
1/1/2016,4.6
2/1/2016,31.2
3/1/2016,1.8
csv3
2/1/2016,3.2
3/1/2016,5.8
Currently I return:
0         1
1/1/2016  1.1
2/1/2016  1.2
3/1/2016  1.6
1/1/2016  4.6
2/1/2016  31.2
3/1/2016  1.8
2/1/2016  3.2
3/1/2016  5.8
What I would like to return instead is:
0         1    2     3
1/1/2016  1.1  4.6   null
2/1/2016  1.2  31.2  3.2
3/1/2016  1.6  1.8   5.8
My code at the moment looks like this:
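import glob
import os

import numpy as np
import pandas as pd


def getData(rawDataPath):
    big_frame = pd.DataFrame()
    path = rawDataPath
    allfiles = glob.glob(os.path.join(path,"*.csv"))

    # Read each file and keep its contents as a raw array
    np_array_list = []
    for file_ in allfiles:
        df = pd.read_csv(file_,index_col=None, header=0)
        np_array_list.append(df.as_matrix())

    # Stack all the arrays on top of each other (this is where everything ends up in the same columns)
    comb_np_array = np.vstack(np_array_list)
    big_frame = big_frame.append(pd.DataFrame(comb_np_array))
    return big_frame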
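
For reference, here is a minimal sketch of one way the desired layout could be produced. It is not the code above, but a different approach: read each file as a Series indexed by its first column, then join the Series side by side with `pd.concat(..., axis=1)` so that rows are aligned on the date. The function name `get_data` and the use of the file name as the column label are illustrative choices, not part of the original code. The sketch assumes the files have no header row (as in the samples above) and leaves the dates as plain strings, since it is not clear from the samples whether they are day-first or month-first.

```python
import glob
import os

import pandas as pd


def get_data(raw_data_path):
    """Combine every CSV in raw_data_path into one frame, one column per file."""
    series_list = []
    for file_ in sorted(glob.glob(os.path.join(raw_data_path, "*.csv"))):
        # Assumption: no header row; column 0 holds the date, column 1 the value.
        df = pd.read_csv(file_, header=None, index_col=0)
        # Label the column with the file name (hypothetical naming choice).
        name = os.path.splitext(os.path.basename(file_))[0]
        series_list.append(df.iloc[:, 0].rename(name))

    if not series_list:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame()

    # axis=1 places the Series side by side and aligns them on the index
    # (outer join), so dates missing from a file show up as NaN.
    return pd.concat(series_list, axis=1)
```

With the three sample files, `get_data(rawDataPath)` should give one row per date and one column per file, with NaN for csv3 on 1/1/2016. If real datetimes are needed, the index could be converted afterwards with `pd.to_datetime` once the date format is known.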