
I am reading several .csv files. Each file is a time series with the date in column one (which I would like to index by) and the values in column two. I can read in the data, but it all gets appended to the same column of the dataframe, when I would like each file to have its own column indexed by date.

So, for example, if I have 3 files (I have more than three in reality):

csv1
1/1/2016,1.1
2/1/2016,1.2
3/1/2016,1.6

csv2
1/1/2016,4.6
2/1/2016,31.2
3/1/2016,1.8

csv3
2/1/2016,3.2
3/1/2016,5.8

Currently I return:

0        1 
1/1/2016 1.1
2/1/2016 1.2
3/1/2016 1.6
1/1/2016 4.6
2/1/2016 31.2
3/1/2016 1.8
2/1/2016 3.2
3/1/2016 5.8

When I would like to return:

0        1   2   3
1/1/2016 1.1 4.6 null
2/1/2016 1.2 31.2 3.2
3/1/2016 1.6 1.8 5.8

My code at the moment looks like this:

def getData(rawDataPath):
    allfiles = glob.glob(os.path.join(rawDataPath, "*.csv"))

    np_array_list = []
    for file_ in allfiles:
        df = pd.read_csv(file_, index_col=None, header=0)
        np_array_list.append(df.to_numpy())  # as_matrix() is deprecated

    # Stacking vertically appends each file's rows below the previous ones,
    # which is why everything ends up in the same two columns.
    comb_np_array = np.vstack(np_array_list)
    big_frame = pd.DataFrame(comb_np_array)

    return big_frame

1 Answer

Since you already use DataFrame from pandas, might as well use pandas' join/merging functionality:

In [21]: csv1 = io.StringIO("""1/1/2016,1.1
2/1/2016,1.2
3/1/2016,1.6""")

In [22]: csv2 = io.StringIO("""1/1/2016,4.6
2/1/2016,31.2
3/1/2016,1.8""")

In [23]: csv3 = io.StringIO("""2/1/2016,3.2
3/1/2016,5.8""")

In [24]: df1 = pd.read_csv(csv1, header=None)

In [25]: df2 = pd.read_csv(csv2, header=None)

In [26]: df3 = pd.read_csv(csv3, header=None)

In [27]: pd.merge(pd.merge(df1, df2, on=0, how='outer'), df3, on=0, how='outer')
Out[27]: 
          0  1_x   1_y    1
0  1/1/2016  1.1   4.6  NaN
1  2/1/2016  1.2  31.2  3.2
2  3/1/2016  1.6   1.8  5.8

The example uses how='outer', which means a full outer join. That was chosen in case your data has missing keys from file to file (as csv3 does here). If that is not the case, pick whichever join strategy suits you best.
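To see what the choice of join strategy buys you, here is a minimal sketch (using made-up two-row frames, not your actual files) contrasting how='outer' with how='inner' on the shared date column:

```python
import io
import pandas as pd

# Two tiny frames that share some, but not all, dates (illustrative sample data).
left = pd.read_csv(io.StringIO("1/1/2016,1.1\n2/1/2016,1.2"), header=None)
right = pd.read_csv(io.StringIO("2/1/2016,3.2\n3/1/2016,5.8"), header=None)

# Outer join keeps the union of dates, filling gaps with NaN.
outer = pd.merge(left, right, on=0, how='outer')

# Inner join keeps only dates present in both frames.
inner = pd.merge(left, right, on=0, how='inner')

print(len(outer))  # 3 rows: union of dates
print(len(inner))  # 1 row: intersection of dates
```

With files like csv3 that are missing some dates, the outer join is what preserves every date with NaN placeholders, matching the desired output above.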

To reduce all your files in a sane fashion you can, for example, do:

In [30]: from functools import partial, reduce

In [31]: reduce(partial(pd.merge, on=0, how='outer'), [df1, df2, df3])
Out[31]: 
          0  1_x   1_y    1
0  1/1/2016  1.1   4.6  NaN
1  2/1/2016  1.2  31.2  3.2
2  3/1/2016  1.6   1.8  5.8

Just replace the list with your own preloaded dataframes:

def getData(rawDataPath):
    path = rawDataPath
    allfiles = glob.glob(os.path.join(path, "*.csv"))
    dataframes = (pd.read_csv(fname, header=None, names=['date', fname])
                  for fname in allfiles)
    return reduce(partial(pd.merge, on='date', how='outer'), dataframes)
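Putting it together, here is a self-contained sketch of that final function using in-memory buffers in place of files on disk (the names 'csv1'..'csv3' are illustrative stand-ins for your real filenames):

```python
import io
from functools import partial, reduce

import pandas as pd

# Stand-ins for the .csv files on disk, keyed by a made-up filename.
buffers = {
    'csv1': "1/1/2016,1.1\n2/1/2016,1.2\n3/1/2016,1.6",
    'csv2': "1/1/2016,4.6\n2/1/2016,31.2\n3/1/2016,1.8",
    'csv3': "2/1/2016,3.2\n3/1/2016,5.8",
}

# Name each value column after its file so the merged result needs no renaming.
frames = (pd.read_csv(io.StringIO(text), header=None, names=['date', name])
          for name, text in buffers.items())

# Fold the frames together with successive outer merges on the date column.
merged = reduce(partial(pd.merge, on='date', how='outer'), frames)
print(merged)
```

This prints one row per date with a NaN in the csv3 column for 1/1/2016, exactly the shape asked for in the question.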

5 Comments

thanks that's great! Is there a way to add the .csv file names as column headers?
Hmm I think you can modify the column names afterwards at least by assigning to dframe.columns = ['date', 'csv1', 'csv2', 'csv3'] or so, or name your columns when creating the frames: pd.read_csv(csv1, names=['date', 'csv1'], header=None). That way there's no need to suffix common columns and the merged result will be fine as is.
Alternative (imo prettier) syntax to pd.merge(df1,df2,...) is df1.merge(df2, on=0, how='outer').merge(df3, on=0, how='outer') and wow the reduce(partial(... is pretty elegant! :)
@Pocin oh snap, should've known dataframes have merge as a method.
Cheers fellas that hit the spot
