Using Pandas in Python to Join Multiple Files Based on Date

Question

I have CSV files that I need to join on date, but the date ranges differ between files (e.g. some files start on 1/1/1991 and others in 1998). I have a basic start on the code (see below), but I am not sure where to go from here. Any tips are appreciated. Samples of the different CSV files I am trying to join follow the code.

import os, pandas as pd, glob
directory = r'C:\data\Monthly_Data'
files = os.listdir(directory)
print(files)
all_data =pd.DataFrame()
for f in glob.glob(directory):
    df=pd.read_csv(f)
    all_data=all_data.append(df,ignore_index=True)
all_data.describe()

File 1

DateTime    F1_cfs  F2_cfs  F3_cfs  F4_cfs  F5_cfs  F6_cfs  F7_cfs
3/31/1991   0.860702028 1.167239264 0   0   0   0   0
4/30/1991   2.116930556 2.463493056 3.316688418             
5/31/1991   4.056572581 4.544307796 5.562668011             
6/30/1991   1.587513889 2.348215278 2.611659722             
7/31/1991   0.55328629  1.089637097 1.132043011             
8/31/1991   0.29702957  0.54186828  0.585073925             2.624375
9/30/1991   0.237083333 0.323902778 0.362583333 0.925563094 1.157786606 2.68722973  2.104090278

File 2

DateTime    F1_mg-P_L   F2_mg-P_L   F3_mg-P_L   F4_mg-P_L   F5_mg-P_L   F6_mg-P_L   F7_mg-P_L
6/1/1992    0.05    0.05    0.06    0.04    0.03    0.18    0.08
7/1/1992    0.03    0.05    0.04    0.03    0.04    0.05    0.09
8/1/1992    0.02    0.03    0.02    0.02    0.02    0.02    0.02

File 3

DateTime    F1_TSS_mgL  F1_TVS_mgL  F2_TSS_mgL  F2_TVS_mgL  F3_TSS_mgL  F3_TVS_mgL  F4_TSS_mgL  F4_TVS_mgL  F5_TSS_mgL  F5_TVS_mgL  F6_TSS_mgL  F6_TVS_mgL  F7_TSS_mgL  F7_TVS_mgL
4/30/1991   10  7.285714286 8.5 6.083333333 3.7 3.1                             
5/31/1991   5.042553191 3.723404255 6.8 6.3 3.769230769 2.980769231                             
6/30/1991           5   5   1   1                               
7/31/1991                                                       
8/31/1991                                                       
9/30/1991   5.75    3.75    6.75    4.75    9.666666667 6.333333333 8.666666667 5   12  7.666666667 8   5.5 9   6.75
10/31/1991  14.33333333 9   14  10.66666667 16.25   11  12.75   9.25    10.25   7.25    29.33333333 18.33333333 13.66666667 9
11/30/1991  2.2 1.933333333 2   1.88    0   0   4.208333333 3.708333333 10.15151515 7.909090909 9.5 6.785714286 4.612903226 3.580645161

I am getting the following error: IOError: Initializing from file failed
– Koda, Commented Mar 23, 2016 at 20:41
If you want to append your data vertically (append rows), then you either have to normalize your column names or skip the header row in read_csv(header=None). If you want to append your data horizontally (add columns), you have to specify axis=1 in append().
– MaxU - stand with Ukraine, Commented Mar 23, 2016 at 22:20
Can you show me what it would look like to have axis=1 in append()?
– Koda, Commented Mar 23, 2016 at 22:25
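
For the horizontal case discussed in these comments, note that DataFrame.append() has no axis argument (and the method was removed in pandas 2.0), so adding columns side by side is usually written with pd.concat(..., axis=1). The following is a minimal sketch, not the commenter's code: the two in-memory samples are hypothetical, shortened stand-ins for the files in the question, and comma-separated input is assumed.

```python
import io

import pandas as pd

# Hypothetical, shortened stand-ins for two of the CSV files in the question
# (comma-separated input is an assumption).
flow_csv = io.StringIO(
    "DateTime,F1_cfs,F2_cfs\n"
    "3/31/1991,0.86,1.17\n"
    "4/30/1991,2.12,2.46\n"
)
tss_csv = io.StringIO(
    "DateTime,F1_TSS_mgL,F1_TVS_mgL\n"
    "4/30/1991,10,7.29\n"
    "5/31/1991,5.04,3.72\n"
)

# Parse the DateTime column and use it as the index so rows align on date
frames = [
    pd.read_csv(buf, parse_dates=["DateTime"], index_col="DateTime")
    for buf in (flow_csv, tss_csv)
]

# axis=1 places the frames side by side; join="outer" keeps every date
# that appears in any file and fills the gaps with NaN
combined = pd.concat(frames, axis=1, join="outer").sort_index()
print(combined)
```

With DateTime as the index, rows are aligned on date, and dates missing from a file are filled with NaN. Files whose dates fall on different days of the month (month-end in File 1, the 1st in File 2) would not line up this way; converting each index to a monthly period, for example with df.index.to_period('M') on a hypothetical frame df, is one possible workaround.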

Haifeng Zhang · Accepted Answer · 2016-03-23 22:31:01Z

You didn't read the CSV files correctly.

1) You can remove the following lines, because files is never used later in your code.

   files = os.listdir(directory)
   print(files)

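2) glob.glob(directory) didn't return any of your CSV files. glob.glob() takes a pattern as its argument, for example r'C:\data\Monthly_Data\File*.csv'. You passed a bare directory path as the pattern, so it matches at most the directory itself and none of the files inside it, which is consistent with read_csv failing to open what it is given.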

for f in glob.glob(directory):
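
The difference between a bare directory path and a wildcard pattern can be checked in isolation. This is a small sketch using a temporary directory and two empty, hypothetical files (File1.csv and File2.csv) rather than the real data folder:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    # Create two empty, hypothetical CSV files to search for
    for name in ("File1.csv", "File2.csv"):
        open(os.path.join(directory, name), "w").close()

    # A bare directory path matches only the directory itself
    dir_matches = glob.glob(directory)

    # A wildcard pattern matches the files inside the directory
    csv_matches = sorted(glob.glob(os.path.join(directory, "*.csv")))

print(dir_matches)
print([os.path.basename(p) for p in csv_matches])
```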

After modifying these two parts, I printed all_data and the file contents displayed on my console.
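
The modified script is not shown in the answer, so the following is only one way the two changes might look. The helper name read_monthly_csvs and the default "*.csv" pattern are assumptions, and pd.concat is used in place of repeated DataFrame.append calls because append was removed in pandas 2.0.

```python
import glob
import os

import pandas as pd


def read_monthly_csvs(directory, pattern="*.csv"):
    """Read every CSV matching `pattern` in `directory` and stack the rows.

    Hypothetical helper: the name and the default pattern are assumptions,
    not part of the original answer.
    """
    # Build a real glob pattern instead of passing the bare directory
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r} in {directory!r}")

    frames = [pd.read_csv(path) for path in paths]

    # Stack the rows of all files; columns missing from a file become NaN
    return pd.concat(frames, ignore_index=True)
```

Calling read_monthly_csvs(r'C:\data\Monthly_Data') and printing the result should then show the combined rows of every CSV in that folder. This stacks rows vertically, as the original loop intended; it does not by itself line the files up on date.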

edited Mar 23, 2016 at 22:31

answered Mar 23, 2016 at 22:25

Haifeng Zhang
