
I have several CSV files that I need to join on date, but the files do not cover the same dates (for example, some start on 1/1/1991 and others in 1998). I have a basic start on the code (see below), but I am not sure where to go from here. Any tips are appreciated. Samples of the different CSV files I am trying to join follow the code.

import os, pandas as pd, glob
directory = r'C:\data\Monthly_Data'
files = os.listdir(directory)
print(files)
all_data =pd.DataFrame()
for f in glob.glob(directory):
    df=pd.read_csv(f)
    all_data=all_data.append(df,ignore_index=True)
all_data.describe()

File 1

DateTime    F1_cfs  F2_cfs  F3_cfs  F4_cfs  F5_cfs  F6_cfs  F7_cfs
3/31/1991   0.860702028 1.167239264 0   0   0   0   0
4/30/1991   2.116930556 2.463493056 3.316688418             
5/31/1991   4.056572581 4.544307796 5.562668011             
6/30/1991   1.587513889 2.348215278 2.611659722             
7/31/1991   0.55328629  1.089637097 1.132043011             
8/31/1991   0.29702957  0.54186828  0.585073925             2.624375
9/30/1991   0.237083333 0.323902778 0.362583333 0.925563094 1.157786606 2.68722973  2.104090278

File 2

DateTime    F1_mg-P_L   F2_mg-P_L   F3_mg-P_L   F4_mg-P_L   F5_mg-P_L   F6_mg-P_L   F7_mg-P_L
6/1/1992    0.05    0.05    0.06    0.04    0.03    0.18    0.08
7/1/1992    0.03    0.05    0.04    0.03    0.04    0.05    0.09
8/1/1992    0.02    0.03    0.02    0.02    0.02    0.02    0.02

File 3

DateTime    F1_TSS_mgL  F1_TVS_mgL  F2_TSS_mgL  F2_TVS_mgL  F3_TSS_mgL  F3_TVS_mgL  F4_TSS_mgL  F4_TVS_mgL  F5_TSS_mgL  F5_TVS_mgL  F6_TSS_mgL  F6_TVS_mgL  F7_TSS_mgL  F7_TVS_mgL
4/30/1991   10  7.285714286 8.5 6.083333333 3.7 3.1                             
5/31/1991   5.042553191 3.723404255 6.8 6.3 3.769230769 2.980769231                             
6/30/1991           5   5   1   1                               
7/31/1991                                                       
8/31/1991                                                       
9/30/1991   5.75    3.75    6.75    4.75    9.666666667 6.333333333 8.666666667 5   12  7.666666667 8   5.5 9   6.75
10/31/1991  14.33333333 9   14  10.66666667 16.25   11  12.75   9.25    10.25   7.25    29.33333333 18.33333333 13.66666667 9
11/30/1991  2.2 1.933333333 2   1.88    0   0   4.208333333 3.708333333 10.15151515 7.909090909 9.5 6.785714286 4.612903226 3.580645161
  • I am getting the following error: IOError: Initializing from file failed Commented Mar 23, 2016 at 20:41
  • if you want to append your data vertically (append rows) then you either have to normalize your column names or to skip header row in read_csv(header=None). If you want to append your data horizontally (add columns) you have to specify axis=1 in append() Commented Mar 23, 2016 at 22:20
  • Can you show me what it would look like to have axis=1 in append()? Commented Mar 23, 2016 at 22:25
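As a hedged illustration of the horizontal case raised in the comments: DataFrame.append() never accepted an axis argument (and was removed in pandas 2.0), so the usual way to place frames side by side is pd.concat with axis=1. The sketch below uses two small in-memory frames with values abbreviated from the samples in the question, and assumes each frame is indexed by its date.

```python
import pandas as pd

# Two small frames indexed by date (values abbreviated from the question's samples).
flow = pd.DataFrame(
    {"F1_cfs": [0.86, 2.12, 4.06]},
    index=pd.to_datetime(["1991-03-31", "1991-04-30", "1991-05-31"]),
)
tss = pd.DataFrame(
    {"F1_TSS_mgL": [10.0, 5.04]},
    index=pd.to_datetime(["1991-04-30", "1991-05-31"]),
)

# axis=1 places the frames side by side, aligning rows on the index.
# The default join="outer" keeps every date; values a frame lacks become NaN.
combined = pd.concat([flow, tss], axis=1)
print(combined)
```

Dates present in only one frame are kept, with NaN in the columns that come from the other frame.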

1 Answer

You are not reading the CSV files correctly.

1) You can remove the following lines, because their result is never used later in your code.

   files = os.listdir(directory)
   print(files)

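2) glob.glob(directory) does not return the files inside the directory. glob.glob() takes a pattern as its argument, for example r'C:\data\Monthly_Data\File*.csv'. You passed the directory itself as the pattern, so it matches at most the directory and no CSV files are found; pd.read_csv then cannot open what it is given, which is consistent with the IOError mentioned in the comments.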

for f in glob.glob(directory):

After I changed these two parts and printed all_data, the file contents were displayed on my console.
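
Putting the two fixes together with the join-on-date goal from the question, a minimal sketch might look like the following. The function name join_csvs_by_date and its parameters are hypothetical, and it assumes the files are comma-separated, share a DateTime column, and contain each date at most once; it is one possible approach, not the only one.

```python
import glob
import os

import pandas as pd


def join_csvs_by_date(directory, pattern="*.csv", date_col="DateTime"):
    """Read every CSV matching the pattern and align the files on the date column."""
    frames = []
    # glob needs a pattern (directory plus wildcard), not a bare directory path.
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        # Parse the date column and use it as the index so rows align by date.
        df = pd.read_csv(path, parse_dates=[date_col], index_col=date_col)
        frames.append(df)
    if not frames:
        raise FileNotFoundError(f"no files match {pattern!r} in {directory!r}")
    # Outer join on the date index: dates missing from a file become NaN.
    return pd.concat(frames, axis=1).sort_index()
```

It could be called as all_data = join_csvs_by_date(r'C:\data\Monthly_Data'). Note that File 2 in the question uses first-of-month dates while the others use month-end dates, so those rows would not line up unless the dates are first normalized, for example to a monthly period.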
