I have several csv files that I am trying to load and concatenate using pandas. Similar questions have been asked, but the answers do not appear to work for me. The code loads the csv files and concatenates them, but the structure of the resulting DataFrame is strange (the number of columns grows unexpectedly). A bit of background: I am a new convert from Matlab, where I have this code working, and I just want to get it running in Python. Here is the code:

import pandas as pd
import glob

filelist = glob.glob('/.../*.csv')
DF = pd.DataFrame()
list_ = []
for i in filelist:
    tmp = pd.read_csv(i, header=1, skiprows=0, index_col=None)
    list_.append(tmp)
    DF = pd.concat(list_)
DF.to_csv('/.../All.csv')

The csv files are structured like this:

TestDate,City,State,ZipCode,County,Num,A,B,C
9/1/16,X,AL,X,X,29,negative,positive,positive
9/1/16,X,AL,X,X,1,negative,negative,negative
9/1/16,X,AL,X,X,10,negative,negative,negative

The output looks like this:

,11/14/16,11/7/16,17,29,32,X,71901,9/1/16,99771,AK,AL,AR,X,X,X,X,Nome Census Area,X,negative,negative.1,negative.2,positive,positive.1
0,,,,1.0,,X,,9/1/16,,,AL,,X,X,,,,,negative,,,negative,negative
1,,,,10.0,,X,,9/1/16,,,AL,,X,X,,,,,negative,,,negative,negative
2,,,,11.0,,X,,9/1/16,,,AL,,X,X,,,,,negative,,,negative,negative
  • Can you fix your indentation? As written, the unexpected indent on the for line should raise an IndentationError. Commented Feb 24, 2017 at 18:03

1 Answer

The issue is header=1, which tells pandas to treat the second row as the header rather than the first (row numbers are 0-based, so the first row is header=0).

from io import StringIO
import pandas as pd

data = """TestDate,City,State,ZipCode,County,Num,A,B,C
9/1/16,X,AL,X,X,29,negative,positive,positive
9/1/16,X,AL,X,X,1,negative,negative,negative
9/1/16,X,AL,X,X,10,negative,negative,negative"""

# Default (header=0): the first row supplies the column names
df = pd.read_csv(StringIO(data))
print(df)
  TestDate City State ZipCode County  Num         A         B         C
0   9/1/16    X    AL       X      X   29  negative  positive  positive
1   9/1/16    X    AL       X      X    1  negative  negative  negative
2   9/1/16    X    AL       X      X   10  negative  negative  negative

# header=1: the second row (the first data row) becomes the header
df = pd.read_csv(StringIO(data), header=1, skiprows=0)
print(df)
   9/1/16  X  AL X.1 X.2  29  negative  positive positive.1
0  9/1/16  X  AL   X   X   1  negative  negative   negative
1  9/1/16  X  AL   X   X  10  negative  negative   negative

With header=1, the first data row of each file becomes its column names. That row differs from file to file (different dates, cities, and so on), so every DataFrame ends up with a different set of columns. pd.concat aligns on column names, so each new file adds more columns to the result and the missing cells are filled with NaN, which is why everything gets so messy.
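To make the effect concrete, here is a minimal sketch that concatenates two in-memory "files" under both settings. The second file's rows are made-up sample values modeled on the output shown in the question, and `load_all` is a hypothetical helper, not part of pandas.

```python
from io import StringIO
import pandas as pd

# Two small in-memory "files": same header, different first data rows.
# The values in file_b are hypothetical, chosen only for illustration.
file_a = """TestDate,City,State,ZipCode,County,Num,A,B,C
9/1/16,X,AL,X,X,29,negative,positive,positive
9/1/16,X,AL,X,X,1,negative,negative,negative
"""
file_b = """TestDate,City,State,ZipCode,County,Num,A,B,C
11/7/16,X,AK,99771,Nome Census Area,17,negative,negative,positive
11/7/16,X,AK,99771,Nome Census Area,32,negative,negative,negative
"""


def load_all(sources, header):
    """Read each CSV text with the given header row, then concatenate once."""
    frames = [pd.read_csv(StringIO(text), header=header) for text in sources]
    # ignore_index=True renumbers the rows 0..n-1 in the combined frame
    return pd.concat(frames, ignore_index=True)


bad = load_all([file_a, file_b], header=1)   # first data row used as header
good = load_all([file_a, file_b], header=0)  # real header row used

print(bad.shape)             # columns are the union of two different "headers"
print(good.shape)            # columns match across files, rows stack cleanly
print(list(good.columns))
```

With real files, the same idea should carry over to the loop in the question: pass header=0 (or leave header out) to pd.read_csv, and call pd.concat once on the collected list.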

1 Comment

Thanks Sebastian! That fixed it. I spent hours on this -- so embarrassed.
