I have n files in a directory that I need to combine into one. They all have the same number of columns. For example, the contents of test1.csv are:

test1,test1,test1  
test1,test1,test1  
test1,test1,test1  

Similarly, the contents of test2.csv are:

test2,test2,test2  
test2,test2,test2  
test2,test2,test2  

I want final.csv to look like this:

test1,test1,test1  
test1,test1,test1  
test1,test1,test1  
test2,test2,test2  
test2,test2,test2  
test2,test2,test2  

But instead it comes out like this:

test file 1,test file 1.1,test file 1.2,test file 2,test file 2.1,test file 2.2  
,,,test file 2,test file 2,test file 2  
,,,test file 2,test file 2,test file 2  
test file 1,test file 1,test file 1,,,  
test file 1,test file 1,test file 1,,,  

Can someone help me figure out what is going on here? I have pasted my code below:

import csv
import glob
import pandas as pd
import numpy as np 

all_data = pd.DataFrame() #initializes DF which will hold aggregated csv files

for f in glob.glob("*.csv"): #for all csv files in pwd
    df = pd.read_csv(f) #create dataframe for reading current csv
    all_data = all_data.append(df) #appends current csv to final DF

all_data.to_csv("final.csv", index=None)
  • Why are you using pandas just to create a single csv? Commented Dec 12, 2015 at 18:29
  • I'm a noob and I thought this was the best way to do it. :/ Commented Dec 12, 2015 at 21:28

3 Answers


I think there are several problems:

  1. I removed import csv and import numpy as np, because they are not used in this demo (add them back if the rest of your script needs them).
  2. I created a list dfs of all the dataframes, appending each one with dfs.append(df), and then used the function concat to join this list into the final dataframe.
  3. In the function read_csv I added the parameter header=None, because the main problem was that read_csv reads the first row of each file as a header.
  4. In the function to_csv I added the parameter header=None to omit the header.
  5. I wrote the final file to a subfolder test, because if it stayed in the working directory, glob.glob("*.csv") would pick up the output file as an input file on the next run.
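To see point 3 in isolation, here is a quick sketch (using an in-memory StringIO in place of a real file, my own stand-in) of what read_csv does to a header-less file with and without header=None. The mangled duplicate names test1.1, test1.2 are exactly the pattern visible in the broken final.csv:

```python
import pandas as pd
from io import StringIO

# Three data rows, no header -- mimics test1.csv from the question
data = "test1,test1,test1\ntest1,test1,test1\ntest1,test1,test1\n"

# Default behaviour: the first row is consumed as the header, and the
# duplicate names are mangled to test1, test1.1, test1.2
df_default = pd.read_csv(StringIO(data))
print(list(df_default.columns))  # ['test1', 'test1.1', 'test1.2']
print(len(df_default))           # 2 -- one row was lost to the header

# With header=None every row is kept as data
df_fixed = pd.read_csv(StringIO(data), header=None)
print(len(df_fixed))             # 3
```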

Solution:

import glob
import pandas as pd

#list of all df
dfs = []
for f in glob.glob("*.csv"): #for all csv files in pwd
    #add parameters to read_csv
    df = pd.read_csv(f, header=None) #create dataframe for reading current csv
    #print df
    dfs.append(df) #appends current csv to final DF
all_data = pd.concat(dfs, ignore_index=True)
print(all_data)
#       0      1      2
#0  test1  test1  test1
#1  test1  test1  test1
#2  test1  test1  test1
#3  test2  test2  test2
#4  test2  test2  test2
#5  test2  test2  test2
all_data.to_csv("test/final.csv", index=None, header=None)

The next solution is similar. I added the parameter header=None to both read_csv and to_csv, and the parameter ignore_index=True to append.

import glob
import pandas as pd

all_data = pd.DataFrame() #initializes DF which will hold aggregated csv files

for f in glob.glob("*.csv"): #for all csv files in pwd
    df = pd.read_csv(f, header=None) #create dataframe for reading current csv
    all_data = all_data.append(df, ignore_index=True) #appends current csv to final DF
print(all_data)
#       0      1      2
#0  test1  test1  test1
#1  test1  test1  test1
#2  test1  test1  test1
#3  test2  test2  test2
#4  test2  test2  test2
#5  test2  test2  test2

all_data.to_csv("test/final.csv", index=None, header=None)

1 Comment

I think pandas is a very good library for data processing, so you can try it. And if you are new to Stack Overflow, you can check this.

You can use concat. Let df1 be your first dataframe and df2 the second; then:

df = pd.concat([df1, df2], ignore_index=True)

The ignore_index parameter is optional; set it to True if you don't need to keep the original indexes of the individual dataframes.
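A minimal sketch of the difference, using two small frames standing in for the parsed test1.csv and test2.csv:

```python
import pandas as pd

# Two 3x3 frames of repeated values, like the files in the question
df1 = pd.DataFrame([["test1"] * 3] * 3)
df2 = pd.DataFrame([["test2"] * 3] * 3)

# Without ignore_index the original row labels repeat
kept = pd.concat([df1, df2])
print(list(kept.index))   # [0, 1, 2, 0, 1, 2]

# With ignore_index=True the result is renumbered 0..5
fresh = pd.concat([df1, df2], ignore_index=True)
print(list(fresh.index))  # [0, 1, 2, 3, 4, 5]
```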

4 Comments

This will work if you pass "axis=0" as a parameter.
@hahdawg thanks for pointing it out. Actually, 0 is the default value for axis in concat.
@JackBauer you're welcome. Please consider accepting one of the two answers received to help other users.
I have limited experience with this stuff so it will take me some time to go through it all but I definitely will.

pandas is not the tool to use when all you want is to create a single csv file; you can simply write each csv to a new file as you go:

import glob

with open("out.csv","w") as out:
    for fle in glob.glob("*.csv"):
        with open(fle) as f:
            out.writelines(f)

Or with the csv lib if you prefer:

import glob
import csv

with open("out.csv", "w") as out:
    wr = csv.writer(out)
    for fle in glob.glob("*.csv"):
        with open(fle) as f:
            wr.writerows(csv.reader(f))  

Creating a large dataframe just to eventually write it back to disk makes no real sense; furthermore, if you had a lot of large files it might not even be possible.
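One caveat with the line-copying approach: it assumes the files have no header row, as in the question. If each file did start with a header, a small variation can keep the first header and skip the rest. This is my own hypothetical helper (the name merge_csvs is not from the answer):

```python
import glob
from itertools import islice

def merge_csvs(pattern, out_path):
    """Concatenate all files matching pattern, keeping only the
    first file's header row (hypothetical helper)."""
    with open(out_path, "w") as out:
        for i, fle in enumerate(sorted(glob.glob(pattern))):
            with open(fle) as f:
                # islice(f, 1, None) skips the header line of every
                # file after the first
                out.writelines(f if i == 0 else islice(f, 1, None))
```

Writing the output outside the glob pattern (different folder or extension) avoids the self-read pitfall mentioned in the accepted answer.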

1 Comment

No worries. pandas is a great tool if you actually want to do some computation on the data, but it is not the tool to use just to concat a few files into one.
