In Python 3 and pandas I loaded several TXT files. They have no header and share the same structure: 46 columns, with the same kind of information in each column. Here are three examples:

import pandas as pd

candidatos1 = pd.read_csv("candidatos_2014/consulta_cand_2014_AC.txt", sep=';', header=None, encoding='latin_1')

candidatos1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 621 entries, 0 to 620
Data columns (total 46 columns):
0     621 non-null object
1     621 non-null object
2     621 non-null int64
3     621 non-null int64
4     621 non-null object
5     621 non-null object
6     621 non-null object
7     621 non-null object
8     621 non-null int64
9     621 non-null object
10    621 non-null object
11    621 non-null int64
12    621 non-null int64
13    621 non-null int64
14    621 non-null object
15    621 non-null int64
16    621 non-null object
17    621 non-null int64
18    621 non-null object
19    621 non-null object
20    621 non-null int64
21    621 non-null object
22    621 non-null object
23    621 non-null object
24    621 non-null int64
25    621 non-null object
26    621 non-null object
27    621 non-null int64
28    621 non-null int64
29    621 non-null int64
30    621 non-null object
31    621 non-null int64
32    621 non-null object
33    621 non-null int64
34    621 non-null object
35    621 non-null int64
36    621 non-null object
37    621 non-null int64
38    621 non-null object
39    621 non-null object
40    621 non-null int64
41    621 non-null object
42    621 non-null int64
43    621 non-null int64
44    621 non-null object
45    621 non-null object
dtypes: int64(20), object(26)
memory usage: 223.2+ KB

candidatos2 = pd.read_csv("candidatos_2014/consulta_cand_2014_AL.txt",sep=';', header=None, encoding = 'latin_1') 
candidatos2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 479 entries, 0 to 478
Data columns (total 46 columns):
0     479 non-null object
1     479 non-null object
2     479 non-null int64
3     479 non-null int64
4     479 non-null object
5     479 non-null object
6     479 non-null object
7     479 non-null object
8     479 non-null int64
9     479 non-null object
10    479 non-null object
11    479 non-null int64
12    479 non-null int64
13    479 non-null int64
14    479 non-null object
15    479 non-null int64
16    479 non-null object
17    479 non-null int64
18    479 non-null object
19    479 non-null object
20    479 non-null int64
21    479 non-null object
22    479 non-null object
23    479 non-null object
24    479 non-null int64
25    479 non-null object
26    479 non-null object
27    479 non-null int64
28    479 non-null int64
29    479 non-null int64
30    479 non-null object
31    479 non-null int64
32    479 non-null object
33    479 non-null int64
34    479 non-null object
35    479 non-null int64
36    479 non-null object
37    479 non-null int64
38    479 non-null object
39    479 non-null object
40    479 non-null int64
41    479 non-null object
42    479 non-null int64
43    479 non-null int64
44    479 non-null object
45    479 non-null object
dtypes: int64(20), object(26)
memory usage: 172.2+ KB

candidatos3 = pd.read_csv("candidatos_2014/consulta_cand_2014_AM.txt",sep=';', header=None, encoding = 'latin_1') 
candidatos3.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 786 entries, 0 to 785
Data columns (total 46 columns):
0     786 non-null object
1     786 non-null object
2     786 non-null int64
3     786 non-null int64
4     786 non-null object
5     786 non-null object
6     786 non-null object
7     786 non-null object
8     786 non-null int64
9     786 non-null object
10    786 non-null object
11    786 non-null int64
12    786 non-null int64
13    786 non-null int64
14    786 non-null object
15    786 non-null int64
16    786 non-null object
17    786 non-null int64
18    786 non-null object
19    786 non-null object
20    786 non-null int64
21    786 non-null object
22    786 non-null object
23    786 non-null object
24    786 non-null int64
25    786 non-null object
26    786 non-null object
27    786 non-null int64
28    786 non-null int64
29    786 non-null int64
30    786 non-null object
31    786 non-null int64
32    786 non-null object
33    786 non-null int64
34    786 non-null object
35    786 non-null int64
36    786 non-null object
37    786 non-null int64
38    786 non-null object
39    786 non-null object
40    786 non-null int64
41    786 non-null object
42    786 non-null int64
43    786 non-null int64
44    786 non-null object
45    786 non-null object
dtypes: int64(20), object(26)
memory usage: 282.5+ KB

Please, is there a way to load these files all at once in a single dataframe?

Or do I need to load one at a time and then gather all the dataframes? How?

    There are two efficient ways to do it: iterate over the file reads and append the data to a list, then convert it to a dataframe; or iterate over the file reads and append to a dataframe line by line. Commented Feb 21, 2018 at 2:26

2 Answers


In these situations I like to feed pandas.concat a list comprehension.

from pathlib import Path
import pandas

def _reader(fname):
    return pandas.read_csv(fname, sep=';', header=None, encoding='latin_1')

folder = Path("candidatos_2014")
df = pandas.concat([
    _reader(txt)
    for txt in folder.glob("*.txt")
])
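
As a variation on the snippet above, here is a minimal sketch that also resets the row index and records which file each row came from. The helper name `read_folder`, the `source_file` column, and the two tiny sample files are assumptions made for illustration; they are not part of the original answer or data.

```python
from pathlib import Path
import tempfile

import pandas as pd


def read_folder(folder, pattern="*.txt"):
    """Read every matching file and stack the rows into one DataFrame."""
    frames = []
    # sorted() makes the row order reproducible across platforms.
    for txt in sorted(Path(folder).glob(pattern)):
        frame = pd.read_csv(txt, sep=";", header=None, encoding="latin_1")
        frame["source_file"] = txt.name  # hypothetical extra column
        frames.append(frame)
    # ignore_index=True gives one continuous RangeIndex instead of
    # repeating 0..n-1 for every file.
    return pd.concat(frames, ignore_index=True)


# Tiny demonstration with two made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("x;1\ny;2\n", encoding="latin_1")
    Path(tmp, "b.txt").write_text("z;3\n", encoding="latin_1")
    combined = read_folder(tmp)

print(combined.shape)
```

Note that `pd.concat` raises a `ValueError` when the list is empty, so a folder with no matching files needs to be handled separately.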

5 Comments

I hate to even say it... but pathlib2 for those still using Python 2. That out of the way. Nice answer and use of Path (-:
The thirteenth column is being loaded as int64, but it holds codes that start with leading zeros. I added a converter to read_csv, but the column is still imported as int64:
pd.read_csv(fname, sep=';', header=None, encoding='latin_1', converters={13: lambda x: str(x)})
Is there any other way to keep it as an object?
@ReinaldoChaves I searched stackoverflow for "pandas leading zeros" and found this: stackoverflow.com/questions/23836277/… (first hit)
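
One common way to keep leading zeros is the `dtype` argument of `read_csv`, which tells the parser to leave a column as text instead of inferring a number. The sketch below uses a made-up three-field sample in which the code sits in column 1; for the files in the question the key would presumably be 13, as in the comment above.

```python
import io

import pandas as pd

# Made-up sample: the second field is a code with leading zeros.
sample = io.StringIO("A;00123;7\nB;04567;8\n")

# With header=None the columns are labelled 0, 1, 2, ..., so the dtype
# mapping is keyed by those integer labels.
df = pd.read_csv(sample, sep=";", header=None, dtype={1: str})

print(df[1].tolist())
```

Passing `dtype=str` without a mapping would read every column as text, which can be a simpler starting point when several columns hold codes.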

You can append dataframes after they are created like so:

    candidatos1.append(candidatos2,ignore_index=True).append(candidatos3,ignore_index=True)
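
`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the chained call above fails. A minimal sketch of the same operation with `pandas.concat` follows; the three small frames are stand-ins for the ones loaded in the question, and the name `candidatos` is only illustrative.

```python
import pandas as pd

# Stand-in frames; in the question these come from pd.read_csv.
candidatos1 = pd.DataFrame({0: ["AC"], 1: [1]})
candidatos2 = pd.DataFrame({0: ["AL"], 1: [2]})
candidatos3 = pd.DataFrame({0: ["AM"], 1: [3]})

# Stack the rows of all three frames and renumber the index from 0.
candidatos = pd.concat(
    [candidatos1, candidatos2, candidatos3],
    ignore_index=True,
)

print(len(candidatos))
```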

You could concatenate the text files first and then load into Pandas but that's outside of Pandas.
