3

I am trying to import a set of *.txt files. I need to import the files into successive columns of a Pandas DataFrame in Python.

Requirements and Background information:

  1. Each file has one column of numbers
  2. No headers are present in the files
  3. Positive and negative integers are possible
  4. The size of all the *.txt files is the same
  5. The columns of the DataFrame must have the name of the file (without extension) as the header
  6. The number of files is not known ahead of time

Here is one sample *.txt file. All the others have the same format.

16
54
-314
1
15
4
153
86
4
64
373
3
434
31
93
53
873
43
11
533
46

Here is my attempt:

import pandas as pd
import os
import glob

# Step 1: get a list of all txt files in target directory
my_dir = "C:\\Python27\\Files\\"
filelist = []
filesList = []
os.chdir( my_dir )

# Step 2: Build up list of files:
for files in glob.glob("*.txt"):
    fileName, fileExtension = os.path.splitext(files)
    filelist.append(fileName) #filename without extension
    filesList.append(files) #filename with extension

# Step 3: Build up DataFrame:
df = pd.DataFrame()
for ijk in filelist:
    frame = pd.read_csv(filesList[ijk])
    df = df.append(frame)
print df

Steps 1 and 2 work. I am having problems with step 3. I get the following error message:

Traceback (most recent call last):
  File "C:\Python27\TextFile.py", line 26, in <module>
    frame = pd.read_csv(filesList[ijk])
TypeError: list indices must be integers, not str

Question: Is there a better way to load these *.txt files into a Pandas dataframe? Why does read_csv not accept strings for file names?

Instead of frame = pd.read_csv(filesList[ijk]), use frame = pd.read_csv(ijk) in your for loop. Commented Apr 3, 2019 at 14:08

2 Answers

8

You can read them into multiple dataframes and concat them together afterwards. Suppose you have two of those files, containing the data shown.

In [6]:
filelist = ['val1.txt', 'val2.txt']
print pd.concat([pd.read_csv(item, names=[item[:-4]]) for item in filelist], axis=1)
    val1  val2
0     16    16
1     54    54
2   -314  -314
3      1     1
4     15    15
5      4     4
6    153   153
7     86    86
8      4     4
9     64    64
10   373   373
11     3     3
12   434   434
13    31    31
14    93    93
15    53    53
16   873   873
17    43    43
18    11    11
19   533   533
20    46    46
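
To avoid typing the file names by hand, the same concat idea can be combined with glob. This is a minimal sketch, not the answerer's code: the function name load_columns and its parameters are made up for illustration, and os.path.splitext is used in place of item[:-4] so the extension length does not matter.

```python
import glob
import os

import pandas as pd


def load_columns(directory, pattern="*.txt"):
    """Read every matching file into one column of a single DataFrame.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # sorted() makes the column order deterministic.
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    frames = [
        # Passing names=[...] also tells read_csv there is no header row,
        # so the first number in each file is kept as data.
        pd.read_csv(path, names=[os.path.splitext(os.path.basename(path))[0]])
        for path in paths
    ]
    # axis=1 places the one-column frames side by side.
    return pd.concat(frames, axis=1)
```

Calling load_columns("some_directory") would return one column per file. If no file matches the pattern, pd.concat raises a ValueError, so an empty directory needs to be handled by the caller.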

5 Comments

Sorry, I forgot to mention: there are many files, maybe more than 20. I would strongly prefer to avoid listing them manually. Also, I do not understand this part: "names=[item[:-4]]". What is the significance of -4?
You can use os.listdir(PATH) to get a list of all files in PATH, so that part is easy. As for names=[item[:-4]]: the files end with '.txt', which is four characters, and you don't want '.txt' to be part of your column name, right?
Thanks. I tried this approach: Line 1 - df = pd.DataFrame() Line 2 - for item in filesList: Line 3 - df = pd.concat(pd.read_csv(item, names=[item[:-4]]), axis = 1). But it gives an error message: "TypeError: first argument must be a list-like of pandas objects, you passed an object of type 'DataFrame'". Is there some reason why this approach does not work?
CT Zhu's code works, but I do not understand why my approach in the comment above does not. His method uses a list comprehension; I just used a simple for loop. Could you please let me know why my approach will not work?
Thank you! Note that for my case, I wanted to stack these dataframes by concatenating rows (instead of columns), so I replaced axis=1 with axis=0, ignore_index=True
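
On the for-loop attempt in the comments above: the error message itself says that pd.concat expects a list-like of pandas objects, whereas the loop passes it a single DataFrame on every iteration (and would overwrite df each time even if that were accepted). A plain for loop works if the frames are collected in a list first and concatenated once at the end. The sketch below assumes the files are in the current directory and end in '.txt'; the function name concat_with_loop is hypothetical.

```python
import pandas as pd


def concat_with_loop(filenames):
    """For-loop equivalent of the list comprehension (illustrative name)."""
    frames = []  # pd.concat needs a list-like of pandas objects
    for item in filenames:
        # item[:-4] strips the four characters of '.txt' for the column name.
        frames.append(pd.read_csv(item, names=[item[:-4]]))
    # Concatenate once, after the loop, instead of on every iteration.
    return pd.concat(frames, axis=1)
```

The list comprehension in the answer builds the same list of frames in one expression; the two forms should give the same result.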
3

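You're very close. The loop variable ijk is already an element of the list (a string), not a position, so you don't need to index into the list with it. Since filelist holds the names without their extension, loop over filesList so that read_csv receives a name it can open, and pass header=None because the files have no header row:

# Step 3: Build up DataFrame:
df = pd.DataFrame()
for ijk in filesList:
    frame = pd.read_csv(ijk, header=None)
    df = df.append(frame)
print df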
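
The TypeError from the question can be reproduced without pandas, which shows that read_csv is not the part rejecting the string. The following is a small sketch with made-up file names: a for loop over a list yields the elements themselves, and a list can only be indexed by integers. If both the bare name and the full file name are needed in the same loop, zip or enumerate are two possible ways to get them.

```python
names = ["val1", "val2"]          # file names without extension
paths = ["val1.txt", "val2.txt"]  # file names with extension

# Iterating over a list yields its elements, not their positions.
for item in names:
    assert isinstance(item, str)

# Indexing a list with one of those strings is what raises the TypeError.
try:
    paths["val1"]
except TypeError as exc:
    message = str(exc)

# zip pairs the two lists by position, so no manual indexing is needed.
pairs = list(zip(names, paths))

# enumerate yields (position, element) when an integer index is wanted.
indexed = list(enumerate(paths))
```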

In the future, please provide working code exactly as is. You wrote "from pandas import *" yet then refer to pandas as pd, which implies the import was "import pandas as pd".

You also want to be careful with variable names. files is actually a single file path, and the names filelist and filesList give no hint of how the two lists differ. It also seems like a bad idea to keep personal documents in your Python directory.

1 Comment

Sorry about the confusion with the Pandas command - yes, that should be corrected. I have updated the Original post.
