
Before marking this question as a duplicate, please read the whole post. I know there is a similar question, but what I'm looking for is somewhat different.

I have a list of file names:

files = ['first.csv', 'second.csv', 'third.csv']

I want to read them inside a loop with pandas, creating a separate dataframe on each iteration, equivalent to:

first = pd.read_csv('first.csv')
second = pd.read_csv('second.csv')
third = pd.read_csv('third.csv')

But inside a loop. Something like:

for i in range(len(files)):
    csv = re.split('.', files[i])[0]
    csv = pd.read_csv(files[i])

IMPORTANT: Each CSV has different rows and columns, so I do not want to combine the three files into one with pd.concat. I want to read them separately.

I tried to read them into a list with:

dataframe_list = [pd.read_csv(file_name) for file_name in files]

But that raises the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x85 in position 59: invalid start byte
  • "Something like" is exactly what you need (except for the second line, which is useless). Did you try? Commented Aug 22, 2018 at 19:08
  • @DYZ, won't what they have simply result in csv being the dataframe corresponding to third.csv? It sounds like they want three different dataframes. Commented Aug 22, 2018 at 19:10
  • Append each new dataframe to a list right after creating it. Commented Aug 22, 2018 at 19:10
  • More efficiently, you can get a list of dataframes with frames=[pd.read_csv(f) for f in files] or even frames=list(map(pd.read_csv, files)). Commented Aug 22, 2018 at 19:11
  • @Rubén, that error comes from reading the CSV, not from storing the dataframes in a list. If the files have different encodings, you can either specify the encoding for each file in a dictionary or, more haphazardly, use try and except: catch UnicodeDecodeError and retry the bad files with the added argument encoding='latin-1' in pd.read_csv. Commented Aug 22, 2018 at 20:04

2 Answers


You can do something like this:

import pandas as pd
files = ['file1.csv', 'file2.csv', 'file3.csv']
dataframe_list = [pd.read_csv(file_name) for file_name in files]

Then you can index dataframe_list[0] to get the first dataframe, and so on. You might want to use a dictionary instead, with the keys being the labels you want for each dataframe.
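The dictionary idea can be sketched as follows. This is only a minimal sketch, not code from the answer: the helper name read_csvs_by_name is hypothetical, and it assumes every file name has a distinct stem (two files named first.csv in different folders would overwrite each other).

```python
from pathlib import Path

import pandas as pd


def read_csvs_by_name(files, **read_csv_kwargs):
    """Read each CSV into its own DataFrame, keyed by the file name without extension.

    Hypothetical helper: extra keyword arguments (for example encoding=...)
    are passed straight through to pd.read_csv.
    """
    return {
        Path(file_name).stem: pd.read_csv(file_name, **read_csv_kwargs)
        for file_name in files
    }
```

With the file list from the question, frames = read_csvs_by_name(files) gives frames['first'], frames['second'] and frames['third'] as independent dataframes, which avoids creating variable names dynamically.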


Quick tip: the construct for i in range(0, len(files)) and then only caring about files[i] is ugly. files is a list, so you can iterate over it using for file in files.


1 Comment

Thanks for the tip! I tried your solution but it raises this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x85 in position 59: invalid start byte
import pandas as pd

files = ['first.csv', 'second.csv', 'third.csv']
list_of_df = []
for file_name in files:
    # 'utf-8' is already the pandas default; pass the file's actual encoding here
    df = pd.read_csv(file_name, encoding="utf-8")
    list_of_df.append(df)

2 Comments

I tried but got this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x85 in position 59: invalid start byte
You need to pass the encoding parameter to pd.read_csv.
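The try/except approach suggested in the comments on the question could look roughly like the sketch below. The function name read_csv_with_fallback and the list of candidate encodings are assumptions for illustration; the real encoding of the files is not known from the post. Byte 0x85 is not a valid UTF-8 start byte, while in Windows-1252 (cp1252) it is the ellipsis character, so cp1252 is tried before latin-1.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Return the DataFrame from the first encoding that decodes the file.

    Hypothetical helper. latin-1 maps every byte to a character, so it never
    raises UnicodeDecodeError; keep it last, because it can silently produce
    the wrong characters if the file uses some other encoding.
    """
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding)
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next encoding
    if last_error is None:
        raise ValueError("no encodings were given")
    raise last_error
```

It can be dropped into the list comprehension from the first answer: dataframe_list = [read_csv_with_fallback(file_name) for file_name in files]. If the encoding of each file is actually known, passing it explicitly is more reliable than guessing.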
