0

I have the same file with quarterly data saved in different folders corresponding to the quarter. In other words, a quarter 1 folder, quarter 2 , quarter 3, quarter 4. This is the only difference in the file path. I am looking to read all four files in and concatenate them into one database. I can do this manually using a version of the simplified code below and changing the period each time.

period = ‘Q1’

filepath = ‘filepath/’ + period

file = filepath + ‘/file.xls’

df = pd.read_excel(file)

I would like to automate it with some form of for loop (I assume). That loops through the 4 periods, reads the file into a database and then concatenates. I have read other answers as to how this can be done with files in the same folder. But am struggling to do it where the file path changes. Manually putting the files into the same folder is not a desirable solution.

I tried making period a tuple and a list containing all 4 periods then a simple for loop but this didn’t work. I got the following error message.

TypeError: Can't convert 'list' object to str implicitly

Greatly appreciate any advice.

3
  • The code you have given us cannot cause this problem. Commented Sep 26, 2019 at 14:38
  • The code I have given is what works. I get the error when I convert period to be a tuple or list such as ('Q1', 'Q2', 'Q3', 'Q4') which I want to loop through. Commented Sep 26, 2019 at 14:43
  • To fix the problem, I need the code that causes the problem, not the code that works. Commented Sep 26, 2019 at 14:44

4 Answers 4

1

How about you first use list comprehension to get a list of all files:

periods= ["Q1", "Q2", "Q3", "Q4"]
files = ["filepath/"+ p + "/file.xls" for p in periods]

and then load them all into a list of data frames with

dfs = []
for f in files:
    df = pd.read_excel(f)
    dfs.append(df)
Sign up to request clarification or add additional context in comments.

Comments

1

You can use these loops to create full file paths and to iterate over them to create one DataFrame containing all the files.

filepath = 'path/'
file = 'file.xlsx'
periods=['Q1','Q2','Q3','Q4']

files = []
for p in periods:
    files.append(filepath+p+'/'+file)
files

data = []
for f in files:
    data.append(pd.read_excel(f))

df = pd.concat(data)

Comments

0

You probably want something like this:

periods = ['Q1', 'Q2', 'Q3', 'Q4']

df = None
for period in periods:
    filepath = 'filepath/' + period
    file = filepath + '/file.xls'
    if df is None:
        df = pd.read_excel(file)
    else:
        df.append(pd.read_excel(file))

Comments

0

You could try something like this:

complete_df = pd.DataFrame()
for i in range(1,5):
  quarter = 'Q'+str(i)
  filepath = 'filepath/' + quarter
  file = filepath + '/file.xls'
  df = pd.read_excel(file)
  complete_df = complete_df.append(df)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.