
I would like to optimize the code below. It works, but I would like suggestions on whether it can be done more concisely and efficiently.

import os
import glob
import pandas as pd
import numpy as np

files = glob.glob(os.path.join('data', '*.csv'))

dfs = []

for file in files:
    variable = os.path.basename(file).split("_")[0]  # filename prefix before the first "_"
    df = pd.read_csv(file)
    df['variable'] = variable  # tag every row with that prefix
    dfs.append(df)

finalDf = pd.concat(dfs, ignore_index=True)

Any ideas? Thank you in advance.

Pandas 0.21.1 and Python 3.6.5

  • It looks good to me. (commented Jun 8, 2018 at 16:14)

1 Answer


The structure of your code is perfectly fine. Concatenating a list of dataframes once is more efficient than repeatedly appending to an existing dataframe, because each append copies all of the rows accumulated so far.
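A minimal sketch contrasting the two patterns, using small hypothetical frames in place of the CSV files. Because `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, the "grow inside the loop" version below uses `pd.concat` on every iteration to stand in for it; the cost profile is the same, since each call copies everything accumulated so far.

```python
import pandas as pd

# Hypothetical small frames standing in for the per-file dataframes.
frames = [pd.DataFrame({"x": [i, i + 1]}) for i in range(0, 6, 2)]

# Slower pattern: grow one dataframe inside the loop. Every iteration
# copies all rows gathered so far, so total work grows quadratically.
grown = frames[0]
for frame in frames[1:]:
    grown = pd.concat([grown, frame], ignore_index=True)

# Pattern used in the question: collect the frames, then concatenate once.
combined = pd.concat(frames, ignore_index=True)

print(combined["x"].tolist())  # [0, 1, 2, 3, 4, 5]
print(grown.equals(combined))  # True: same result, different cost
```

Both produce the same dataframe; the difference only shows up in run time as the number of files grows.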

Set dtype

What you can try to optimize is reading each CSV file, i.e. df = pd.read_csv(file). My only suggestion is to specify the dtype parameter with a dictionary mapping column names to types. In particular, if you have columns with categorical data, map them to 'category' to reduce memory usage.
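A short sketch of the dtype mapping, assuming hypothetical columns `station` (repeated labels) and `reading` (numbers). The CSV text is read from an in-memory buffer so the example runs on its own; in your loop you would pass `file` instead. Choosing `float32` is only an illustration of narrowing a numeric type and is appropriate only if the reduced precision is acceptable for your data.

```python
import io

import pandas as pd

# Hypothetical CSV contents; in the question's loop this would be the path `file`.
csv_text = "station,reading\nA,1.5\nB,2.0\nA,3.25\n"

# Map column names to dtypes: repeated labels become 'category',
# numeric readings are stored as float32 instead of the default float64.
dtypes = {"station": "category", "reading": "float32"}

df = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)

print(df.dtypes)
```

One caveat: if the set of categories differs between files, `pd.concat` falls back to `object` dtype for that column, so the memory saving may not survive the final concatenation unless the categories are aligned.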

List comprehension + assign

You mention more concise code. You can use pd.DataFrame.assign to add a new column set to the filename prefix, and build the list of dataframes with a list comprehension:

import glob
import os
import pandas as pd

dfs = [pd.read_csv(file).assign(variable=os.path.basename(file).split('_')[0])
       for file in glob.glob(os.path.join('data', '*.csv'))]

finalDf = pd.concat(dfs, ignore_index=True)

If you choose this method, you may lose readability, so document what you are doing.
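A self-contained way to try the one-liner, assuming hypothetical files named `<variable>_<rest>.csv` written to a temporary `data` directory. Two small variations are my own, not part of the answer: the comprehension is passed straight to `pd.concat` as a generator (so no intermediate `dfs` name is needed), and the file list is wrapped in `sorted()` because `glob.glob` does not guarantee any particular order.

```python
import glob
import os
import tempfile

import pandas as pd

# Build a throwaway 'data' directory with hypothetical files whose names
# follow the <variable>_<rest>.csv pattern assumed in the question.
with tempfile.TemporaryDirectory() as root:
    data_dir = os.path.join(root, "data")
    os.makedirs(data_dir)
    for name, values in [("temp_2018.csv", [1, 2]), ("rain_2018.csv", [3])]:
        pd.DataFrame({"value": values}).to_csv(
            os.path.join(data_dir, name), index=False)

    # Read each file, tag its rows with the filename prefix, and
    # concatenate everything in a single expression.
    finalDf = pd.concat(
        (pd.read_csv(path).assign(variable=os.path.basename(path).split("_")[0])
         for path in sorted(glob.glob(os.path.join(data_dir, "*.csv")))),
        ignore_index=True,
    )

print(finalDf)
```

With `sorted()`, the rows from `rain_2018.csv` come first, followed by the two rows from `temp_2018.csv`, and `ignore_index=True` renumbers the result 0 to 2.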


1 Comment

Thank you. I wasn't aware that assign could be chained like that and the whole thing wrapped in a list comprehension. Also thanks for pointing out the dtype mapping parameter for optimisation.
