
I'm working on a program that needs to load a large number of CSV files (thousands of them) into a single array.

Each CSV file is 45x100, and I want to build a 3-D array of shape nx45x100. For now, I use pd.read_csv() to load each file, convert each resulting DataFrame into an array with np.array(), and then build the 3-D array with np.array([data_0, data_1, ..., data_n]), which gives me the required dimensions.

Although this works, it is very tedious. Is there a way to do this without reading and converting each CSV file by hand?

   # this is my current code
   import numpy as np
   import pandas as pd

   mBGS5L = pd.read_csv("strain5.csv")  # 45x100
   mBGS8L = pd.read_csv("strain8.csv")
   mBGS10L = pd.read_csv("strain10.csv")

   mBGS5L_ = np.array(mBGS5L)
   mBGS8L_ = np.array(mBGS8L)
   mBGS10L_ = np.array(mBGS10L)

   mBGS = np.array([mBGS5L_, mBGS8L_, mBGS10L_])
   # mBGS.shape is (3, 45, 100)

Note: I have checked other Stack Overflow posts on loading multiple CSV files into one DataFrame, from which I learned to use glob to get the list of all the CSV files I need. My problem is that reading the files found by glob gives me a list of DataFrames rather than a 3-D array, and converting that list into a NumPy array raises an error:

   from glob import glob
   strain = glob("strain*.csv")
   df= [pd.read_csv(f) for f in strain]
   df_ = np.asarray(df)
   #this returns an error: cannot copy sequence with size 45 to array axis with dimension 30

Any help would be greatly appreciated. Thanks
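A minimal sketch of the same idea without naming each file: convert every DataFrame to a plain 2-D array first, then combine the list with np.stack, which adds the new leading axis and raises a ValueError if any file has a different shape. The function name load_csv_stack and the demo files are hypothetical. One thing worth checking: pd.read_csv treats the first line as column names by default (header=0, which is what the code above relies on); if the files contain only numbers, pass header=None, otherwise the first data row is lost and each array comes back as 44x100.

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd


def load_csv_stack(pattern, header=0):
    """Read every CSV matching `pattern` and stack them into one 3-D array.

    header=0 (the pandas default) treats the first line as column names;
    pass header=None if the files contain only numbers.
    """
    paths = sorted(glob.glob(pattern))
    # Convert each DataFrame to a plain 2-D ndarray before stacking.
    arrays = [pd.read_csv(p, header=header).to_numpy() for p in paths]
    # np.stack adds a new leading axis and raises ValueError if shapes differ.
    return np.stack(arrays)


# Demo with hypothetical header-less files: three 45x100 CSVs of random numbers.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("strain5.csv", "strain8.csv", "strain10.csv"):
        np.savetxt(os.path.join(tmp, name), np.random.rand(45, 100), delimiter=",")
    mBGS = load_csv_stack(os.path.join(tmp, "strain*.csv"), header=None)

print(mBGS.shape)  # (3, 45, 100)
```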

  • Are you sure all files produce the same shape array? Commented Jul 31, 2019 at 5:40
  • yes, they are all 45x100 arrays Commented Aug 2, 2019 at 4:44
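Since a single odd-sized file is enough to break the stacking, it may help to check the shapes before building the array. A rough sketch, assuming the most common shape is the expected one; find_shape_mismatches and the demo file sizes are hypothetical.

```python
import glob
import os
import tempfile
from collections import Counter

import numpy as np
import pandas as pd


def find_shape_mismatches(pattern, header=0):
    """Return {path: shape} for files whose shape differs from the most common one."""
    shapes = {p: pd.read_csv(p, header=header).shape for p in sorted(glob.glob(pattern))}
    if not shapes:
        return {}
    # Treat the most frequent shape as the expected one.
    expected, _ = Counter(shapes.values()).most_common(1)[0]
    return {p: s for p, s in shapes.items() if s != expected}


# Demo with hypothetical header-less files: two 45x100 files and one 45x99 file.
with tempfile.TemporaryDirectory() as tmp:
    for name, cols in (("strain5.csv", 100), ("strain8.csv", 100), ("strain10.csv", 99)):
        np.savetxt(os.path.join(tmp, name), np.random.rand(45, cols), delimiter=",")
    bad = find_shape_mismatches(os.path.join(tmp, "strain*.csv"), header=None)

print({os.path.basename(p): s for p, s in bad.items()})  # {'strain10.csv': (45, 99)}
```

An empty result means every file has the same shape and can be stacked safely.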

1 Answer


First, convert each DataFrame into a 2-D NumPy array (here with .values). np.asarray can then turn the list of equally shaped 45x100 arrays into a single 3-D array:

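   from glob import glob

   import numpy as np
   import pandas as pd

   # sorted() makes the order of the stacked files reproducible
   strain = sorted(glob("strain*.csv"))
   # .values turns each DataFrame into a 45x100 ndarray
   df = [pd.read_csv(f).values for f in strain]
   df_ = np.asarray(df)  # shape (n, 45, 100)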
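glob returns files in arbitrary order, and a plain sorted() compares names as text, so strain10.csv ends up before strain5.csv. If the index along the first axis should follow the numeric suffix, a "natural" sort key is one option. A sketch under that assumption; natural_key and load_sorted_stack are hypothetical helpers, and np.stack is used instead of np.asarray because it refuses arrays of different shapes.

```python
import glob
import os
import re
import tempfile

import numpy as np
import pandas as pd


def natural_key(path):
    """Sort key that compares digit runs as numbers: strain5 < strain8 < strain10."""
    name = os.path.basename(path)
    # re.split with a capturing group keeps the digit runs, e.g.
    # "strain10.csv" -> ["strain", "10", ".csv"]
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def load_sorted_stack(pattern, header=0):
    """Return the file paths in natural order and their data stacked along axis 0."""
    paths = sorted(glob.glob(pattern), key=natural_key)
    arrays = [pd.read_csv(p, header=header).to_numpy() for p in paths]
    return paths, np.stack(arrays)


# Demo with hypothetical header-less files; each file is filled with its own number.
with tempfile.TemporaryDirectory() as tmp:
    for n in (5, 8, 10):
        path = os.path.join(tmp, f"strain{n}.csv")
        np.savetxt(path, np.full((45, 100), n), delimiter=",")
    paths, mBGS = load_sorted_stack(os.path.join(tmp, "strain*.csv"), header=None)

print([os.path.basename(p) for p in paths])  # ['strain5.csv', 'strain8.csv', 'strain10.csv']
print(mBGS[:, 0, 0].tolist())  # [5.0, 8.0, 10.0]
```

Returning the paths alongside the array keeps a record of which file each slice mBGS[i] came from.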

1 Comment

Is it possible to use this code if I have multiple groups of data, i.e. strain1 = glob("strain1*.csv") and strain2 = glob("strain*.csv")? I want to create a single 3-D array that includes all the CSV files from both datasets. Thank you :)
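One possible approach, sketched under the assumption that every file in every group is 45x100: collect the paths from each pattern in turn and stack them all at once. Note that a pattern such as "strain*.csv" also matches the strain1 files, so the sketch drops duplicate paths. The helper load_groups and the file names are hypothetical. If each group has already been loaded as its own 3-D array, np.concatenate([a, b], axis=0) joins them along the first axis in the same way.

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd


def load_groups(patterns, header=0):
    """Stack the files matched by several glob patterns into one 3-D array.

    Files keep the order of `patterns` (group by group, sorted within a group).
    """
    paths = []
    for pattern in patterns:
        paths.extend(sorted(glob.glob(pattern)))
    # dict.fromkeys drops files matched by more than one pattern
    # (e.g. "strain*.csv" also matches strain1 files) while keeping order.
    paths = list(dict.fromkeys(paths))
    arrays = [pd.read_csv(p, header=header).to_numpy() for p in paths]
    return paths, np.stack(arrays)


# Demo with hypothetical header-less files from two groups.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("strain1_a.csv", "strain1_b.csv", "strain2_a.csv"):
        np.savetxt(os.path.join(tmp, name), np.random.rand(45, 100), delimiter=",")
    group1 = os.path.join(tmp, "strain1*.csv")
    group2 = os.path.join(tmp, "strain2*.csv")
    paths, data = load_groups([group1, group2], header=None)
    # Overlapping patterns do not duplicate files.
    _, overlap = load_groups([group1, os.path.join(tmp, "strain*.csv")], header=None)

print(data.shape)     # (3, 45, 100)
print(overlap.shape)  # (3, 45, 100)
```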
