Read multiple csv files (size mxm) and load as an n dimensional array (size nxmxm) (not concatenate)

Question

I'm working on a program that requires loading of a large number of csv files (thousands of them) into an array.

Each csv file has dimensions 45x100, and I want to create a 3-d array with dimensions nx45x100. For now, I am using pd.read_csv() to load each csv file and then converting each one into an array with np.array(). I then build the 3-d array with np.array([data_0, data_1, ..., data_n]), which gives me an array with the required dimensions.

Although it works, it is very tedious. Is there any way that this can be done without individually reading and processing each csv file?

   # this is my current code
   import numpy as np
   import pandas as pd

   mBGS5L = pd.read_csv("strain5.csv")  # 45x100
   mBGS8L = pd.read_csv("strain8.csv")
   mBGS10L = pd.read_csv("strain10.csv")

   mBGS5L_ = np.array(mBGS5L)
   mBGS8L_ = np.array(mBGS8L)
   mBGS10L_ = np.array(mBGS10L)

   mBGS = np.array([mBGS5L_, mBGS8L_, mBGS10L_])
   # mBGS.shape returns (3, 45, 100)

Note: I have checked other Stack Overflow questions on loading multiple csv files into one dataframe, which is how I learned about glob for getting the list of all the csv files I need. My problem, though, is that using glob and reading the csv files gives me a list of dataframes and not a 3-d array, and I can't convert that list to a numpy array because it raises an error:

   from glob import glob
   strain = glob("strain*.csv")
   df = [pd.read_csv(f) for f in strain]
   df_ = np.asarray(df)
   # this raises an error: cannot copy sequence with size 45 to array axis with dimension 30

Any help would be greatly appreciated. Thanks

Are you sure all files produce the same shape array?

– hpaulj, Commented Jul 31, 2019 at 5:40
yes, they are all 45x100 arrays

– ella, Commented Aug 2, 2019 at 4:44
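
One way to answer the shape question from the comments is to read every file and group the file names by the shape they produce. The sketch below is only an illustration: the helper name `shapes_by_file` and the small demo files written to a temporary directory are assumptions, not part of the original post.

```python
import tempfile
from collections import defaultdict
from glob import glob
from pathlib import Path

import numpy as np
import pandas as pd


def shapes_by_file(pattern):
    """Map each distinct (rows, cols) shape to the files that produce it."""
    groups = defaultdict(list)
    for path in sorted(glob(pattern)):
        groups[pd.read_csv(path).shape].append(Path(path).name)
    return dict(groups)


# Demo with hypothetical files: two 4x3 files and one 2x3 file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    pd.DataFrame(np.zeros((4, 3))).to_csv(root / "strain5.csv", index=False)
    pd.DataFrame(np.zeros((4, 3))).to_csv(root / "strain8.csv", index=False)
    pd.DataFrame(np.zeros((2, 3))).to_csv(root / "strain10.csv", index=False)
    report = shapes_by_file(str(root / "strain*.csv"))

print(report)
```

If the resulting dictionary has more than one key, at least one file differs in shape from the rest and the list cannot be turned into a regular 3-d array.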

Deepak Chauhan · Accepted Answer · 2019-07-31 05:19:52Z

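First you need to convert each dataframe into a 2-d array (45x100 in your case) with .values; the list of arrays can then be converted into a single 3-d array. Refer to the code below: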

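from glob import glob
import numpy as np
import pandas as pd

strain = glob("strain*.csv")
# .values turns each dataframe into a plain 2-d numpy array
df = [pd.read_csv(f).values for f in strain]
df_ = np.asarray(df)  # shape: (number of files, rows, columns)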
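
The same idea can be wrapped in a small helper. This is a minimal sketch, not the answerer's code: the name `load_csv_stack` and the demo files are hypothetical, and it swaps in `np.stack` (which raises a `ValueError` if the arrays differ in shape) and `sorted` (since `glob` returns paths in arbitrary order).

```python
import tempfile
from glob import glob
from pathlib import Path

import numpy as np
import pandas as pd


def load_csv_stack(pattern):
    """Read every csv matching `pattern` and stack them into one 3-d array."""
    paths = sorted(glob(pattern))  # fixed order, so axis 0 is reproducible
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    arrays = [pd.read_csv(p).to_numpy() for p in paths]
    # np.stack requires all arrays to share one shape
    return np.stack(arrays), paths


# Demo with hypothetical 4x3 files, each filled with its own strain number.
with tempfile.TemporaryDirectory() as tmp:
    for i in (5, 8, 10):
        frame = pd.DataFrame(np.full((4, 3), float(i)), columns=["a", "b", "c"])
        frame.to_csv(Path(tmp) / f"strain{i}.csv", index=False)
    stacked, used = load_csv_stack(str(Path(tmp) / "strain*.csv"))

print(stacked.shape)  # (3, 4, 3)
```

Note that `sorted` orders names as text, so `strain10.csv` comes before `strain5.csv`; returning the path list alongside the array makes it possible to tell which slice belongs to which file. Also, `pd.read_csv` treats the first line as a header by default, so if the files have no header row, passing `header=None` keeps that line as data.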


1 Comment

ella Over a year ago

Is it possible to use this code if I have multiple groups of data, i.e. strain1 = glob("strain1*.csv"), strain2 = glob("strain*.csv")? I want to create a 3d array that includes all the csv files from these 2 datasets. Thank you :)
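
A possible way to handle several groups is to collect the paths from each pattern into one list before reading. The following is a sketch under assumptions: the helper name `load_groups`, the file names, and the patterns in the demo are made up for illustration, and duplicates are dropped in case two patterns match the same file.

```python
import tempfile
from glob import glob
from pathlib import Path

import numpy as np
import pandas as pd


def load_groups(patterns):
    """Stack every csv matched by any of `patterns` into one 3-d array."""
    paths = []
    for pattern in patterns:
        paths.extend(glob(pattern))
    paths = sorted(set(paths))  # drop duplicates when patterns overlap
    arrays = [pd.read_csv(p).to_numpy() for p in paths]
    return np.stack(arrays), paths


# Demo with hypothetical files from two groups, all 4x3.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("strain1_a.csv", "strain1_b.csv", "strain2_a.csv"):
        pd.DataFrame(np.zeros((4, 3))).to_csv(root / name, index=False)
    combined, combined_paths = load_groups(
        [str(root / "strain1*.csv"), str(root / "strain2*.csv")]
    )

print(combined.shape)  # (3, 4, 3)
```

This only works if every file in every group has the same shape; otherwise `np.stack` raises a `ValueError`.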
