I'm working on a program that needs to load a large number of csv files (thousands of them) into an array.
Each csv file has dimension 45x100, and I want to build a 3-d array of dimension nx45x100. For now, I am using pd.read_csv() to load each csv file and then convert it into an array using np.array(). I then create the 3-d array with np.array([data_0, data_1, ..., data_n]), which gives me a 3-d array with the required dimensions.
Although it works, it is very tedious. Is there any way that this can be done without individually reading and processing each csv file?
# this is my current code
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
mBGS5L = pd.read_csv("strain5.csv") #45x100
mBGS8L = pd.read_csv("strain8.csv")
mBGS10L = pd.read_csv("strain10.csv")
mBGS5L_ = np.array(mBGS5L)
mBGS8L_ = np.array(mBGS8L)
mBGS10L_ = np.array(mBGS10L)
mBGS = np.array([mBGS5L_,mBGS8L_,mBGS10L_])
# mBGS.shape returns (3, 45, 100)
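As a sanity check that the stacking step itself works, the same pattern on synthetic arrays (random data in place of the real csv contents) gives the expected shape:

```python
import numpy as np

# stand-ins for the three 45x100 arrays read from the csv files
a5 = np.random.rand(45, 100)
a8 = np.random.rand(45, 100)
a10 = np.random.rand(45, 100)

# stacking a list of equally-shaped 2-d arrays yields a 3-d array
mBGS = np.array([a5, a8, a10])
print(mBGS.shape)  # (3, 45, 100)
```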
Note: I have checked other Stack Overflow posts on loading multiple csv files into one dataframe, which is where I learned about using glob to get the list of all the csv files I need. My problem, though, is that reading the globbed files gives me a list of dataframes, not a 3-d array, and converting that list to a numpy array raises an error:
from glob import glob
strain = glob("strain*.csv")
df = [pd.read_csv(f) for f in strain]
df_ = np.asarray(df)
# this raises an error: cannot copy sequence with size 45 to array axis with dimension 30
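To narrow down the mismatch, a shape check over the glob results seems like the next step. Here is a self-contained sketch (temporary files with random data standing in for my real strain*.csv files). One thing I noticed while testing: pd.read_csv treats the first row as a header by default, so a headerless 45-row file comes back as 44 rows unless header=None is passed.

```python
import os
import tempfile
from glob import glob

import numpy as np
import pandas as pd

# write a few fake 45x100 csv files as stand-ins for strain*.csv
tmp = tempfile.mkdtemp()
for i in (5, 8, 10):
    np.savetxt(os.path.join(tmp, f"strain{i}.csv"),
               np.random.rand(45, 100), delimiter=",")

files = sorted(glob(os.path.join(tmp, "strain*.csv")))

# header=None keeps all 45 rows; without it pandas eats row 0 as a header
arrays = [pd.read_csv(f, header=None).to_numpy() for f in files]

# print each file's shape -- any mismatch here is what breaks np.asarray
for f, a in zip(files, arrays):
    print(f, a.shape)

stacked = np.stack(arrays)
print(stacked.shape)  # (3, 45, 100)
```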
Any help would be greatly appreciated. Thanks