Good morning. I am working with a CSV of numerical data that I have converted into a NumPy matrix. The CSV is rather large (10000x5) and is structured as follows (the acronyms for the columns aren't very important, but I'll include them): name of subject, Blood Pressure, PDAC, GSIC, TDAP
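Because the first column holds names, the file can't be loaded straight into a numeric array. A minimal sketch of one way to build the numeric matrix while mapping each name to an integer id with `np.unique(..., return_inverse=True)`. It assumes the file has no header row and the name is in the first column; `SAMPLE_CSV` and `load_with_ids` are hypothetical stand-ins for the real file and loading code:

```python
import csv
import io

import numpy as np

# Hypothetical stand-in for the real file; in practice use open("data.csv", newline="")
SAMPLE_CSV = """carl,17,28,32,79
carl,89,72,46,22
carl,91,93,88,90
doug,21,57,73,68
doug,43,32,21,22
"""

def load_with_ids(f):
    """Read rows of (name, 4 numeric columns) and replace each name with an integer id."""
    rows = list(csv.reader(f))
    names = np.array([r[0] for r in rows])
    values = np.array([[float(v) for v in r[1:]] for r in rows])
    # return_inverse gives, for every row, the index of its name in the sorted unique names
    unique_names, ids = np.unique(names, return_inverse=True)
    # Shift ids to start at 1, matching the example below (carl -> 1, doug -> 2)
    data = np.column_stack([ids + 1, values])
    return unique_names, data

unique_names, data = load_with_ids(io.StringIO(SAMPLE_CSV))
```

`unique_names[i - 1]` recovers the name for id `i`, so the mapping does not need to be stored separately.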
What I would like to do is take this matrix and create a list of NumPy matrices such that each matrix contains the rows belonging to one unique subject. As a simple example (edit: as suggested, I replaced the subject name column with a subject id by creating a mapping from names to ids; in this example Carl has id 1 and Doug has id 2):
```python
import numpy as np

# Column 0 is the subject id: Carl = 1, Doug = 2
Original = np.matrix('1 17 28 32 79; 1 89 72 46 22; 1 91 93 88 90; 2 21 57 73 68; 2 43 32 21 22')
Carl = np.matrix('1 17 28 32 79; 1 89 72 46 22; 1 91 93 88 90')
Doug = np.matrix('2 21 57 73 68; 2 43 32 21 22')
matrixlist = [Doug, Carl]
```
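Since subjects have different numbers of rows, one sketch for building such a list is to sort the rows by id, find where each id starts with `np.unique(..., return_index=True)`, and cut the sorted array there with `np.split`. It assumes a plain `np.ndarray` (not `np.matrix`) with the id in column 0; `split_by_subject` is a hypothetical helper name:

```python
import numpy as np

# The example data as a plain ndarray (column 0 = subject id)
original = np.array([
    [1, 17, 28, 32, 79],
    [1, 89, 72, 46, 22],
    [1, 91, 93, 88, 90],
    [2, 21, 57, 73, 68],
    [2, 43, 32, 21, 22],
])

def split_by_subject(data):
    """Return one sub-array per distinct value in column 0, in ascending id order."""
    # Stable sort so rows of the same subject keep their original relative order
    order = np.argsort(data[:, 0], kind="stable")
    sorted_data = data[order]
    # Index of the first row of each distinct id in the sorted array
    _, starts = np.unique(sorted_data[:, 0], return_index=True)
    # Cut at every group start except the first one (which is always 0)
    return np.split(sorted_data, starts[1:])

matrixlist = split_by_subject(original)
```

The groups come out in ascending id order, so Carl (id 1) precedes Doug (id 2) here, unlike `[Doug, Carl]` above. Sorting once and splitting avoids rescanning all rows for every subject.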
For a few matrices this wouldn't be too tough a problem, but there are many subjects spread throughout the parent CSV, and not every subject has the same number of entries. I have tried converting all the data into a list and then using a list comprehension, but I'm running into some issues.
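A list comprehension can do this if it iterates over the distinct ids and uses a boolean mask to select each subject's rows. A minimal sketch, again assuming a plain `np.ndarray` with the id in column 0 (the dictionary variant is an optional alternative, not something from the question):

```python
import numpy as np

original = np.array([
    [1, 17, 28, 32, 79],
    [1, 89, 72, 46, 22],
    [1, 91, 93, 88, 90],
    [2, 21, 57, 73, 68],
    [2, 43, 32, 21, 22],
])

ids = np.unique(original[:, 0])

# One boolean mask per distinct id; each mask picks out that subject's rows
matrixlist = [original[original[:, 0] == subject_id] for subject_id in ids]

# Keyed by id instead, so lookups don't depend on list position
by_subject = {int(subject_id): original[original[:, 0] == subject_id] for subject_id in ids}
```

Each mask scans the whole array, so the work grows with (number of rows) × (number of subjects). For 10000 rows this is likely fine unless there are very many subjects, in which case sorting and splitting once is cheaper.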
Lastly, I was wondering if there is a way to apply a function to each element in the list of matrices. As another simple example: I wrote a function that computes the correlation matrix of a NumPy array from its covariance matrix. Is it possible to apply it to every element in the list?
```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def correlation_matrix(x):
    # Sample covariance; rowvar=False treats each column as a variable
    covariance_matrix = np.cov(x, rowvar=False)
    # D^(-1/2), where D is the diagonal matrix of variances
    d_inv_sqrt = fractional_matrix_power(np.diag(np.diag(covariance_matrix)), -1/2)
    return d_inv_sqrt @ covariance_matrix @ d_inv_sqrt
```
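Applying a function to every matrix in a list is a list comprehension (or `map`). The sketch below swaps in a NumPy-only version of the function: for a diagonal matrix D of positive variances, D^(-1/2) is just the diagonal matrix of 1/sqrt(variance), so `fractional_matrix_power` isn't needed, and the result R = D^(-1/2) C D^(-1/2) should match NumPy's built-in `np.corrcoef`. It assumes the id column is dropped before computing correlations, since it is constant within a subject and would have zero variance (division by zero), and that every subject has at least two rows:

```python
import numpy as np

def correlation_matrix(x):
    """Correlation as D^(-1/2) C D^(-1/2), where D is the diagonal of the covariance C."""
    c = np.cov(x, rowvar=False)
    # For a diagonal matrix, the -1/2 power is 1/sqrt of each diagonal entry
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(c)))
    return d_inv_sqrt @ c @ d_inv_sqrt

original = np.array([
    [1, 17, 28, 32, 79],
    [1, 89, 72, 46, 22],
    [1, 91, 93, 88, 90],
    [2, 21, 57, 73, 68],
    [2, 43, 32, 21, 22],
])
subjects = [original[original[:, 0] == i] for i in np.unique(original[:, 0])]

# Apply the function to each subject, skipping column 0 (the constant id)
correlations = [correlation_matrix(m[:, 1:]) for m in subjects]

# map() does the same thing; list() forces evaluation
correlations_via_map = list(map(lambda m: correlation_matrix(m[:, 1:]), subjects))
```

With only two rows (Doug), every off-diagonal correlation is +1 or -1, so per-subject correlations are only informative for subjects with enough entries.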
thanks in advance!
`numpy.array`, but what exactly is the structure of that array? And why did you make an array in the first place? (It seems you should be working with plain lists, or maybe a `pandas.DataFrame`, given your examples.)

`numpy.matrix` but some plain `np.ndarray`. But really, you shouldn't be using that if you have a mix of strings and numbers in your data. You should probably just stick to Python lists, or a `pandas.DataFrame`.
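Following the `pandas.DataFrame` suggestion, a minimal sketch of the same grouping with `DataFrame.groupby`, which handles the name column directly so no name-to-id mapping is needed. It assumes the CSV has a header row and calls the first column `name`; `SAMPLE_CSV` is a hypothetical stand-in for the real file:

```python
import io

import pandas as pd

# Hypothetical sample in the question's column order; replace with the real file path
SAMPLE_CSV = """name,Blood Pressure,PDAC,GSIC,TDAP
carl,17,28,32,79
carl,89,72,46,22
carl,91,93,88,90
doug,21,57,73,68
doug,43,32,21,22
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# One NumPy array per subject, keyed by name
matrices = {name: group.drop(columns="name").to_numpy() for name, group in df.groupby("name")}

# pandas can also compute each subject's (Pearson) correlation matrix directly
correlations = {name: group.drop(columns="name").corr() for name, group in df.groupby("name")}
```

`corr()` returns a labeled DataFrame, so each correlation keeps its column names, which can make results easier to read than bare arrays.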