
Good morning. I am working with a CSV of numerical data that I have converted into a numpy matrix. The CSV is rather large (10000x5) and has the following columns (the acronyms aren't very important, but I'll include them): name of subject, Blood Pressure, PDAC, GSIC, TDAP

What I would like to do is create a list of numpy matrices such that each matrix contains the values associated with a unique subject name. A simple example follows. (Edit: at a commenter's suggestion I changed the "subject name" column to a "subject id" column by creating a mapping from names to ids. In this example Carl has id 1 and Doug has id 2.)

Original = np.matrix('1 17 28 32 79; 1 89 72 46 22; 1 91 93 88 90; 2 21 57 73 68; 2 43 32 21 22')

Carl = np.matrix('1 17 28 32 79; 1 89 72 46 22; 1 91 93 88 90')
Doug = np.matrix('2 21 57 73 68; 2 43 32 21 22')

matrixlist = [Doug, Carl]

For a few matrices this wouldn't be too tough a problem, but there are a lot of subjects spread out in the parent CSV, and not every subject has the same number of entries. I have tried converting all the data into a list and then using a list comprehension, but I'm running into some issues.

Lastly, I was wondering whether there is a way to apply a function to each element in the list of matrices. As another simple example: I wrote a function that computes the correlation matrix of a numpy array using its SVD. Is it possible to apply it to every element in the list?

import numpy as np
from scipy.linalg import fractional_matrix_power

def correlation_matrix(x):
    covariance_matrix = np.cov(x, rowvar=False)  # columns are variables, rows are observations
    # D^(-1/2), where D is the diagonal part of the covariance matrix
    inv_sqrt_diag = fractional_matrix_power(np.diag(np.diag(covariance_matrix)), -1/2)
    return inv_sqrt_diag @ covariance_matrix @ inv_sqrt_diag

Thanks in advance!

  • Please post coherent example data. As it stands, your examples are not valid Python literals, and seem to imply recursive data structures. Please try to be clear. What exactly are you expecting as output? What exactly is your input? You say it is a numpy.array, but what exactly is the structure of that array? And why did you make an array in the first place? (It seems you should be working with plain lists or maybe a pandas.DataFrame given your examples) Commented Jun 5, 2017 at 19:26
  • no worries. I just created a function that turns names into ids. Let me re-format. I would like to turn a larger numpy matrix into a list of numpy sub matrices according to the subject id. Commented Jun 5, 2017 at 19:31
  • What? I understand that you "want to turn a larger numpy matrix into a list of numpy submatrices" according to some id, my question is, what exactly are you dealing with? Hopefully, you aren't actually using a numpy.matrix but some plain np.ndarray. But really, you shouldn't be using that if you have strings/numbers in your data. You should probably just stick to Python lists. Or a pandas.DataFrame. Commented Jun 5, 2017 at 19:33
  • Oh, my mistake. I apologize. And yes, I did mean np.ndarray. It's been a long morning as is. I ended up just making a mapping from the string values to integers to create an array of numerical data only. All dtypes are the same now. I haven't really used pandas much; perhaps it's time to move that way! Commented Jun 5, 2017 at 19:39

1 Answer

Good evening. A very nice way to do this is to use a pandas DataFrame. To read your data and select the rows for one subject, do the following:

import pandas as pd
my_df = pd.read_csv(your_filename, names=['subject','0','1','2','3'])
grouped_output = my_df.groupby('subject').get_group('Carl')

This will return just Carl's rows from your DataFrame. After this you could loop through all subjects and do whatever you'd like with each group. A loop could look like this:

grouped = my_df.groupby('subject')
# .groups is a dictionary keyed by group name; pass each key to get_group
for key in grouped.groups:
    print(grouped.get_group(key))
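
To get the list of per-subject arrays the question asks for, and to apply a function to each one, something along these lines might work. This is only a sketch: the toy DataFrame copies the example from the question, the names `value_columns`, `arrays_by_subject`, `matrix_list` and `correlations` are made up for illustration, and `np.corrcoef` is used as a stand-in for the question's `correlation_matrix` function.

```python
import numpy as np
import pandas as pd

# Toy data mirroring the question's example: subject id, then four measurements.
my_df = pd.DataFrame(
    [[1, 17, 28, 32, 79],
     [1, 89, 72, 46, 22],
     [1, 91, 93, 88, 90],
     [2, 21, 57, 73, 68],
     [2, 43, 32, 21, 22]],
    columns=['subject', '0', '1', '2', '3'],
)

# Hypothetical helper name: the measurement columns, without the subject id.
value_columns = ['0', '1', '2', '3']

# Iterating over a groupby yields (key, sub-DataFrame) pairs,
# so each subject becomes one 2-D NumPy array.
arrays_by_subject = {
    key: group[value_columns].to_numpy()
    for key, group in my_df.groupby('subject')
}

# A plain list of arrays, one per subject.
matrix_list = list(arrays_by_subject.values())

# Apply a function to every array with a list comprehension.
# np.corrcoef stands in here for the question's correlation_matrix.
correlations = [np.corrcoef(m, rowvar=False) for m in matrix_list]
```

Keeping the dictionary around as well as the list means you can still tell which array belongs to which subject. Subjects with different numbers of rows are not a problem, since each array is built independently.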

2 Comments

thanks for the reply! This looks powerful! How do you mean loop through all group subjects though? Syntactically speaking is it similar to appending items from a set? If it's not too much trouble could you provide a short example? Thanks so much!
my_df.groupby('subject').groups gives you a dictionary whose keys are the group names. You can loop through this dictionary to get the group names, as in the edited code above.
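
To make the shape of that dictionary concrete, here is a small sketch on made-up data (the tiny DataFrame and the names `groups` and `sizes` are assumptions for illustration only). Each key is a group name and each value holds the row labels belonging to that group, which can be passed to `.loc`.

```python
import pandas as pd

# Made-up example data: two rows for Carl, one for Doug.
my_df = pd.DataFrame(
    {'subject': ['Carl', 'Carl', 'Doug'],
     '0': [17, 89, 21]}
)

# Maps each group name to the index labels of its rows.
groups = my_df.groupby('subject').groups

# Use the row labels to pull each subject's rows and count them.
sizes = {name: len(my_df.loc[row_labels]) for name, row_labels in groups.items()}
```

If you only need the sub-DataFrames and not the row labels, iterating over the groupby object directly (`for name, group in my_df.groupby('subject')`) avoids the extra lookup.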
