
I have a CSV file in which each row holds data about a particular patient, and a single patient can have multiple rows.

The file contains thousands of patient records. I want to randomly select 100 patients, get all records associated with them, and save those records to another CSV file.

For example, the file could look like this:

patient_id   Date          Diagnosis   Comments
001-001      23.12.2008    Normal      Normal
001-001      23.12.2009    Normal      Normal
001-002      08.11.2007    Normal      Normal
001-003
....

So, I can load the file as:

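import numpy as np
import pandas as pd

frame = pd.read_csv('file.csv')
# Get the unique subjects
unique_subjects = frame['patient_id'].unique()
# Use numpy to randomly select 100 distinct patients
# (replace=False prevents the same patient from being drawn twice)
random_us = np.random.choice(unique_subjects, 100, replace=False)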

I could then go through the CSV row by row and select which rows to write to the new CSV file.

I have a feeling pandas might provide something more direct, and I wonder whether there is a way to chain all these operations with it.

1 Answer

You can use isin to keep only the rows whose patient_id is among the selected ids:

random_records = frame[frame['patient_id'].isin(random_us)]
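
For completeness, here is a minimal end-to-end sketch of the whole pipeline, assuming the column is named patient_id as in the question. The helper name sample_patient_records, the seed parameter, the small in-memory frame, and the output filename in the final comment are all made up for illustration. It draws the ids without replacement so that the requested number of distinct patients is actually selected, then filters with isin.

```python
import numpy as np
import pandas as pd


def sample_patient_records(frame, n_patients, id_col="patient_id", seed=None):
    """Return every row belonging to n_patients randomly chosen patients.

    Hypothetical helper, not part of pandas. Raises ValueError if
    n_patients exceeds the number of distinct ids in the frame.
    """
    rng = np.random.default_rng(seed)
    unique_ids = frame[id_col].unique()
    # replace=False guarantees n_patients distinct ids
    chosen = rng.choice(unique_ids, size=n_patients, replace=False)
    # Boolean mask: True for every row whose id was chosen
    return frame[frame[id_col].isin(chosen)]


# Small in-memory stand-in for pd.read_csv('file.csv')
frame = pd.DataFrame(
    {
        "patient_id": ["001-001", "001-001", "001-002", "001-003", "001-003"],
        "Diagnosis": ["Normal"] * 5,
    }
)

subset = sample_patient_records(frame, n_patients=2, seed=0)

# To save the result (filename is an assumption):
# subset.to_csv("sampled_patients.csv", index=False)
```

With the real file you would pass the frame returned by pd.read_csv and n_patients=100. Passing index=False to to_csv keeps the pandas row index out of the output file.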