filter CSV file with pandas

Question

I have a CSV file where each row holds data about a particular patient, and a single patient can have multiple rows.

The file contains thousands of patient records. I want to randomly select 100 patients, get all the records associated with them, and save those records to another CSV file.

For example, the file could look like this:

patient_id   Date          Diagnosis   Comments
001-001      23.12.2008    Normal      Normal
001-001      23.12.2009    Normal      Normal
001-002      08.11.2007    Normal      Normal
001-003
....

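I can load the file and pick the patients like this:

import numpy as np
import pandas as pd

frame = pd.read_csv('file.csv')
# Get the unique subjects
unique_subjects = frame['patient_id'].unique()
# Use numpy to randomly select 100 distinct patients
# (replace=False avoids drawing the same patient twice)
random_us = np.random.choice(unique_subjects, 100, replace=False)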

From there I could go through the CSV row by row and decide which rows to write to the new file.

I have a feeling pandas provides something more direct, and I wonder whether all of these operations can be chained with it.

Quang Hoang · Accepted Answer · 2019-09-15 23:09:48Z

You can use isin to keep only the rows whose patient_id is among the sampled IDs:

random_records = frame[frame['patient_id'].isin(random_us)]
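
For completeness, the whole pipeline could be sketched in pandas alone, roughly as below. This is only an illustration under some assumptions: the helper name sample_patient_records, the demo frame, and the output filename are made up for the example, and Series.sample is swapped in for np.random.choice because it draws without replacement by default, so the selected patients are distinct.

```python
import pandas as pd


def sample_patient_records(frame, n_patients, id_col="patient_id", seed=None):
    """Return all rows belonging to n_patients randomly chosen distinct patients.

    Hypothetical helper for illustration; not part of pandas.
    """
    # One entry per patient, then draw n_patients of them without replacement
    chosen = frame[id_col].drop_duplicates().sample(n=n_patients, random_state=seed)
    # Keep every row whose ID is among the chosen ones
    return frame[frame[id_col].isin(chosen)]


# Tiny made-up frame standing in for pd.read_csv('file.csv')
demo = pd.DataFrame({
    "patient_id": ["001-001", "001-001", "001-002", "001-003", "001-003"],
    "Diagnosis": ["Normal"] * 5,
})

subset = sample_patient_records(demo, n_patients=2, seed=0)

# Writing the result out (filename is an assumption):
# subset.to_csv("sampled_patients.csv", index=False)
```

With the real file, n_patients would be 100; sample raises a ValueError if the file has fewer distinct patients than requested.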
