Pandas .loc[].index

Question

What is the most efficient way (using the least amount of lines possible) to locate and drop multiple strings in a specified column?

Information regarding the .tsv dataset that may help:

'tconst' = movie ID

'region' = region in which the movie was released in

'language' = language of movie

Here is what I have right now:

origin.drop(origin.columns[[1,2,5,6,7]], axis=1, inplace=True)
origin.columns = ['tconst','region','language']
origin.drop(origin.loc[origin['region']!=('US')].index, inplace=True)

I am trying to drop all rows under the 'region' column that contain a string value abbreviation of countries that do not speak English. For example, drop all rows under region that are not equal to 'US','UK','AUS',etc. I have tried the & and or operands within the parenthesis and they tend to select and drop only one of the string values that I place inside.

ADDITIONAL QUESTION:

The 'language' column contains numerous amounts of null values (I don't really care about that), however there are some rows that contain 'en' for English. IF the 'region' happens to be a non-English speaking country BUT the language is in English, how do I prevent those rows from being removed as well?

S F · Accepted Answer · 2020-11-01 22:25:15Z

1

Assuming I understood your question correctly and you want to drop the entire row if your specified conditions holds, you can write them as masks like

cond1 = origin['region'].isin(['US', 'UK', 'AUS'])
cond2 = origin['language'] == 'en'

Combining them (you want at least one of the conditions to be true in order to keep a row) with | you get

result = origin[cond1 | cond2]

Of course, this can also be written in a single line if you care to do so:

result = origin[(origin['region'].isin(['US', 'UK', 'AUS'])) | (origin['language'] == 'en')]

edited Nov 1, 2020 at 22:25

answered Nov 1, 2020 at 21:56

S F

3,17421 silver badges31 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

cloud_user Over a year ago

This works. Very much appreciated. Dropped the DT from 2mil+ entries to barely over 100K haha

Collectives™ on Stack Overflow

Pandas .loc[].index

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related