2

I need to limit a dataset so that it returns only rows that contain a specific string; however, that string can appear in any of several (8) columns.

How can I do this? I've seen str.isin methods, but they return a single series for a single row. How can I keep only the rows that contain the string in ANY of the columns?

Example code: suppose I have the dataframe df generated by

    import pandas as pd
    data = {'year': [2011, 2012, 2013, 2014, 2014, 2011, 2012, 2015],
            'year2': [2012, 2016, 2015, 2015, 2012, 2013, 2019, 2016],
            'reports': [52, 20, 43, 33, 41, 11, 43, 72]}
    df = pd.DataFrame(data, index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
    df

   year  year2  reports
a  2011   2012       52
b  2012   2016       20
c  2013   2015       43
d  2014   2015       33
e  2014   2012       41
f  2011   2013       11
g  2012   2019       43
h  2015   2016       72

I want the code to remove all rows that do not contain the value 2012. Note that in my actual dataset the value is a string, not an int (it is people's names). In the above example it would remove rows c, d, f, and h.

4
  • that contain the string, what string? Commented Jan 10, 2020 at 17:55
  • You mean this? df[~df.index.isin(['c', 'd', 'f', 'h'])] Commented Jan 10, 2020 at 17:57
  • Edited the post to be more specific. No, I am not trying to drop known rows. The actual dataset is almost 80,000 rows and I need to filter it to find only the data involving a single person, whose name may appear in any of 8 possible columns. Commented Jan 10, 2020 at 17:57
  • Take a look at stackoverflow.com/a/35682788/12411517. You can remove using ~ or compare inequality. Commented Jan 10, 2020 at 18:02

3 Answers

9

You can use df.eq with df.any on axis=1:

df[df.eq('2012').any(axis=1)] #for year as string

Or:

df[df.eq(2012).any(axis=1)] #for year as int

   year  year2  reports
a  2011   2012       52
b  2012   2016       20
e  2014   2012       41
g  2012   2019       43
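
For readers who want to run this end to end, here is a minimal sketch of the same idea. The first part reuses the question's data; the `names` frame and its column names are made up for illustration, standing in for the real dataset of people's names.

```python
import pandas as pd

# The question's example data (integers).
data = {'year': [2011, 2012, 2013, 2014, 2014, 2011, 2012, 2015],
        'year2': [2012, 2016, 2015, 2015, 2012, 2013, 2019, 2016],
        'reports': [52, 20, 43, 33, 41, 11, 43, 72]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])

# eq() returns a boolean frame of the same shape;
# any(axis=1) is True for rows where at least one column matched.
mask = df.eq(2012).any(axis=1)
kept = df[mask]
print(kept.index.tolist())

# Hypothetical string data (assumed column names), closer to the real use case.
names = pd.DataFrame({'buyer': ['ann', 'bob', 'cy'],
                      'seller': ['bob', 'dee', 'ann'],
                      'agent': ['eve', 'eve', 'fay']})
bob_rows = names[names.eq('bob').any(axis=1)]
print(bob_rows.index.tolist())
```

Note that the comparison is type-sensitive: `df.eq('2012')` finds nothing in integer columns, and `df.eq(2012)` finds nothing in string columns.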

5 Comments

This code works, although future readers should note that the first snippet searches for a string (which is what my real dataset has), so it will not return the correct results on my example code, where 2012 is an int.
I'll be accepting this answer when the time allows. Thank you so much for the quick response and the edits.
To search for '2012' in specific columns only, use: df[df.loc[:, columns].eq('2012').any(axis=1)], given that columns is a list of columns in which to search (e.g. columns = ['year', 'year2'])
How would you go about modifying this to drop any rows which do not contain a substring (that is, rows in which no value contains the string)? Say, for example, I have a large dataset with names and I want to return all rows which contain the name george, possibly with different last names (for example, column 3 may be george foreman or george brazil, and I want both returned).
@AlbinoRhino maybe df.apply(lambda x: x.str.contains('george')).any(axis=1)? If not, that calls for a different question, since this is asking for a substring match and not an exact match.
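
The two variations raised in these comments (searching only some columns, and matching a substring rather than the whole value) could be sketched as follows. The data and column names here are hypothetical. Casting each column with `astype(str)` is an assumption added so that non-string columns do not raise an error on the `.str` accessor; be aware that it also turns missing values into the text 'nan'.

```python
import pandas as pd

# Hypothetical data: names in two text columns, plus one numeric column.
df = pd.DataFrame({'col1': ['george foreman', 'mary smith', 'ann lee'],
                   'col2': ['bob ray', 'George Brazil', 'sue kim'],
                   'amount': [10, 20, 30]})

# Exact match, restricted to a chosen list of columns.
columns = ['col1', 'col2']
exact = df[df[columns].eq('mary smith').any(axis=1)]

# Substring match across every column: build one boolean column per
# original column, then keep rows where any of them is True.
contains = df.apply(
    lambda col: col.astype(str).str.contains('george', case=False, regex=False)
)
substring = df[contains.any(axis=1)]
print(substring.index.tolist())
```

`regex=False` treats the search text literally, which is usually what you want for names; `case=False` makes the match case-insensitive.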
0

Try simple code like this:

import pandas as pd
data = {'year': [2011, 2012, 2013, 2014, 2014, 2011, 2012, 2015],
        'year2': [2012, 2016, 2015, 2015, 2012, 2013, 2019, 2016],
        'reports': [52, 20, 43, 33, 41, 11, 43, 72]}
df = pd.DataFrame(data, index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
df = df.drop(['c', 'd', 'f', 'h'])

df  

It will give you a dataframe like this:

   year  year2  reports
a  2011   2012       52
b  2012   2016       20
e  2014   2012       41
g  2012   2019       43

1 Comment

I am not dropping known rows, but instead finding rows which lack a certain value in any of their columns, then removing them
0

To find the dataframe made of the rows that have the value in at least one column

df[(df == '2012').any(axis=1)]

To find the dataframe made of the rows that do not have the value in any column

df[~(df == '2012').any(axis=1)]

or

df[(df != '2012').all(axis=1)]

See the related https://stackoverflow.com/a/35682788/12411517.
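
As a small check on string data (the frame below is made up for illustration), negating an "any column equals the value" mask selects the same rows as an "all columns differ from the value" mask:

```python
import pandas as pd

# Hypothetical string-valued frame.
df = pd.DataFrame({'year': ['2011', '2012', '2013'],
                   'year2': ['2012', '2016', '2015']},
                  index=['a', 'b', 'c'])

# True for rows where at least one column equals '2012'.
has_value = (df == '2012').any(axis=1)

with_value = df[has_value]                        # rows containing the value
without_a = df[~has_value]                        # rows lacking it (negated mask)
without_b = df[(df != '2012').all(axis=1)]        # same rows, written directly

print(with_value.index.tolist())
print(without_a.equals(without_b))
```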

2 Comments

DataFrame objects cannot use contains. If you edit this post so that it doesn't have that, I'll remove the downvote of course. Also, if you can find a working method I'd give the upvote.
@AlbinoRhino sorry about that. removed.
