I need to filter a dataset so that it returns only the rows that contain a specific string; however, that string can appear in any of several (8) columns.
How can I do this? I've seen methods such as isin and str.contains, but they operate on a single column and return a single Series. How can I keep only the rows that contain the string in ANY of the columns?
Example code: suppose I have the dataframe df generated by
import pandas as pd
data = {'year': [2011, 2012, 2013, 2014, 2014, 2011, 2012, 2015],
        'year2': [2012, 2016, 2015, 2015, 2012, 2013, 2019, 2016],
        'reports': [52, 20, 43, 33, 41, 11, 43, 72]}
df = pd.DataFrame(data, index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
df
   year  year2  reports
a  2011   2012       52
b  2012   2016       20
c  2013   2015       43
d  2014   2015       33
e  2014   2012       41
f  2011   2013       11
g  2012   2019       43
h  2015   2016       72
I want the code to remove all rows that do not contain the value 2012. Note that in my actual dataset the values are strings, not ints (they are people's names).
So for the example above, it would remove rows c, d, f, and h.
That is, the result should be the same as df[~df.index.isin(['c', 'd', 'f', 'h'])]
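One possible approach, as a minimal sketch rather than a definitive answer: build a boolean DataFrame marking where the value occurs, then reduce it across columns with any(axis=1) to get one True/False per row. The names frame and the 'Bob' / 'ob' search strings below are hypothetical stand-ins for the real string data, which is not shown in the question.

```python
import pandas as pd

data = {'year': [2011, 2012, 2013, 2014, 2014, 2011, 2012, 2015],
        'year2': [2012, 2016, 2015, 2015, 2012, 2013, 2019, 2016],
        'reports': [52, 20, 43, 33, 41, 11, 43, 72]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])

# True for each row where at least one column equals 2012
mask = df.eq(2012).any(axis=1)
kept = df[mask]
print(kept)

# Hypothetical string data standing in for the real names dataset
names = pd.DataFrame({'first': ['Ann', 'Bob', 'Cy'],
                      'second': ['Bob', 'Dee', 'Eve']})

# Exact match: keep rows where any cell is exactly 'Bob'
exact = names[names.isin(['Bob']).any(axis=1)]

# Substring match: keep rows where any cell contains 'ob'
# (regex=False treats the pattern literally; na=False treats missing values as no match)
contains = names.apply(lambda col: col.str.contains('ob', regex=False, na=False))
partial = names[contains.any(axis=1)]
```

If only some of the columns should be searched, the same mask can be built from a column subset, for example df[['year', 'year2']].eq(2012).any(axis=1).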