How do you check whether a pandas DataFrame row contains a certain substring in any column?
For example, I have a DataFrame with 11 columns, and all the columns contain names (a small runnable stand-in is sketched after the table below):
ID name1 name2 name3 ... name10
-------------------------------------------------------
AA AA_balls AA_cakee1 AA_lavender ... AA_purple
AD AD_cakee AD_cats AD_webss ... AD_ballss
CS CS_cakee CS_cats CS_webss ... CS_purble
.
.
.
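For concreteness, here is a small stand-in for the frame above that the sketches below can run on. The values are made up to match the pattern, and only a few of the name1-name10 columns are included:

import pandas as pd

# Small stand-in for the frame above; values are invented to match the pattern,
# and only a few of the name1..name10 columns are shown
df = pd.DataFrame({
    "ID":     ["AA", "AD", "CS"],
    "name1":  ["AA_balls", "AD_cakee", "CS_cakee"],
    "name2":  ["AA_cakee1", "AD_cats", "CS_cats"],
    "name10": ["AA_purple", "AD_ballss", "CS_purble"],
})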
I would like to get the rows that contain, say, "ball" anywhere in the DataFrame and return the ID.
So the result would be ID 'AA' and ID 'AD', since AA_balls and AD_ballss appear in those rows.
I have searched on Google but there seems to be no specific result for this; people usually ask about searching for a substring in one specific column, not across all columns of a row:
df[df["col_name"].str.contains("ball")]
The methods I have thought of are as follows (you can skip this if you are short on time):
(1) Loop through the columns:
matches = []
for col_name in col_names:
    matches.append(df[df[col_name].str.contains('ball')])
and then drop the duplicate rows that have the same ID values, but this method would be very slow.
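A runnable sketch of this loop on the stand-in frame, including the duplicate-ID cleanup (the column names are assumed from the example above):

import pandas as pd

df = pd.DataFrame({"ID": ["AA", "AD", "CS"],
                   "name1": ["AA_balls", "AD_cakee", "CS_cakee"],
                   "name10": ["AA_purple", "AD_ballss", "CS_purble"]})

# Method (1): filter each name column separately, stack the matches,
# then drop rows that share the same ID
name_cols = [c for c in df.columns if c != "ID"]
matches = pd.concat([df[df[c].str.contains("ball")] for c in name_cols])
ids = matches.drop_duplicates(subset="ID")["ID"].tolist()
print(ids)  # ['AA', 'AD']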
(2) Reshape the DataFrame into a two-column DataFrame by appending the name columns into one column, then use df[df["concat_col"].str.contains("ball")]["ID"] to get the IDs and drop duplicates (a sketch follows the table below):
ID concat_col
AA AA_balls
AA AA_cakeee
AA AA_lavender
AA AA_purple
.
.
.
CS CS_purble
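A sketch of method (2) on the stand-in frame; here pd.melt stands in for the column-appending step (just one way to build the long two-column layout):

import pandas as pd

df = pd.DataFrame({"ID": ["AA", "AD", "CS"],
                   "name1": ["AA_balls", "AD_cakee", "CS_cakee"],
                   "name10": ["AA_purple", "AD_ballss", "CS_purble"]})

# Method (2): reshape to a long two-column frame (ID, concat_col),
# then filter that single column once and de-duplicate the IDs
long_df = df.melt(id_vars="ID", value_name="concat_col")
ids = long_df.loc[long_df["concat_col"].str.contains("ball"), "ID"].drop_duplicates()
print(ids.tolist())  # ['AA', 'AD']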
(3) Use the two-column DataFrame from (2) to make a dictionary, where
programs = dict(zip(df["concat_col"], df["ID"]))
then get the IDs with
[value for key, value in programs.items() if 'ball' in key]
but in this method I need to loop through the dictionary, which becomes slow.
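A sketch of this dictionary route on the stand-in frame, again building the long frame with pd.melt before constructing the lookup:

import pandas as pd

df = pd.DataFrame({"ID": ["AA", "AD", "CS"],
                   "name1": ["AA_balls", "AD_cakee", "CS_cakee"],
                   "name10": ["AA_purple", "AD_ballss", "CS_purble"]})

# Method (3): build a name -> ID dictionary from the long frame of (2),
# then scan its keys with a plain Python loop
long_df = df.melt(id_vars="ID", value_name="concat_col")
programs = dict(zip(long_df["concat_col"], long_df["ID"]))
ids = {value for key, value in programs.items() if "ball" in key}
print(ids)  # {'AA', 'AD'}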
If there is a faster method that avoids these steps, I would prefer to use it. If anyone knows of one, I would really appreciate it if you kindly let me know :) Thanks!