How do you scan if a pandas dataframe row contains a certain substring?

For example, I have a dataframe with 11 columns, all of which contain names:

ID    name1     name2       name3      ...    name10
-------------------------------------------------------
AA    AA_balls  AA_cakee1  AA_lavender ...   AA_purple
AD    AD_cakee  AD_cats    AD_webss    ...   AD_ballss
CS    CS_cakee  CS_cats    CS_webss    ...   CS_purble
.
.
.

I would like to get the rows that contain, say, "ball" anywhere in the dataframe, and get their IDs.

So the result would be ID 'AA' and ID 'AD', since AA_balls and AD_ballss are in those rows.

I have searched on Google, but there seems to be no specific result for this. People usually ask about searching for a substring in one specific column, not across all columns (a whole row):

df[df["col_name"].str.contains("ball")]

The methods I have thought of are as follows; you can skip this if you have little time:

(1) Loop through the columns

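matches = []
for col_name in col_names:
    matches.append(df[df[col_name].str.contains('ball')])
result = pd.concat(matches).drop_duplicates('ID')

collecting the matches per column and then dropping duplicate rows that have the same ID value, but this method would be very slow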

(2) Reshape the dataframe into a 2-column dataframe by stacking the name1 - name10 columns into one column, then use df[df["concat_col"].str.contains("ball")]["ID"] to get the IDs and drop duplicates

ID  concat_col   
AA    AA_balls 
AA    AA_cakeee
AA    AA_lavender
AA    AA_purple
 .
 .
 .
CS   CS_purble

(3) Use the dataframe from (2) to make a dictionary where

 programs = dict(zip(df["concat_col"], df["ID"]))

then get the IDs with

[value for key, value in programs.items() if 'ball' in key]

but with this method I need to loop through the dictionary, which becomes slow

If there is a faster method that avoids these steps, I would prefer it. If anyone knows of one, I would appreciate it a lot if you could let me know :) Thanks!

  • What is the size of the DataFrame? Commented Mar 16, 2018 at 7:10
  • Not so big, df.shape is near (4000, 13), but I have done a lot of preprocessing in my program and would like to find a less time-consuming method. Commented Mar 16, 2018 at 7:13
  • OK, give me some time for timings. Commented Mar 16, 2018 at 7:15
  • Hmmm, the timings also obviously depend on how many matches there are. What do you think? 50% of rows? Or something else? Commented Mar 16, 2018 at 7:16
  • Thanks for your answers below. Let me try them out and reply to you. The matches would be few, below 15 rows. Commented Mar 16, 2018 at 7:17

2 Answers


One idea is to use melt:

df1 = df.melt('ID')

a = df1.loc[df1['value'].str.contains('ball'), 'ID']
print (a)
0     AA
10    AD
Name: ID, dtype: object
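
For anyone who wants to try the melt idea end to end, here is a minimal sketch. The sample frame is an assumption modelled on the question's table (only three name columns), and `long_df`, `hits` and `ids` are names made up for illustration. It also adds `drop_duplicates`, since an ID is returned once per matching cell.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's table.
df = pd.DataFrame({
    "ID": ["AA", "AD", "CS"],
    "name1": ["AA_balls", "AD_cakee", "CS_cakee"],
    "name2": ["AA_cakee1", "AD_cats", "CS_cats"],
    "name3": ["AA_purple", "AD_ballss", "CS_purble"],
})

# Reshape to long format: one row per (ID, column, value) triple.
long_df = df.melt("ID")

# Plain substring test on the single 'value' column (regex=False skips regex parsing).
hits = long_df.loc[long_df["value"].str.contains("ball", regex=False), "ID"]

# An ID appears once per matching cell, so de-duplicate before using it.
ids = hits.drop_duplicates().tolist()
print(ids)
```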

Another:

df1 = df.set_index('ID')
a = df1.index[df1.applymap(lambda x: 'ball' in x).any(axis=1)]
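
A note for newer pandas: as far as I know, `DataFrame.applymap` was deprecated in pandas 2.1 in favour of `DataFrame.map`. The sketch below picks whichever one is available; the sample frame and the names `names`, `elementwise`, `mask` and `ids` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's table.
df = pd.DataFrame({
    "ID": ["AA", "AD", "CS"],
    "name1": ["AA_balls", "AD_cakee", "CS_cakee"],
    "name2": ["AA_cakee1", "AD_cats", "CS_cats"],
    "name3": ["AA_purple", "AD_ballss", "CS_purble"],
})

names = df.set_index("ID")

# Use DataFrame.map where it exists, otherwise fall back to applymap.
elementwise = names.map if hasattr(names, "map") else names.applymap

# Element-wise substring test, then "any match in this row".
mask = elementwise(lambda x: "ball" in x).any(axis=1)

ids = names.index[mask.to_numpy()].tolist()
print(ids)
```

The lambda assumes every cell is a string; a missing value (NaN) would raise a TypeError in `'ball' in x`.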

Or:

mask = np.logical_or.reduce([df[x].str.contains('ball', regex=False) for x in df.columns])
a = df.loc[mask, 'ID']
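
Here is a self-contained sketch of the same mask idea. The sample frame (including one missing value) and the names `name_cols` and `ids` are assumptions for illustration. It restricts the search to the name columns and passes `na=False` so missing cells count as "no match" instead of producing NaN in the mask.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modelled on the question's table, with one missing cell.
df = pd.DataFrame({
    "ID": ["AA", "AD", "CS"],
    "name1": ["AA_balls", "AD_cakee", "CS_cakee"],
    "name2": ["AA_cakee1", None, "CS_cats"],
    "name3": ["AA_purple", "AD_ballss", "CS_purble"],
})

# Search every column except the ID column itself.
name_cols = [c for c in df.columns if c != "ID"]

# One boolean Series per column, OR-ed together into a single row mask.
mask = np.logical_or.reduce(
    [df[c].str.contains("ball", regex=False, na=False) for c in name_cols]
)

ids = df.loc[mask, "ID"].tolist()
print(ids)
```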

Timings:

np.random.seed(145)
L = list('abcdefgh')
df = pd.DataFrame(np.random.choice(L, size=(4000, 10)))
df.insert(0, 'ID', np.arange(4000).astype(str))
a = np.random.randint(4000, size=15)
b = np.random.randint(1, 10, size=15)
for i, j in zip(a,b):
    df.iloc[i, j] = 'AB_ball_DE'
#print (df)


In [85]: %%timeit
    ...: df1 = df.melt('ID')
    ...: a = df1.loc[df1['value'].str.contains('ball'), 'ID']
    ...: 
10 loops, best of 3: 24.3 ms per loop

In [86]: %%timeit
    ...: df.loc[np.logical_or.reduce([df[x].str.contains('ball', regex=False) for x in df.columns]), 'ID']
    ...: 
100 loops, best of 3: 12.8 ms per loop

In [87]: %%timeit
    ...: df1 = df.set_index('ID')
    ...: df1.index[df1.applymap(lambda x: 'ball' in x).any(axis=1)]
    ...: 
100 loops, best of 3: 11.1 ms per loop

1 Comment

Thanks, your answer works very well! I did not know the melt method before, and I need to learn more about lambda and applymap...

Maybe this might work?

mask = df.apply(lambda row: row.map(str).str.contains('word').any(), axis=1)
df.loc[mask]

Disclaimer: I haven't tested this. Perhaps the .map(str) isn't necessary.
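
For what it's worth, a small check of this row-wise idea on made-up data (the sample frame and the numeric `count` column are assumptions, added only to show where `.map(str)` matters):

```python
import pandas as pd

# Hypothetical sample data; 'count' is numeric to show why .map(str) can be useful.
df = pd.DataFrame({
    "ID": ["AA", "AD", "CS"],
    "name1": ["AA_balls", "AD_cakee", "CS_cakee"],
    "name2": ["AA_purple", "AD_ballss", "CS_purble"],
    "count": [1, 2, 3],
})

# For each row: convert every cell to str, then test for the substring.
mask = df.apply(
    lambda row: row.map(str).str.contains("ball", regex=False).any(),
    axis=1,
)

result = df.loc[mask]
print(result["ID"].tolist())
```

With only string columns the `.map(str)` step can be dropped; with mixed types it keeps the `.str` accessor from tripping over non-string cells. Being row-wise, it will likely be slower than the column-wise approaches on larger frames.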
