I want to find columns in a DataFrame whose names match a string pattern. Specifically, I want the column whose name contains "WORDABC" followed by exactly one given number, e.g. "WORDABC1" but not "WORDABC11". To do this I have been using the pandas .str.contains method.
My problem arises when the number has two digits, such as "11" or "13".
import pandas as pd

df = pd.DataFrame({'WORDABC1': {0: 1, 1: 2, 2: 3},
'WORDABC11': {0: 4, 1: 5, 2: 6},
'WORDABC8N123': {0: 7, 1: 8, 2: 9},
'WORDABC81N123': {0: 10, 1: 11, 2: 12},
'WORDABC9N123': {0: 13, 1: 14, 2: 15},
'WORDABC99N123': {0: 16, 1: 17, 2: 18}})
Searching for columns that contain "WORDABC1" returns two results, "WORDABC1" and "WORDABC11":
df[df.columns[df.columns.str.contains(pat = 'WORDABC1')]]
WORDABC1 WORDABC11
0 1 4
1 2 5
2 3 6
df[df.columns[df.columns.str.contains(pat = 'WORDABC1\\b')]]
WORDABC1
0 1
1 2
2 3
With the word boundary (\\b) added, the example above works for me. However, my problem occurs when more characters follow the matched pattern:
df[df.columns[df.columns.str.contains(pat = 'WORDABC9')]]
WORDABC9N123 WORDABC99N123
0 13 16
1 14 17
2 15 18
df[df.columns[df.columns.str.contains(pat = 'WORDABC9\\b')]]
Empty DataFrame
Columns: []
Index: [0, 1, 2]
I only want the "WORDABC9N123" column, and I cannot just remove the other column. I have considered just using df[df.columns[df.columns.str.contains(pat = 'WORDABC9')][0]] to get the series I want, but that creates another issue.
I have also been using expressions such as (df.columns.str.contains(pat = 'WORDABC1\\b')).sum() to count matches for truth tests, so the [0] indexing approach above does not get me past the issue.
Is there a better method to use instead of str.contains? Or is my regex just incorrect? Thank you!
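For reference, here is a sketch of one possible fix. It is not necessarily the only or best approach. \\b matches only between a word character and a non-word character, and both "9" and "N" are word characters, so 'WORDABC9\\b' finds no boundary inside "WORDABC9N123". A negative lookahead, (?!\d), instead requires only that the next character is not a digit. The helper name columns_with_number is hypothetical and introduced just for illustration.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'WORDABC1': [1, 2, 3],
    'WORDABC11': [4, 5, 6],
    'WORDABC8N123': [7, 8, 9],
    'WORDABC81N123': [10, 11, 12],
    'WORDABC9N123': [13, 14, 15],
    'WORDABC99N123': [16, 17, 18],
})


def columns_with_number(frame, prefix, number):
    """Hypothetical helper: column names containing prefix + number,
    where the number is not immediately followed by another digit."""
    # (?!\d) is a negative lookahead: it rejects a match when a digit follows.
    pattern = rf'{re.escape(prefix)}{number}(?!\d)'
    return frame.columns[frame.columns.str.contains(pattern)]


# Select the matching columns, or count the matches for a truth test.
selected = df[columns_with_number(df, 'WORDABC', 9)]
match_count = df.columns.str.contains(r'WORDABC9(?!\d)').sum()
```

Under these assumptions, selected should hold only the "WORDABC9N123" column, and the same pattern can be reused in the .sum() style checks without relying on [0] indexing.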