Extract prefix from string in dataframe column where exists in a list

Question

Looking for some help. I have a pandas DataFrame column and I want to extract a prefix from each value whenever that prefix appears in a separate list.

pr_list = ['1 FO-','2 IA-']

The column in df looks like this:

PartNumber     
ABC
DEF
1 FO-BLABLA
2 IA-EXAMPLE

What I am looking for is to extract the prefix where present, put it in a new column, and leave the rest of the string in the original column.

PartNumber   Prefix
ABC          
DEF
BLABLA       1 FO-
EXAMPLE      2 IA-

I have tried things like str.startswith, but I am a bit of a Python novice and wasn't able to get it to work.

Any help is much appreciated.

EDIT: Both solutions below work on the test data, but on my full dataset I am getting an error:
error: nothing to repeat at position 16
This suggests something is askew in my dataset. I am not sure what position 16 refers to; looking at position 16 of both the prefix list and the PartNumber column, nothing seems out of the ordinary.

EDIT 2: I have traced it to a * in pr_list, which seems to be what throws the error. Is * a reserved character? Is there a way to escape it so that it is read as literal text?

Do all the prefixes end with '-'? In that case you could start from df["PartNumber"].str.split("-").
– rpanai, Commented Mar 14, 2019 at 4:46
No, unfortunately. A prefix can be anything from a single digit to special characters like *SV, which is why I went with the list route.
– Philip Hutchinson, Commented Mar 14, 2019 at 4:58

anky · Accepted Answer · 2019-03-14 05:00:55Z

1

You can try:

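pattern = '|'.join(pr_list)  # assumes no prefix contains regex metacharacters
df['Prefix'] = df.PartNumber.str.extract('({})'.format(pattern), expand=False).fillna('')
# regex=True is required on pandas 2.0+, where str.replace matches literally by default
df.PartNumber = df.PartNumber.str.replace(pattern, '', regex=True)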
print(df)

  PartNumber Prefix
0        ABC       
1        DEF       
2     BLABLA  1 FO-
3    EXAMPLE  2 IA-

answered Mar 14, 2019 at 5:00


1 Comment

Philip Hutchinson Over a year ago

Hi Anky, again your solution works on the test data, but on my full dataset I am getting the "error: nothing to repeat at position 16" problem, so it must be my data. Do you have any idea what causes this error?
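A note on the error reported above: "nothing to repeat" is raised by Python's re module when a quantifier such as * has nothing in front of it to repeat, and the position is a character offset into the joined regex pattern, not a row of the DataFrame. The following is a minimal sketch of one way around it, assuming the problem prefix is the *SV mentioned in the question comments. The '*SVWIDGET' row is made up for illustration. re.escape turns each prefix into a literal, and the leading ^ (an addition, not part of the answer above) restricts the match to the start of the string.

```python
import re
import pandas as pd

# '*SV' comes from the asker's comment; the '*SVWIDGET' row is hypothetical.
pr_list = ['1 FO-', '2 IA-', '*SV']
df = pd.DataFrame(
    {'PartNumber': ['ABC', 'DEF', '1 FO-BLABLA', '2 IA-EXAMPLE', '*SVWIDGET']}
)

# Joining the raw prefixes yields '1 FO-|2 IA-|*SV', where '*' has nothing to repeat.
raw_pattern = '|'.join(pr_list)
error_pos = None
try:
    re.compile(raw_pattern)
except re.error as exc:
    error_pos = exc.pos  # offset of the offending character within the pattern
    print(exc)

# re.escape makes every prefix a literal; '^' anchors the match to the start.
pattern = '^(' + '|'.join(re.escape(p) for p in pr_list) + ')'
df['Prefix'] = df['PartNumber'].str.extract(pattern, expand=False).fillna('')
df['PartNumber'] = df['PartNumber'].str.replace(pattern, '', regex=True)
print(df)
```

If one prefix is itself the start of another, the alternation takes whichever is listed first, so sorting pr_list by length (longest first) before joining may be worth considering.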

YusufUMS · Accepted Answer · 2019-03-14 05:05:51Z

0

Maybe it's not exactly what you are looking for, but it may help.

import pandas as pd

pr_list = ['1 FO-','2 IA-']
df = pd.DataFrame({'PartNumber':['ABC','DEF','1 FO-BLABLA','2 IA-EXAMPLE']})

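extr = '|'.join(pr_list)  # assumes no prefix contains regex metacharacters
df['Prefix'] = df['PartNumber'].str.extract('(' + extr + ')', expand=False).fillna('')
# regex=True is required on pandas 2.0+, where str.replace matches literally by default
df['PartNumber'] = df['PartNumber'].str.replace(extr, '', regex=True)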
df

edited Mar 14, 2019 at 5:05

answered Mar 14, 2019 at 4:59


1 Comment

Philip Hutchinson Over a year ago

Your solution works great on the sample data I provided. However, when I apply it to the full dataset, I get the error "error: nothing to repeat at position 16". Is this something to do with my data?
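Since the question mentions trying str.startswith, here is a hedged sketch of a regex-free alternative that sidesteps special characters entirely. The helper name split_prefix is hypothetical, the '*SVWIDGET' row is made up, and the sketch assumes every value in PartNumber is a string (no NaN).

```python
import pandas as pd

# '*SV' comes from the asker's comment; the '*SVWIDGET' row is hypothetical.
pr_list = ['1 FO-', '2 IA-', '*SV']
df = pd.DataFrame(
    {'PartNumber': ['ABC', 'DEF', '1 FO-BLABLA', '2 IA-EXAMPLE', '*SVWIDGET']}
)

def split_prefix(part, prefixes):
    """Return (remainder, prefix); prefix is '' when no prefix matches."""
    # Try longer prefixes first so overlapping prefixes resolve to the longest.
    for prefix in sorted(prefixes, key=len, reverse=True):
        if part.startswith(prefix):
            return part[len(prefix):], prefix
    return part, ''

# Plain string comparison, so '*' and other regex characters need no escaping.
pairs = [split_prefix(part, pr_list) for part in df['PartNumber']]
df['PartNumber'] = [rest for rest, _ in pairs]
df['Prefix'] = [prefix for _, prefix in pairs]
print(df)
```

This loops in Python rather than using vectorized string methods, so it may be slower on very large columns, but it treats every prefix as plain text.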
