1

Looking for some help. I have a pandas dataframe column and I want to extract the prefix where such prefix exists in a separate list.

pr_list = ['1 FO-','2 IA-']

Column in df is like

PartNumber     
ABC
DEF
1 FO-BLABLA
2 IA-EXAMPLE

What I am looking for is to extract the prefix where present, put in a new column and leave the rest of the string in the original column.

PartNumber   Prefix
ABC          
DEF
BLABLA       1 FO-
EXAMPLE      2 IA-

Have tried some things like str.startswith but a bit of a python novice and wasn't able to get it to work.

much appreciated

EDIT Both solutions below work on the test data, however I am getting an error
error: nothing to repeat at position 16
Which suggests something askew in my dataset. Not sure what position 16 refers to but looking at both the prefix list and PartNumber column in position 16 nothing seems out of the ordinary?

EDIT 2 I have traced it to have an * in the pr_list seems to be throwing it. is * some reserved character? is there a way to break it out so it is read as text?

2
  • Are all the prefix ending with '-'? In this case you could try to play from df["PartNumber"].str.split("-") Commented Mar 14, 2019 at 4:46
  • No unfortunately, can be anything from a single digit, to special characters like *SV, which is why I went with the list route Commented Mar 14, 2019 at 4:58

2 Answers 2

1

You can try:

df['Prefix']=df.PartNumber.str.extract(r'({})'.format('|'.join(pr_list))).fillna('')
df.PartNumber=df.PartNumber.str.replace('|'.join(pr_list),'')
print(df)

  PartNumber Prefix
0        ABC       
1        DEF       
2     BLABLA  1 FO-
3    EXAMPLE  2 IA-
Sign up to request clarification or add additional context in comments.

1 Comment

Hi Anky, again your solution works on the test data but I am getting the error: nothing to repeat at position 16 problem so it must be my data. Unsure of this error if you have any idea?
0

Maybe it's not what you are looking for, but may it help.

import pandas as pd

pr_list = ['1 FO-','2 IA-']
df = pd.DataFrame({'PartNumber':['ABC','DEF','1 FO-BLABLA','2 IA-EXAMPLE']})

extr = '|'.join(x for x in pr_list)
df['Prefix'] = df['PartNumber'].str.extract('('+ extr + ')', expand=False).fillna('')
df['PartNumber'] = df['PartNumber'].str.replace('|'.join(pr_list),'')
df

1 Comment

Your solution works great on the sample data I provided, however when I apply it to the full dataset, I get an error error: nothing to repeat at position 16 This something to do with my data?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.