Python pandas - extract and replace

Question

I have a pandas DataFrame column containing strings like McNally, King (XYZ). I would like to keep the last name and first name and remove everything else, so after cleaning, McNally, King (XYZ) should become McNally, King.

I have tried the following two approaches, but neither gives the desired result:

df['name'] = df['name'].str.extract(r'\w+\,\s[A-Z][a-z]+', expand=False)

df['name'] = df['name'].replace({r'\w+\,\s[A-Z][a-z]+' : r'\w+\,\s[A-Z][a-z]+'}, regex=True)

The second snippet replaces the matched substring with the literal regex text, while the first extracts the name from the string. What I want is to keep the name and remove everything that follows it.

Edit: Sample data:

Reyes, Rebecca  L (XYZ)
Derry, Odd     P (XYZ)
Garza, Per-Laura   A (MNP)
Fernandez, Rafael   Carl (XYZ)

Expected output:

Reyes, Rebecca
Derry, Odd
Garza, Per-Laura
Fernandez, Rafael

I would like to edit in place, i.e. modify the existing DataFrame itself rather than create a new one.

The data is in a CSV file, which I read into a DataFrame with pandas.read_csv before doing the cleanup.
– ravi, Commented Nov 16, 2017 at 15:49
So where is it? We want to see it, along with your expected output.
– cs95, Commented Nov 16, 2017 at 15:49

Scott Boston · Accepted Answer · 2017-11-16 15:49:17Z

2

You can try something like this:

import pandas as pd
df = pd.DataFrame({'name': ['McNally, King  (XYZ)']}, index=[0])
df['name'].str.extract(r'(\w+,\s\w+)', expand=False)  # expand=False returns a Series

Output:

0    McNally, King
Name: name, dtype: object
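
On the sample rows from the question, the pattern above would stop at the hyphen in Per-Laura, because \w does not match '-'. The following is a minimal sketch, assuming hyphens are the only extra character that appears in the names. It widens the character class and assigns the result back to the same column, so the existing DataFrame is modified rather than replaced. The names pattern, raw and via_replace are illustrative and not part of the original answer. The str.replace variant is shown only as one possible alternative that uses a backreference.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'name': [
        'Reyes, Rebecca  L (XYZ)',
        'Derry, Odd     P (XYZ)',
        'Garza, Per-Laura   A (MNP)',
        'Fernandez, Rafael   Carl (XYZ)',
    ]
})

# Keep an untouched copy so the alternative below starts from the raw strings.
raw = df['name'].copy()

# Assumed pattern: last name, comma, one whitespace character, first name.
# [\w-] allows hyphens inside either name part.
pattern = r'([\w-]+,\s[\w-]+)'

# expand=False returns a Series, which can be assigned straight back
# to the existing column.
df['name'] = df['name'].str.extract(pattern, expand=False)

# Alternative: match the whole string and keep only the captured name
# through the \1 backreference.
via_replace = raw.str.replace(r'^([\w-]+,\s[\w-]+).*$', r'\1', regex=True)

print(df['name'].tolist())
```

One difference worth knowing: with str.extract, rows that do not match the pattern become NaN, whereas str.replace leaves non-matching rows unchanged.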

