I have a pandas DataFrame column containing strings like McNally, King (XYZ). I would like to keep the last name and first name and remove everything else, so that after cleaning, McNally, King (XYZ) becomes McNally, King.

I have tried the following two approaches, but neither gives the desired result:

df['name'] = df['name'].str.extract(r'\w+\,\s[A-Z][a-z]+', expand=False)

df['name'] = df['name'].replace({r'\w+\,\s[A-Z][a-z]+' : r'\w+\,\s[A-Z][a-z]+'}, regex=True)

The second snippet replaces the matched substring with the regex text itself. The first snippet extracts names from the string, but what I want is to keep the name and remove everything that follows it.

Edit: Sample data:

Reyes, Rebecca  L (XYZ)
Derry, Odd     P (XYZ)
Garza, Per-Laura   A (MNP)
Fernandez, Rafael   Carl (XYZ)

Expected output:

Reyes, Rebecca
Derry, Odd
Garza, Per-Laura
Fernandez, Rafael

I would like to edit in place, i.e. modify the existing DataFrame itself rather than create a new one.

  • Where's your data? Commented Nov 16, 2017 at 15:46
  • Data is in a CSV file which I am reading using pandas.read_csv as dataframe then doing the cleanup. Commented Nov 16, 2017 at 15:49
  • So where is it? We want to see it, along with your expected output. Commented Nov 16, 2017 at 15:49

1 Answer

You can try something like this:

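import pandas as pd
df = pd.DataFrame({'name': ['McNally, King  (XYZ)']}, index=[0])
# expand=False returns a Series rather than a one-column DataFrame
df['name'].str.extract(r'(\w+,\s\w+)', expand=False)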

Output:

0    McNally, King
Name: name, dtype: object
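
For the sample data in the question, a slightly wider pattern may be needed, since \w+ stops at the hyphen in Per-Laura. The following is a minimal sketch, assuming the column is called name and that a name consists only of word characters and hyphens; the pattern variable is just an illustrative name. Assigning the result back to df['name'] updates the existing DataFrame rather than building a new one.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "name": [
        "Reyes, Rebecca  L (XYZ)",
        "Derry, Odd     P (XYZ)",
        "Garza, Per-Laura   A (MNP)",
        "Fernandez, Rafael   Carl (XYZ)",
    ]
})

# Assumption: names contain only word characters and hyphens.
# [\w-]+ accepts hyphens, so "Per-Laura" is kept whole.
pattern = r"([\w-]+,\s[\w-]+)"

# expand=False returns a Series, which is assigned back to the same column.
df["name"] = df["name"].str.extract(pattern, expand=False)

print(df["name"].tolist())
```

Rows that do not match the pattern would come back as NaN, so it may be worth checking df['name'].isna() after the assignment.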
