I'm trying to take the value in one pandas column and remove it from another column. However, replace is not behaving the way I expected.

In this example, I am trying to make the value in col2 equal to 'something':

import pandas as pd  

# Build the dataframe
col1 = ['ABC', 'DEF']
col2 = ['something - ABC', 'something - DEF']
df = pd.DataFrame({'col1': col1, 'col2': col2})

#Replace ' - ABC' so column is just 'something'
df['newcolumn'] = df.col2.replace(' - '+df.col1, '')

This is returning the value that's already in col2. What am I missing?

  • Will the pattern always be - followed by something you want to replace? Furthermore, will this pattern always remain at the end? Commented Oct 17, 2017 at 1:55
  • @cᴏʟᴅsᴘᴇᴇᴅ I'm trying to make it completely remove the value of col1 from col2. Strip would remove it in this case, but I'm trying to match it specifically. Commented Oct 17, 2017 at 2:05
  • Afraid you're out of luck, replace doesn't work like that. The only thing you could do is get a list of all unique values, build one massive regex, and call str.replace, which I guarantee will be slow. Commented Oct 17, 2017 at 2:08
  • See my answer for details. You're also free to do timing comparisons yourself, you'll see what I mean. Commented Oct 17, 2017 at 2:12
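To illustrate the point made in the comments, here is a minimal sketch of how Series.replace treats a plain string target. The series `s` is made up for the example: without regex=True the target is compared against the whole cell, so a substring never matches.

```python
import pandas as pd

# Hypothetical data: the first value only contains the target,
# the second value is exactly equal to it.
s = pd.Series(['something - ABC', ' - ABC'])

# Default behaviour: the target must equal the entire cell value.
whole_match = s.replace(' - ABC', '')

# With regex=True the target is treated as a pattern and matched inside each string.
substring_match = s.replace(' - ABC', '', regex=True)

print(whole_match.tolist())      # ['something - ABC', '']
print(substring_match.tolist())  # ['something', '']
```

Even with regex=True, the target here is a single fixed string, so this does not by itself give a different pattern for each row.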

3 Answers

You can use str.split:

df['newcolumn'] = df.col2.str.split(' -', expand=True)[0]
df
Out[136]: 
  col1             col2   newcolumn
0  ABC  something - ABC   something 
1  DEF  something - DEF   something 

1 Comment

Note that I used rsplit instead of split as I assume the pattern to remove comes after the last hyphen. Your code assumes the opposite (just thought you should make that clear).

You could use str.rsplit:

df['newcolumn'] = df.col2.str.rsplit(' - ', n=1).str[0]
print(df)
  col1             col2  newcolumn
0  ABC  something - ABC  something
1  DEF  something - DEF  something

One big assumption here is that the pattern to remove follows the last hyphen in the string.
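A short sketch of why that assumption matters, using a made-up value that contains two hyphens. Splitting from the left and splitting from the right keep different prefixes:

```python
import pandas as pd

# Hypothetical value with more than one ' - ' separator.
s = pd.Series(['some - thing - ABC'])

# split cuts at the first separator, rsplit cuts at the last one.
from_left = s.str.split(' - ', n=1).str[0]
from_right = s.str.rsplit(' - ', n=1).str[0]

print(from_left.tolist())   # ['some']
print(from_right.tolist())  # ['some - thing']
```

Which of the two is right depends on whether the text to remove is known to come after the last separator in your data.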


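Another possibility is str.replace with a regex (pass regex=True explicitly, since newer pandas versions treat the pattern as a literal string by default):

df['newcolumn'] = df.col2.str.replace(r'\s*-[^-]*$', '', regex=True)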
print(df)
  col1             col2  newcolumn
0  ABC  something - ABC  something
1  DEF  something - DEF  something

Yet another possibility with str.replace is to retrieve all unique values from col1 and build one massive regex (more focused than the approaches above, but also much slower).

vals = df.col1.unique()
df['newcolumn'] = df.col2.str.replace(r'\s*\-\s*({})'.format('|'.join(vals)), '', regex=True)
print(df)
  col1             col2  newcolumn
0  ABC  something - ABC  something
1  DEF  something - DEF  something

If col1 has strings separated by spaces, wrap each one in its own parentheses:

df.col2.str.replace(r'\s*\-\s*(({}))'.format(')|('.join(vals)), '', regex=True)
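One caveat with building a regex from column values is that any regex metacharacters in col1 would be interpreted as pattern syntax. A minimal sketch that escapes each value with re.escape first; the sample data, the `(?:...)` group, and the trailing `$` anchor are my own assumptions, not part of the approach above:

```python
import re
import pandas as pd

# Hypothetical data: 'D.F' contains a regex metacharacter.
df = pd.DataFrame({
    'col1': ['ABC', 'D.F'],
    'col2': ['something - ABC', 'something - DXF'],
})

vals = df['col1'].unique()

# Escape each value so '.' only matches a literal dot, then join into one alternation.
pattern = r'\s*-\s*(?:{})$'.format('|'.join(re.escape(v) for v in vals))

df['newcolumn'] = df['col2'].str.replace(pattern, '', regex=True)
print(df['newcolumn'].tolist())  # ['something', 'something - DXF']
```

Note that this pattern is not row-aware: any value from col1 is removed from any row of col2 where it appears after a hyphen at the end of the string.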


If you want to remove the pattern based on the value of col1 in the same row, this would work:

import pandas as pd  

# Build the dataframe
col1 = ['ABC', 'DEF']
col2 = ['something - ABC', 'something - DEF']
df = pd.DataFrame({'col1': col1, 'col2': col2})

# Replace ' - ABC' so column is just 'something'
df['newcolumn'] = df.apply(lambda x: str(x.col2).replace(' - ' + str(x.col1), ''), axis=1)
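The same row-wise idea can also be sketched with a list comprehension over the two columns, which avoids building a row object for every call as apply(axis=1) does. This assumes both columns hold plain strings with no missing values:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['ABC', 'DEF'],
    'col2': ['something - ABC', 'something - DEF'],
})

# Pair each col2 value with the col1 value from the same row and
# remove ' - <col1>' using plain Python str.replace.
df['newcolumn'] = [
    c2.replace(' - ' + c1, '')
    for c1, c2 in zip(df['col1'], df['col2'])
]

print(df['newcolumn'].tolist())  # ['something', 'something']
```

If the columns may contain NaN or non-string values, the str() conversions used in the apply version above would still be needed.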
