
I am working with pandas and a rather large Excel document. My goal is to remove particular characters from the strings in one column, that is, to replace them with an empty string. Below is the code I wrote to do the find-and-replace. Python does not raise an error, but when I check the saved file, nothing has changed. What am I doing wrong?

import pandas as pd

df1 = pd.read_csv('2020.csv')

(df1.loc[(df1['SKU Code'].str.contains ('-DG'))])

dfDGremoved = (df1.loc[(df1['SKU Code'].str.contains('-DG'))].replace('-DG',''))

dfDGremoved.to_csv('2020DRAFT.csv')
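
A likely explanation for the unchanged file, shown as a minimal sketch on a small hypothetical DataFrame standing in for 2020.csv: DataFrame.replace with a plain string only replaces cells whose entire value equals '-DG', so a value such as 'ABC-DG' is left alone. Passing regex=True makes it match inside each string instead. Note also that the .loc filter keeps only the rows that contain '-DG', so the saved file would hold only those rows.

```python
import pandas as pd

# Hypothetical sample data standing in for the 'SKU Code' column of 2020.csv
df1 = pd.DataFrame({"SKU Code": ["ABC-DG", "XYZ", "LMN-DG-2"]})

# The .loc filter keeps only rows whose SKU contains "-DG" (2 of 3 here)
filtered = df1.loc[df1["SKU Code"].str.contains("-DG")]

# A plain string in DataFrame.replace must equal the whole cell value,
# so no cell changes here
whole_value = filtered.replace("-DG", "")

# regex=True applies the pattern inside each string instead
substring = filtered.replace("-DG", "", regex=True)

print(len(filtered))                     # 2
print(whole_value["SKU Code"].tolist())  # ['ABC-DG', 'LMN-DG-2']
print(substring["SKU Code"].tolist())    # ['ABC', 'LMN-2']
```

For the stated goal, the replacement should be applied to the whole column rather than to a filtered subset of rows.
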
  • Why check whether the string contains what you're replacing? Just replace it first. Does this not work: df1['SKU Code'] = df1['SKU Code'].replace('-DG', ''), and then just df1.to_csv('2020DRAFT.csv')? Commented Mar 3, 2020 at 20:12
  • The line (df1.loc[(df1['SKU Code'].str.contains ('-DG'))]) doesn't have any effect. Commented Mar 3, 2020 at 20:27

2 Answers

Your code is a bit over-engineered. Python's str.replace method returns a string unchanged when it does not contain the substring you want to replace, so the contains call is unnecessary. Creating a second DataFrame is also unnecessary: you can assign the cleaned values straight back to the original column.
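
To illustrate the first point with plain Python (the SKU values here are made up): str.replace simply returns the original string when the substring is absent, so a version guarded by a containment check and an unguarded version produce identical results.

```python
skus = ["ABC-DG", "XYZ", "LMN-DG-2"]

# Guarded version: only call replace when "-DG" is present
guarded = [s.replace("-DG", "") if "-DG" in s else s for s in skus]

# Unguarded version: str.replace returns the string unchanged
# when "-DG" does not occur, so the guard adds nothing
unguarded = [s.replace("-DG", "") for s in skus]

print(unguarded)             # ['ABC', 'XYZ', 'LMN-2']
print(guarded == unguarded)  # True
```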

To achieve the result you want, you can use Series.map, which applies a function to every element of a Series (a single column of a DataFrame is a Series), combined with a lambda function:

import pandas as pd

df1 = pd.read_csv('2020.csv')
# Remove '-DG' from every value in the 'SKU Code' column
df1['SKU Code'] = df1['SKU Code'].map(lambda x: x.replace('-DG', ''))
df1.to_csv('2020DRAFT.csv')

Unpacking this a bit:

df1['SKU Code'] = df1['SKU Code'].map(lambda x: x.replace('-DG', ''))
  |                     |          |         └─ Create a nameless function which 
  |                     |          |            takes a string and removes '-DG'
  |                     |          |            from it 
  |                     |          |
  |                     |          └─ ...and run this function on every element...
  |                     |
  |                     └─ ... of the 'SKU Code' column in df1...
  |
  └── ... Then store the results in that same column
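
One caveat, sketched under the assumption that the column may contain empty cells: read_csv loads those as NaN, which is a float, so the lambda above would raise AttributeError when called on them. Series.map accepts na_action='ignore', which skips missing values instead of passing them to the function. The sample data below is hypothetical.

```python
import pandas as pd

# Hypothetical data: read_csv turns empty cells into NaN (a float)
df1 = pd.DataFrame({"SKU Code": ["ABC-DG", float("nan"), "XYZ"]})

# Without na_action, the lambda is also called on NaN and fails,
# because a float has no .replace method
try:
    df1["SKU Code"].map(lambda x: x.replace("-DG", ""))
except AttributeError as err:
    print("map failed:", err)

# na_action="ignore" leaves NaN in place without calling the function
df1["SKU Code"] = df1["SKU Code"].map(lambda x: x.replace("-DG", ""), na_action="ignore")
print(df1["SKU Code"].tolist())
```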

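You can use pandas.Series.str.replace(), which replaces the substring in every element of the column. In pandas 2.0 and later the pattern is treated as literal text unless you pass regex=True; older versions treated it as a regular expression by default.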

import pandas as pd

df1 = pd.read_csv('2020.csv')
dfDGremoved = df1.copy()  # keep the original DataFrame unchanged
dfDGremoved['SKU Code'] = dfDGremoved['SKU Code'].str.replace('-DG', '')
dfDGremoved.to_csv('2020DRAFT.csv')
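
Passing regex explicitly makes the result independent of the pandas version. The setting only matters when the pattern contains regex metacharacters; '-DG' does not, but a hypothetical pattern such as '.DG' does, as this small sketch shows. It also shows that str.replace passes NaN through without any extra handling. The SKU values are made up.

```python
import pandas as pd

# Hypothetical SKU values, including a missing one
skus = pd.Series(["ABC-DG", "A.DG", float("nan")])

# regex=False: ".DG" is matched literally, so only "A.DG" changes
literal = skus.str.replace(".DG", "", regex=False)

# regex=True: "." matches any character, so "-DG" is removed as well
pattern = skus.str.replace(".DG", "", regex=True)

print(literal.tolist())  # ['ABC-DG', 'A', nan]
print(pattern.tolist())  # ['ABC', 'A', nan]
```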
