Update pandas DataFrame with .str.replace() vs .replace()

Question

I have a column in my pandas DataFrame df that contains strings with some trailing hex-encoded NULLs (\x00), or at least I think that's what they are. When I tried to replace them with:

df['SOPInstanceUID'] = df['SOPInstanceUID'].replace('\x00', '')

the column is not updated. When I do the same with

df['SOPInstanceUID'] = df['SOPInstanceUID'].str.replace('\x00', '')

it's working fine. What's the difference here? (SOPInstanceUID is not an index.)

thanks

EdChum · Accepted Answer · 2016-06-30 07:52:56Z

11

The former looks for exact matches of the whole value; the latter looks for matches in any part of the string, which is why the latter works for you.

The .str methods mirror the standard Python string methods but are vectorised over the whole column.

answered Jun 30, 2016 at 7:52


8 Comments

Bowen Liu Over a year ago

Not OP, but thank you for the info. Just a silly question: what do you mean by vectorised here?

EdChum Over a year ago

@BowenLiu vectorised here means that instead of operating on a single row or value at a time, we operate on the entire column (although in practice it really means multiple values), so it's significantly faster

Bowen Liu Over a year ago

Thanks a lot for your explanation. So it can operate on multiple values at once, which saves computation time?

EdChum Over a year ago

@BowenLiu correct, vectorization is, in my opinion, why you should be using numpy or pandas. Otherwise it's just a fancy data structure that makes indexing easier without any performance gain

Bowen Liu Over a year ago

Amazing! I never thought about the reasons behind using pandas and numpy for data handling. I just use it because everyone uses it and it has so many useful functions. But the reason for these functions to work well and fast is that they vectorize all the data? Could you explain in layman's terms how it could do it please? I always thought it iterates through objects one by one just like for loops.


SerialDev · Accepted Answer · 2016-06-30 08:40:46Z

2

You did not pass regex=True, so .replace() required the whole value to match exactly; str.replace matches substrings, hence it worked.

str.replace(old, new[, count])

Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None)

parameter: to_replace : str, regex, list, dict, Series, numeric, or None

str or regex:

    str: string exactly matching to_replace will be replaced with value

    regex: regexs matching to_replace will be replaced with value
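As a sketch of the regex option quoted above: passing regex=True makes .replace() substitute inside each string rather than compare whole values. The DataFrame contents here are made up for illustration.

```python
import pandas as pd

# Hypothetical data; only the column name comes from the question.
df = pd.DataFrame({"SOPInstanceUID": ["1.2.3\x00\x00", "4.5.6"]})

# With regex=True, to_replace is treated as a regular expression,
# so every NUL character inside a value is substituted.
df["SOPInstanceUID"] = df["SOPInstanceUID"].replace(r"\x00", "", regex=True)

cleaned = df["SOPInstanceUID"].tolist()
print(cleaned)
```

Whether this or .str.replace() reads better is a matter of taste; .replace() with regex=True can also be called on a whole DataFrame at once.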

The \x00 sequences are not literally in the string as backslash-x-0-0 text: you have unescaped control characters, which Python displays using hexadecimal notation.

You can remove all non-word characters in the following way:

re.sub(r'[^\w]', '', '\x00\x00\x00\x08\x01\x008\xe6\x7f')
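One caveat, shown as a small sketch with a made-up UID-like value: [^\w] also removes dots, which would mangle a dotted identifier. If only the trailing NULs need to go, plain str.rstrip is a narrower alternative.

```python
import re

# Hypothetical UID-like value with trailing NUL padding.
uid = "1.2.840.10008\x00\x00"

# Removing all non-word characters also strips the dots.
stripped_all = re.sub(r"[^\w]", "", uid)

# Stripping only trailing NUL characters keeps the rest intact.
trimmed = uid.rstrip("\x00")

print(repr(stripped_all))
print(repr(trimmed))
```

On a pandas column the equivalent of the second approach would be df['SOPInstanceUID'].str.rstrip('\x00').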

edited Jun 30, 2016 at 8:40

answered Jun 30, 2016 at 7:56


4 Comments

landge Over a year ago

Ok, thanks to both of you. But when I call replace like this: df['SOPInstanceUID'].replace('\x00', '') I get the string back without trailing NULLs!? So it seems to match, or is it just some kind of output formatting that doesn't show the NULLs?

EdChum Over a year ago

You'll need to post raw data and code that demonstrates this. Also, your comment contradicts the statement in your question that it didn't work

landge Over a year ago

Yes, sorry. I meant that when I call the method without assigning back to the column, I get string output in Jupyter without the trailing NULLs. When assigning as in my post, nothing happens. Confusing.

landge Over a year ago

CMari, thanks. That was the missing part! I don't understand it thoroughly, but I'll try.
