I have a Pandas DataFrame (df) where some of the words contain encoding replacement characters. I want to replace these words with replacement words from a dictionary (translations).

import pandas as pd

translations = {'gr�nn': 'gronn', 'm�nst': 'menst'}
df = pd.DataFrame(["gr�nn Y", "One gr�nn", "Y m�nst/line X"])

df.replace(translations, regex=True, inplace=True)

However, it doesn't seem to capture all the instances. Current output:

                0
0         gronn Y
1       One gr�nn
2  Y m�nst/line X

Do I need to specify any regex patterns to enable the replacement to also capture partial words within a string?

Expected output:

                0
0         gronn Y
1       One gronn
2  Y menst/line X

  • If you only need gronn, I suggest replacing every gr.nn with gronn. Commented Mar 4, 2019 at 10:50
  • @Frenchy This is just a sample set; the full set contains multiple variations. Commented Mar 4, 2019 at 10:54
  • If some occurrences are replaced and others are not, that means the characters are different. Commented Mar 4, 2019 at 11:15
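
The point about different characters can be checked by printing the code points of the strings involved. A minimal sketch using only the standard library; the helper name code_points and both sample strings are made up for illustration and are not values from the question's data:

```python
def code_points(text):
    """Return the Unicode code point of every character, e.g. 'U+FFFD'."""
    return [f"U+{ord(ch):04X}" for ch in text]

# Assumption: the dictionary key contains U+FFFD, while a row in the data
# contains some other character that the terminal also shows as a placeholder
# (U+0080 is a hypothetical stand-in here).
key = "gr\ufffdnn"
row_word = "gr\u0080nn"

print(code_points(key))
print(code_points(row_word))
print(key == row_word)  # False: the strings differ even if they look alike
```

If the code points differ between the dictionary keys and the data, a literal key can only match the rows that contain exactly the same character.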

1 Answer

Turn your translations into regex find/replace strings:

import pandas as pd

translations = {r'(.*)gr�nn(.*)': r'\1gronn\2', r'(.*)m�nst(.*)': r'\1menst\2'}
df = pd.DataFrame(["gr�nn Y", "One gr�nn", "Y m�nst/line X"])
df.replace(translations, regex=True)

Returns:

                0
0         gronn Y
1       One gronn
2  Y menst/line X
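
To see what the capture groups do, the same pattern can be tried with Python's re module on a single string. This is a sketch that assumes the placeholder is U+FFFD (written as \ufffd below); the sample strings are illustrative only:

```python
import re

# Assumption: the placeholder character in the data is U+FFFD.
pattern = "(.*)gr\ufffdnn(.*)"
replacement = r"\1gronn\2"

# \1 and \2 put back whatever text surrounded the matched word.
print(re.sub(pattern, replacement, "One gr\ufffdnn"))

# Caveat: the greedy first group swallows earlier occurrences, so a string
# that contains the word twice only has its last occurrence replaced.
twice = "gr\ufffdnn and gr\ufffdnn"
print(re.sub(pattern, replacement, twice))

# A pattern without groups replaces every occurrence.
print(re.sub("gr\ufffdnn", "gronn", twice))
```

The groups keep the rest of each cell intact, at the cost of handling only one occurrence per pattern in each string.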

3 Comments

Very Nice idea. :) +1
You are assuming every � is the same character, but � could hide different underlying values (hex) behind the same glyph.
@Frenchy I guess in that case you could use another group, e.g. r'(.*)gr(.*)nn(.*)': r'\1gronn\3'
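
The wildcard idea from these comments can be sketched by turning each dictionary key into a pattern where the unknown character becomes a single-character wildcard. The helper name to_wildcard_pattern is hypothetical, the placeholder is assumed to be U+FFFD in the keys, and the second sample row deliberately contains a different character (U+00F8) to show the effect:

```python
import re

# Assumption: the keys use U+FFFD where the original character was lost.
translations = {"gr\ufffdnn": "gronn", "m\ufffdnst": "menst"}

def to_wildcard_pattern(word, placeholder="\ufffd"):
    """Escape the literal parts of `word` and join them with '.',
    so any single character is accepted where the placeholder was."""
    return ".".join(re.escape(part) for part in word.split(placeholder))

patterns = {to_wildcard_pattern(k): v for k, v in translations.items()}

rows = ["gr\ufffdnn Y", "One gr\u00f8nn", "Y m\ufffdnst/line X"]
fixed = []
for row in rows:
    for pattern, new_word in patterns.items():
        row = re.sub(pattern, new_word, row)
    fixed.append(row)

print(patterns)
print(fixed)
```

The resulting patterns dictionary has the same shape as the one passed to df.replace(translations, regex=True) in the answer. Note that '.' accepts any character, so an unrelated word such as "grann" would also be rewritten; a narrower character class may be safer for real data.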
