How can I replace sub-strings of a large string based on a dictionary in Python?

Question

I have a long string (a "template") containing "replacement points" of the form %MARK% (a given marker may occur more than once in the string). I want to replace these markers using a Python dictionary whose keys do not include the % signs, like:

rep_dict = { "TITLE": "This is my title", "CONTENT": "Here it is the content" }

The problem: simply calling replace() once per marker is not a good solution, because the text inserted by an earlier replacement may itself contain one of these markers, which must then not be replaced!
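To make the pitfall concrete, here is a minimal sketch of the chained replace() approach going wrong. The `naive_replace` helper, the template text, and the TITLE value containing %CONTENT% are made up for illustration; only the shape of `rep_dict` comes from the question.

```python
def naive_replace(template, rep_dict):
    """Replace markers one at a time (the approach the question warns against)."""
    result = template
    for key, value in rep_dict.items():
        # Each pass rescans the whole string, including text inserted earlier.
        result = result.replace("%" + key + "%", value)
    return result

# Hypothetical data: the TITLE value happens to contain the text %CONTENT%.
rep_dict = {"TITLE": "About %CONTENT% markers", "CONTENT": "Here it is the content"}
template = "<h1>%TITLE%</h1><p>%CONTENT%</p>"

print(naive_replace(template, rep_dict))
# → <h1>About Here it is the content markers</h1><p>Here it is the content</p>
```

The %CONTENT% inside the title was expanded too, which is exactly what must not happen. (The output shown assumes Python 3.7+, where dictionaries keep insertion order.)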

The solution should be fast enough, since I have large templates and I need to process many of them within a big loop. I have a very ugly and long implementation with many find() calls, counting offsets in the original string during the replacement process, etc. I hope there is a much nicer, more compact, and quicker solution.

Pity you used %MARK% instead of {MARK}, because you could have used conventional string formatting with your dictionary. Also, with ${MARK} or $MARK you could have used string templates.
– joaquin, Commented Oct 12, 2011 at 10:52
@joaquin did you mean %(MARK) or {MARK}? The %... notation is deprecated, and the {...} one requires him to double plain-text curly braces: {{these braces make it to the output string}}, {these do not}.
– pyos, Commented Oct 12, 2011 at 10:53
No, I was not referring to interpolation with % (btw, it has no date of disappearance yet, despite what was said) but to string.Template. I re-edited and completed my comment.
– joaquin, Commented Oct 12, 2011 at 10:58
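For readers who can change the marker syntax, the two alternatives mentioned in these comments might look roughly like this. It is only a sketch: the template strings are invented, and the `$UNKNOWN` placeholder is there just to show what `safe_substitute()` does with a name that is missing from the dictionary.

```python
from string import Template

rep_dict = {"TITLE": "This is my title", "CONTENT": "Here it is the content"}

# str.format with {MARK} placeholders; literal braces must be doubled.
formatted = "{TITLE}: {CONTENT} {{literal braces}}".format(**rep_dict)

# string.Template with $MARK / ${MARK} placeholders.
# safe_substitute() leaves unknown placeholders in place instead of raising KeyError.
templated = Template("$TITLE: ${CONTENT} $UNKNOWN").safe_substitute(rep_dict)

print(formatted)  # → This is my title: Here it is the content {literal braces}
print(templated)  # → This is my title: Here it is the content $UNKNOWN
```

Both substitute in a single pass, so a replacement value that happens to look like a placeholder is not expanded again.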

pyos · Accepted Answer · 2011-10-12 10:26:18Z

3

The easiest solution is

import re
re.sub(r'%(.+?)%', lambda m: rep_dict[m.group(1)], YOUR_TEMPLATE)

Not fast enough? Someone said 'do not use regex' and you obey? Parsing the template with hand-written Python code would be more complex and slower (don't forget, re is implemented in C).
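Since the question mentions a big loop, one way to use the one-liner above is to compile the pattern once and wrap the call in a function. This is a sketch, not a benchmarked recommendation; `MARKER`, `render`, and the sample data are names invented here.

```python
import re

# Compile once and reuse inside the loop; group 1 is the name between the % signs.
MARKER = re.compile(r"%(.+?)%")

def render(template, rep_dict):
    """Replace every %NAME% in a single pass; raises KeyError for unknown names."""
    return MARKER.sub(lambda m: rep_dict[m.group(1)], template)

# Hypothetical data: the TITLE value itself contains a marker.
rep_dict = {"TITLE": "About %CONTENT% markers", "CONTENT": "Here it is the content"}
print(render("<h1>%TITLE%</h1><p>%CONTENT%</p>", rep_dict))
# → <h1>About %CONTENT% markers</h1><p>Here it is the content</p>
```

The substitution does not rescan text it has just inserted, so the %CONTENT% inside the title survives. One caveat: `.+?` pairs any two % signs on the same line, so a stray literal % in the template can swallow text up to the next %. If marker names follow a known alphabet, a stricter pattern such as `%([A-Z0-9]+)%` may be safer.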

6 Comments

LGB Over a year ago

Thanks. Well, I wasn't avoiding regexps on purpose (it's more that I am somewhat of a beginner in Python) ... This is a nice and "elegant" solution; however, there is a little problem with it: the template may contain a marker which is not in rep_dict. In this case the solution raises an exception. I would need the unmodified %...% marker in the returned string if there is no replacement for it in rep_dict.

pyos Over a year ago

If you want to ignore invalid markers, you should use rep_dict.get(m.group(1), m.group()) instead of rep_dict[m.group(1)]. docs.python.org/library/stdtypes.html#dict.get
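Put together, the lenient variant suggested in this comment could be sketched as follows. The function name `render_lenient` and the sample template are made up for illustration.

```python
import re

def render_lenient(template, rep_dict):
    """Replace known %NAME% markers; leave unknown ones exactly as written."""
    # m.group(1) is the bare name, m.group() is the whole match including the % signs,
    # so the fallback puts the original marker back.
    return re.sub(r"%(.+?)%", lambda m: rep_dict.get(m.group(1), m.group()), template)

rep_dict = {"TITLE": "This is my title"}
print(render_lenient("%TITLE% / %MISSING%", rep_dict))
# → This is my title / %MISSING%
```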

LGB Over a year ago

Till now, this is my best solution (with more implementation details inside):

def _replace_by_dict(rep, s):
    s = re.split("(%[A-Z0-9]{1,32}%)", s)
    for a, b in rep.items():
        for c in range(len(s)):
            if s[c] == "%" + a + "%":
                s[c] = b
    return "".join(s)

It was written after I asked the question and before I read your answer. But your solution is much more elegant; I just have problems with the "unhandled markers", as I've mentioned.
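If the split-based idea from this comment is kept, the nested loop can be avoided by looking each marker up directly. The following is a sketch of that variation, not the commenter's code: the capturing group is moved so that it holds only the name, and unknown markers are put back unchanged.

```python
import re

def replace_by_dict(rep, s):
    # With a capturing group, re.split returns [text, name, text, name, ..., text],
    # so the marker names always sit at the odd indices.
    parts = re.split(r"%([A-Z0-9]{1,32})%", s)
    for i in range(1, len(parts), 2):
        name = parts[i]
        # Unknown markers are restored with their % signs.
        parts[i] = rep.get(name, "%" + name + "%")
    return "".join(parts)

print(replace_by_dict({"TITLE": "T"}, "a %TITLE% b %NOPE% c"))
# → a T b %NOPE% c
```

This visits each piece once, instead of scanning the whole list once per dictionary entry.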

urschrei Over a year ago

You could just catch the KeyError and pass

LGB Over a year ago

Nice, your comment helped me to understand this now. Thank you for your answer/comments!

Jason Thorne · Answer · 2012-03-20 22:21 (edited by jogojapan, 2012-10-28 02:56:09Z)

0

This was excellent. I have always used the excuse of not having time to learn RegEx, but always respected it. This post gave me what I needed to get started. This was my solution, though; I found the group call was mixed up in the dictionary parameters:

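import re
# theTitle, template and myDict are defined elsewhere. %title is replaced
# first because it is not in myDict; the lambda keeps theTitle literal.
retVal = re.sub(r'%title', lambda m: theTitle, template)
# Replace each remaining %name marker; unknown names become ''.
retVal = re.sub(r'%([a-z]+)', lambda m: myDict.get(m.group(1), ''), retVal)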

title was not in the dictionary; that is why I did it first. Requirements of the others in the team.

answered Mar 20, 2012 at 22:21 by Jason Thorne
edited Oct 28, 2012 at 2:56 by jogojapan
