I want to store a series of pre-tested regexes in a config file, and read and apply them at runtime.
However, because they're commonly packed with escape characters, by the time I've loaded them up into memory, and populated them into a dict, they've been escaped to death.
How can I preserve the integrity of my regex definitions, so that they will re.compile?
Alternately, given that many of the post-escape strings end up in a form with \x00 characters, how do I convert these back into a form that will be consumed correctly by re.compile?
e.g. I have written in a file, the regex "\btest\b". If I want to put this into a re.compile, I can force it to do so with re.compile(r"\btest\b"). However, I don't want to write this code by hand, I want to lift it from a file, and process it as a variable (I've got 000's of these to deal with here) .
There doesn't seem to be a way to r a string variable, and so I'm left trying to compile with '\x08test\x08', which doesn't do what I want it to.
This must be a fairly regular issue - how do others deal with this problem?
\btest\bas literal text in a file and then read it in, it will be equal tor'\btest\b'\btest\bwould be read as\\btest\\band wouldre.compile()just fine."\btest\b"(which re.compile accepts) but once it ends up in the "\08test\08" form (which re.compile does not - at least, it doesn't interpret that the same way) there seems to be no simple way to perform the reverse operation."\\btest\\b"then yes, that would work I guess - unfortunately, that doesn't seem to be what's happening.