
I am having trouble with this one. I am trying to get a better handle on regular expressions, but this is not working. I have a list of strings that I want to remove wherever they occur in another string.

This is the exclusion list:

exclusionList = ['\+','of','<ET>f.','to','the','<L>L.</L>','f.','in','and','see','a','<L>Fr.</L>','as','<ET>ad.','<ET>a.','<PS>v.</PS></XR>',
             'from','<CF>ab</CF>','or','n.','<L>OFr.</L>','pple.','away','was','with','off,','pa.','on','is','cf.','stem','ad.','which',
             'by','action','ppl.','Cf.','but','<L>Gr.</L>','be','after','=','The','form','for','an','<XR><RX>prec.</RX></XR>',
             '<PS>a.</PS></XR>','<L>Eng.</L>','<PS>pref.</PS>','also','L.</L>','<XR><XL>a-</XL>','<XR><XL>-ing</XL><HO>1</HO></XR>.</ET>',
             'vb.','See','In','<L>OE.</L>','used','it','see','this','not','<PS>prep.</PS><HO>1</HO></XR>','has','a','so','early','s']

And this is what I am using to remove those words:

first_word = re.sub(r'\b'+exclusionList[a]+'\b', '',first_word)

where first_word is a string read from a text file. I know this is going to be simple, but I do not quite get how to use regular expressions very well.

Thanks

  • What's the content of the variable a? Commented Jun 14, 2012 at 21:25

2 Answers


I can only guess, but probably you want something like this:

pattern = r'\b({})\b'.format('|'.join(map(re.escape, exclusionList)))
first_word = re.sub(pattern, '', first_word)

Note that I'm escaping the words, so they will be matched literally, instead of being interpreted as regular expressions (which they don't seem to be).
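For what it's worth, the original call most likely fails for a second reason too: the trailing '\b' is not a raw string, so Python turns it into a backspace character before the regex engine ever sees it. Here is a minimal sketch of both points. The shortened word list and the sample string are made up for illustration and are not the asker's data:

```python
import re

# In a normal string literal '\b' is the backspace character (0x08);
# only the raw string r'\b' reaches the regex engine as a word boundary.
assert '\b' == '\x08'
assert r'\b' == '\\b'

words = ['of', 'the', 'f.']        # hypothetical, shortened exclusion list
sample = 'the form of fx'          # hypothetical input

# Escaped: 'f.' becomes 'f\.', so the dot only matches a literal dot.
escaped = r'\b(?:{})\b'.format('|'.join(map(re.escape, words)))
with_escape = re.sub(escaped, '', sample)

# Unescaped: the dot matches any character, so 'fx' is removed as well.
unescaped = r'\b(?:{})\b'.format('|'.join(words))
without_escape = re.sub(unescaped, '', sample)

print(repr(with_escape))
print(repr(without_escape))
```

The non-capturing group (?:...) is only a small tidy-up; a plain (...) group behaves the same way here.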


7 Comments

Same idea as mine, but better realized. +1.
@MarkReed Nothing better, only a lot of clarity sacrificed for a really tiny performance improvement.
@BlaXpirit: Don't see what you mean. I put readability before anything else here.
@NiklasB. I'm talking about the superfluous compile, and map to which I prefer generator expressions.
@BlaXpirit: "to which I prefer generator expressions." Well I don't. I prefer map over genexprs in these simple cases. Maybe that's because I like my code to fit into an editor line. Also, the call to re.compile helps separate the concerns here (formulating the pattern vs. matching). Of course it can be left away, shouldn't be a challenge to restructure it to whatever is needed.

This should do the trick all at once:

exclusionRegex = r'\b(' + '|'.join(re.escape(word) for word in exclusionList) + r')\b'
first_word = re.sub(exclusionRegex, '', first_word)

EDIT: This is my test script:

import re

exclusionList = ['\+','of','<ET>f.','to','the','<L>L.</L>','f.','in','and','see','a','<L>Fr.</L>','as','<ET>ad.','<ET>a.','<PS>v.</PS></XR>',
             'from','<CF>ab</CF>','or','n.','<L>OFr.</L>','pple.','away','was','with','off,','pa.','on','is','cf.','stem','ad.','which',
             'by','action','ppl.','Cf.','but','<L>Gr.</L>','be','after','=','The','form','for','an','<XR><RX>prec.</RX></XR>',
             '<PS>a.</PS></XR>','<L>Eng.</L>','<PS>pref.</PS>','also','L.</L>','<XR><XL>a-</XL>','<XR><XL>-ing</XL><HO>1</HO></XR>.</ET>',
             'vb.','See','In','<L>OE.</L>','used','it','see','this','not','<PS>prep.</PS><HO>1</HO></XR>','has','a','so','early','s']

exclusionRegex = r'\b(' + '|'.join(re.escape(word) for word in exclusionList) + r')\b'
first_word = 'This is a test of the regex'
print(re.sub(exclusionRegex, '', first_word))

And this is the output (the spaces around each removed word are left in place):

This   test   regex
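One caveat, as far as I can tell: \b only matches between a word character and a non-word character, so entries that start or end with punctuation (such as 'cf.', '=' or the tagged ones) are not removed when they are surrounded by spaces. A possible workaround is to use lookarounds in place of \b and to collapse the leftover whitespace afterwards. This is only a sketch; the helper names and the sample data are made up for illustration:

```python
import re

def build_exclusion_pattern(words):
    """Compile one pattern matching any entry not glued to a word character."""
    # Longest entries first, so '<L>L.</L>' is tried before 'L.</L>'.
    ordered = sorted(set(words), key=lambda w: (-len(w), w))
    alternation = '|'.join(re.escape(w) for w in ordered)
    # (?<!\w) and (?!\w) only require that no word character touches the match.
    return re.compile(r'(?<!\w)(?:{})(?!\w)'.format(alternation))

def remove_excluded(text, pattern):
    """Delete every match, then squeeze runs of whitespace into single spaces."""
    return ' '.join(pattern.sub('', text).split())

# Hypothetical sample data, loosely modelled on the exclusion list above.
sample_words = ['of', 'the', 'cf.', '<L>L.</L>', 'L.</L>', '=']
sample_text = 'cf. the root of <L>L.</L> = stem'

pattern = build_exclusion_pattern(sample_words)
cleaned = remove_excluded(sample_text, pattern)

# For comparison: the same alternation wrapped in \b ... \b.
naive = re.compile(r'\b(?:{})\b'.format('|'.join(map(re.escape, sample_words))))
naive_cleaned = ' '.join(naive.sub('', sample_text).split())

print(cleaned)
print(naive_cleaned)
```

With the lookarounds the punctuation-delimited entries are removed too, while the \b version leaves 'cf.', the tagged entry and '=' in place. Whether collapsing whitespace is wanted depends on what happens to the string next.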

3 Comments

Yup, indeed. Thanks, Niklas and BlaXpirit.
Thanks Mark, I am getting a syntax error at re.sub. Any suggestions?
@EnglishGrad - re.sub is an expression, not a statement; you have to assign it to something or otherwise use it. See my edit.
