1

I'm trying to write a regular expression match.

I'd like to match c99 in files, so long as it's not part of a hexadecimal color code. For example:

  • Do NOT match on #000c99
  • DO match on /mysite.com/c99.php
  • DO match on %20c99.php
  • DO match on c99?hi=moo

Is this even possible with regex?

5
  • Yes. Use a negative lookbehind assertion. Commented Jul 4, 2012 at 21:40
  • @Joel Cornett - I believe they work for fixed width only, do they not? Commented Jul 4, 2012 at 21:41
  • The regex module contains a new regular expression implementation which allows variable-length lookbehind. Commented Jul 4, 2012 at 21:59
  • What if the code is part of a URL fragment? /mysite.com/index.php#c99 Commented Jul 4, 2012 at 22:26
  • Also, something about parsing HTML using regular expressions. Commented Jul 4, 2012 at 22:26
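The fixed-width restriction raised in the comments can be worked around in the standard re module by chaining several fixed-width negative lookbehinds, one per possible prefix length. The following is only a sketch under assumptions: the names HEX and STANDALONE_C99 are made up for illustration, and it assumes a color code is a # followed by hex digits (0-9, a-f) with c99 at the end. Like any lookbehind approach, it also rejects a URL fragment such as index.php#c99.

```python
import re

# Hypothetical helper name: a single hexadecimal digit.
HEX = "[0-9a-fA-F]"

# One fixed-width negative lookbehind per prefix length (0 to 3 hex digits
# after '#'). The standard re module accepts each of these because every
# lookbehind has a known, constant width.
STANDALONE_C99 = re.compile(
    rf"(?<!#)(?<!#{HEX})(?<!#{HEX}{{2}})(?<!#{HEX}{{3}})c99"
)

samples = [
    "#000c99",
    "/mysite.com/c99.php",
    "%20c99.php",
    "c99?hi=moo",
]

for sample in samples:
    # findall returns every c99 that is not preceded by '#' plus 0-3 hex digits
    print(repr(sample), STANDALONE_C99.findall(sample))
```

All four lookbehinds must succeed at the same position, so the chain behaves like a single variable-width lookbehind covering prefixes of zero to three hex digits.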

3 Answers

2

Using the regex module:

>>> import regex
>>> rx = regex.compile(r"(?<!#\d{0,3})c99")
>>> rx.findall("#000c99")
[]
>>> rx.findall("/mysite.com/c99.php")
[u'c99']
>>> rx.findall("%20c99.php")
[u'c99']
>>> rx.findall("c99?hi=moo")
[u'c99']

1

The most straightforward way would be to match lines with "c99" in them, then discard any where the c99 is in a color code:

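import re

# fileHandle is assumed to be a text file object opened elsewhere
for line in fileHandle:
    if re.search("c99", line):
        if re.search("#.?.?.?c99", line):
            # c99 sits right after '#' plus up to three characters: discard
            pass
        else:
            # line contains c99 not in a color code
            pass  # handle the matching line here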

There's probably a way to do it within a single regex, but this was just the first thing that came to mind.
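One possible single-regex version, offered as a sketch and not as the definitive answer, is to let an alternation consume whole color codes first and capture c99 only in the last branch. The names PATTERN and standalone_c99_offsets are hypothetical, and the sketch assumes color codes are # followed by exactly three or six hex digits. Unlike the line-based filter, it checks each occurrence separately, so a line containing both a color code and a standalone c99 is still reported.

```python
import re

# Branch order matters: a color code is tried first and consumed whole,
# so a c99 inside it is never offered to the capturing branch.
PATTERN = re.compile(r"#[0-9a-fA-F]{6}|#[0-9a-fA-F]{3}|(c99)")


def standalone_c99_offsets(line):
    """Return the start offset of each c99 that is not inside a color code."""
    return [
        match.start(1)
        for match in PATTERN.finditer(line)
        if match.group(1) is not None  # group 1 is unset for color-code matches
    ]


for text in ["#000c99", "/mysite.com/c99.php", "#000c99 and c99"]:
    print(repr(text), standalone_c99_offsets(text))
```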

1 Comment

Thanks. I'm trying to do it with regex only so that I can avoid writing a macro language system. This won't be the only pattern it's used for, but the others will be used in the same circumstances.

0

Use this regex: (^([^#].*?)?c99.*?$)

3 Comments

This will fail to match cases where c99 occurs on a line with a hex color code before it (e.g., it will not match "#000FFF followed by c99"). Also it matches the entire line, not the c99 itself.
I expected that all text would be split into words; otherwise the regex would be hard and long.
That's a rather big assumption. Anyway, the regex needn't be too huge even if you want to find c99 in arbitrary text.
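The two objections in the first comment can be checked with a short script. This is just a sketch using the standard re module; WHOLE_LINE is a hypothetical name for the pattern from the answer, and the sample lines come from the question and the comment.

```python
import re

# The pattern from the answer, unchanged.
WHOLE_LINE = re.compile(r"(^([^#].*?)?c99.*?$)")

lines = [
    "/mysite.com/c99.php",      # wanted: should match
    "#000c99",                  # color code: should not match
    "#000FFF followed by c99",  # standalone c99 after an unrelated color code
]

for line in lines:
    match = WHOLE_LINE.search(line)
    # group(0) shows how much text the pattern consumed, if it matched at all
    print(repr(line), "->", match.group(0) if match else None)
```

Because the pattern is anchored with ^ and only allows text before c99 when the line does not start with #, the third line is rejected even though its c99 is not part of a color code. When the pattern does match, the match spans the entire line.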
