You need negative look-around assertions, plus boundary anchors:
r'(?<!\*\*)-?\b\d+\b(?!\*\*)'
The (?<!...) syntax is a negative look-behind: it only matches at positions where the text immediately before it doesn't match the pattern. Similarly, the (?!...) negative look-ahead does the same for the text that follows. Together they ensure you only match numbers that are not exponents (preceded by **) and do not have an exponent (followed by **).
The \b boundary anchor matches at the start or end of the string when the adjacent character is a word character, and anywhere there’s a word character followed by a non-word character or vice versa (so in between \w\W or \W\w, where \w happily includes digits but not arithmetic characters). This stops the pattern from matching only part of a multi-digit number:
>>> import re
>>> input = "+2**5+3+4**8-7"
>>> re.findall(r'(?<!\*\*)-?\b\d+\b(?!\*\*)', input)
['3', '-7']
Note that I used \d to match digits, and removed the + from the pattern, since you don't want that in your expected output.
You can play with the expression in the online regex101 demo; e.g. you can try it with numbers > 10 and using a single * for multiplication.
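To see what the \b anchors contribute, here is a small sketch that runs the pattern with and without them on numbers > 10 and a single * for multiplication. The sample string and variable names are made up for illustration:

```python
import re

sample = "12**34+56*78-9"  # hypothetical test input

# Pattern from above, with word-boundary anchors around the digits
with_boundaries = re.findall(r'(?<!\*\*)-?\b\d+\b(?!\*\*)', sample)
# Same pattern with the \b anchors removed
without_boundaries = re.findall(r'(?<!\*\*)-?\d+(?!\*\*)', sample)

print(with_boundaries)     # ['56', '78', '-9']
print(without_boundaries)  # ['1', '4', '56', '78', '-9']
```

Without \b, backtracking lets the engine settle for the 1 of 12 and the 4 of 34, since those partial matches are no longer directly next to the **.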
If you must support negative exponents, then the above won’t suffice: in ...**-42 the 42 still matches, because the digits are directly preceded by *- rather than **. In that case an extra negative look-behind after the -? that disallows **- is needed:
r'(?<!\*\*)-?(?<!\*\*-)\b\d+\b(?!\*\*)'
(Thanks to Casimir et Hippolyte for pointing this out and suggesting a solution for it.)
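As a quick check of the difference, this sketch applies both patterns to an input with a negative exponent. The sample string and variable names are only illustrative:

```python
import re

sample = "3**-42-7"  # hypothetical input with a negative exponent

basic = re.findall(r'(?<!\*\*)-?\b\d+\b(?!\*\*)', sample)
extended = re.findall(r'(?<!\*\*)-?(?<!\*\*-)\b\d+\b(?!\*\*)', sample)

print(basic)     # ['42', '-7']  (the exponent 42 slips through)
print(extended)  # ['-7']
```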
However, at this point I’d suggest you switch to just parsing the expression into an abstract syntax tree and then walking the tree to extract the operands that are not part of an exponent:
import ast

class NumberExtractor(ast.NodeVisitor):
    def __init__(self):
        self.reset()

    def reset(self):
        self.numbers = []

    def _handle_number(self, node):
        # Return the value of a numeric literal node, or None otherwise
        if isinstance(node, ast.Constant):
            if isinstance(node.value, (int, float, complex)):
                return node.value
        elif type(node).__name__ == 'Num':
            # legacy Num nodes (Python < 3.8); checked by name because
            # ast.Num itself has been deprecated since Python 3.8
            return node.n

    def visit_UnaryOp(self, node):
        # fold a leading + or - into the number it applies to
        if isinstance(node.op, (ast.UAdd, ast.USub)):
            operand = self._handle_number(node.operand)
            if operand is None:
                return
            elif isinstance(node.op, ast.UAdd):
                self.numbers.append(+operand)
            else:
                self.numbers.append(-operand)

    def visit_Constant(self, node):
        if isinstance(node.value, (int, float, complex)):
            self.numbers.append(node.value)

    def visit_Num(self, node):
        # only called on Python < 3.8
        self.numbers.append(node.n)

    def visit_BinOp(self, node):
        if isinstance(node.op, ast.Pow):
            return  # ignore exponentiation
        self.generic_visit(node)  # process the rest

def extract(expression):
    try:
        tree = ast.parse(expression, mode='eval')
    except SyntaxError:
        return []
    extractor = NumberExtractor()
    extractor.visit(tree)
    return extractor.numbers
This extracts just the numbers; subtraction won’t produce a negative number:
>>> input = "+2**5+3+4**8-7"
>>> extract(input)
[3, 7]
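The reason shows in the shape of the parse tree: a binary minus is a BinOp node whose right operand is a plain positive constant, whereas a leading sign is a separate UnaryOp node. A small sketch to check this, with variable names that are only illustrative:

```python
import ast

binary = ast.parse("5-7", mode="eval").body    # subtraction
unary = ast.parse("-7", mode="eval").body      # signed literal
power = ast.parse("-2**5", mode="eval").body   # ** binds tighter than unary -

print(type(binary).__name__, type(binary.op).__name__)     # BinOp Sub
print(type(unary).__name__, type(unary.op).__name__)       # UnaryOp USub
print(type(power).__name__, type(power.operand).__name__)  # UnaryOp BinOp
```

The last line also shows why a sign in front of an exponentiation is dropped along with it: the unary operator wraps the whole ** expression, not just the base.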
Moreover, it can handle arbitrary amounts of whitespace, and much more complex expressions than a regex could ever handle:
>>> extract("(10 + 15) * 41 ** (11 + 19 * 17) - 42")
[10, 15, 42]