3

I want to write a script that returns the numbers that have a power of 1. The user's input contains both numbers raised to a power and plain numbers. What I want is shown below:

input = "+2**5+3+4**8-7"
Output = "3,-7"

I tried the regex re.findall(r"[+-]?[0-9]+[^[*][*][2]]", input), but it doesn't work. Thanks in advance :)

4 Answers

4

You need negative look-around assertions, plus boundary anchors:

r'(?<!\*\*)-?\b\d+\b(?!\*\*)'

The (?<!...) syntax only matches at positions where the text before it doesn't match the pattern. Similarly, the (?!...) syntax does the same for the text that follows. Together they ensure you only match numbers that are not exponents (they don't follow **) and don't have an exponent (they are not followed by **).

The \b boundary anchor only matches at the start or end of a string, and anywhere there’s a word character followed by a non-word character or vice versa (so in between \w\W or \W\w, where \w happily includes digits but not arithmetic characters):

>>> import re
>>> input = "+2**5+3+4**8-7"
>>> re.findall(r'(?<!\*\*)-?\b\d+\b(?!\*\*)', input)
['3', '-7']

Note that I used \d to match digits, and removed the + from the pattern, since you don't want that in your expected output.

You can play with the expression in the online regex101 demo; e.g. you can try it with numbers > 10 and using a single * for multiplication.
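Those two cases can also be checked locally. A small sketch; the sample string is made up for illustration and does not come from the question:

```python
import re

# Same pattern as above: skip numbers that are exponents or that carry an exponent
PLAIN_NUMBER = re.compile(r'(?<!\*\*)-?\b\d+\b(?!\*\*)')

# Hypothetical input with multi-digit numbers and a single * for multiplication
sample = "12*3+45**2-100"
result = PLAIN_NUMBER.findall(sample)
print(result)  # ['12', '3', '-100']
```

The trailing \b is what stops 45**2 from yielding a partial match such as 4: the engine cannot end the number in the middle of a run of digits.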

If you must support negative exponents, then the above won’t suffice: in ...**-42 the 42 still matches, because the digits are not directly preceded by **. In that case an extra negative look-behind that disallows **- is needed after the -?:

r'(?<!\*\*)-?(?<!\*\*-)\b\d+\b(?!\*\*)'

(Thanks to Casimir et Hippolyte for pointing this out and suggesting a solution for it.)
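To make the difference concrete, here is a short comparison of the two patterns. The sample string is a made-up example with a negative exponent:

```python
import re

# First pattern: no protection against a signed exponent
without_fix = r'(?<!\*\*)-?\b\d+\b(?!\*\*)'
# Second pattern: extra look-behind rejects digits that follow "**-"
with_fix = r'(?<!\*\*)-?(?<!\*\*-)\b\d+\b(?!\*\*)'

sample = "2**-42+5-3"  # hypothetical input
before = re.findall(without_fix, sample)
after = re.findall(with_fix, sample)

print(before)  # ['42', '5', '-3']  (42 is wrongly picked up)
print(after)   # ['5', '-3']
```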

However, at this point I’d suggest you switch to just parsing the expression into an abstract syntax tree and then walking the tree to extract the operands that are not part of an exponent:

import ast

class NumberExtractor(ast.NodeVisitor):
    def __init__(self):
        self.reset()

    def reset(self):
        self.numbers = []

    def _handle_number(self, node):
        # Python 3.8+ represents numeric literals as ast.Constant
        if isinstance(node, ast.Constant):
            if isinstance(node.value, (int, float, complex)):
                return node.value
        # Legacy node type (pre-3.8); guarded in case this Python no longer has it
        elif hasattr(ast, 'Num') and isinstance(node, ast.Num):
            return node.n

    def visit_UnaryOp(self, node):
        if isinstance(node.op, (ast.UAdd, ast.USub)):
            operand = self._handle_number(node.operand)
            if operand is None:
                return
            elif isinstance(node.op, ast.UAdd):
                self.numbers.append(+operand)
            else:
                self.numbers.append(-operand)

    def visit_Constant(self, node):
        if isinstance(node.value, (int, float, complex)):
            self.numbers.append(node.value)

    def visit_Num(self, node):
        self.numbers.append(node.n)

    def visit_BinOp(self, node):
        if isinstance(node.op, ast.Pow):
            return  # ignore exponentiation
        self.generic_visit(node)  # process the rest

def extract(expression):
    try:
        tree = ast.parse(expression, mode='eval')
    except SyntaxError:
        return []
    extractor = NumberExtractor()
    extractor.visit(tree)
    return extractor.numbers

This extracts just the numbers; subtraction won’t produce a negative number:

>>> input = "+2**5+3+4**8-7"
>>> extract(input)
[3, 7]

Moreover, it can handle arbitrary amounts of whitespace, and much more complex expressions than a regex could ever handle:

>>> extract("(10 + 15) * 41 ** (11 + 19 * 17) - 42")
[10, 15, 42]
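To see why a term like +2**5 contributes nothing, it can help to look at the tree itself. A small sketch using a shortened, made-up expression, and assuming Python 3.8+ (where literals are ast.Constant nodes):

```python
import ast

# Shortened, hypothetical expression; mode='eval' parses a single expression
expr = ast.parse("+2**5-7", mode="eval").body

# The top-level node is the subtraction: (+(2**5)) - 7
top_op = type(expr.op).__name__                 # 'Sub'
left_kind = type(expr.left).__name__            # 'UnaryOp'
# ** binds tighter than unary +, so the + wraps the whole power
inner_op = type(expr.left.operand.op).__name__  # 'Pow'
right_value = expr.right.value                  # 7

print(top_op, left_kind, inner_op, right_value)
```

Because the unary plus wraps the whole power, its operand is not a plain number, and the 7 is a separate right-hand operand of the subtraction rather than a negative literal.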

10 Comments

**-12: I'm afraid two lookbehinds are needed for this case (regex101.com/r/qZy32F/1)
@CasimiretHippolyte ah yes, that’s unfortunate. I’ll ponder it for a bit; limited time at the moment.
@CasimiretHippolyte one alternative, I think, is to use alternation, with a pattern that takes care of the start and end of the string: Regex Demo
@CodeManiac using the Python parser and traversing the AST is definitely a better option there :-)
You can also use a simple and stupid solution: remove all numbers with an exponent first.
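A minimal sketch of that remove-first idea, assuming the expression only contains integer literals; strip_then_find is a hypothetical helper name, not something from the answers:

```python
import re

def strip_then_find(expression):
    # Step 1: drop every "base**exponent" term, including optional signs
    without_powers = re.sub(r'[+-]?\d+\*\*[+-]?\d+', '', expression)
    # Step 2: whatever signed integers remain carry no exponent
    return re.findall(r'-?\d+', without_powers)

print(strip_then_find("+2**5+3+4**8-7"))  # ['3', '-7']
print(strip_then_find("2**-42+5-3"))      # ['5', '-3']
```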
3
re.findall(r"(?<!\*\*)(?<!\*\*[+-])[+-]?\b[0-9]++(?!\*\*)", input)

(?!\*\*) is a negative lookahead that makes sure the digits are not followed by two * characters.

Before Python 3.11, re doesn't support possessive quantifiers; on those versions you have to use the regex module from PyPI.
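Possessive quantifiers were added to the standard re module in Python 3.11, so on recent versions the pattern can be tried without a third-party package. A sketch under that assumption; the fallback pattern (a trailing \b instead of ++) and the helper name find_plain_numbers are my own additions, not part of the answer:

```python
import re
import sys

POSSESSIVE = r"(?<!\*\*)(?<!\*\*[+-])[+-]?\b[0-9]++(?!\*\*)"
# Assumed fallback for older Pythons: a word boundary instead of a possessive quantifier
FALLBACK = r"(?<!\*\*)(?<!\*\*[+-])[+-]?\b[0-9]+\b(?!\*\*)"

def find_plain_numbers(expression):
    # re only accepts "++" from Python 3.11 onwards
    pattern = POSSESSIVE if sys.version_info >= (3, 11) else FALLBACK
    return re.findall(pattern, expression)

print(find_plain_numbers("+2**5+3+4**8-7"))  # ['+3', '-7']
```

Unlike the first answer's pattern, this one keeps a leading + in the matches.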

Demo

4 Comments

Python re module doesn't have possessive quantifiers.
@CasimiretHippolyte: I wasn't aware of that, I've just tested with regex101.
Note that even with possessive quantifier support, your pattern will match **-12 since the ? isn't possessive too.
@CasimiretHippolyte: Fixed with 2 lookbehind and a word boundary.
2

You could write a parser and check whatever you need. I know it is a bit long, but fun :)

$ cat lexer.py
import re
from collections import namedtuple

tokens = [
    r'(?P<TIMES>\*)',
    r'(?P<POW>(\+|-)?\d+\*\*\d+)',
    r'(?P<NUM>(\+|-)?\d+)'
    ]

master_re = re.compile('|'.join(tokens))
Token = namedtuple('Token', ['type','value'])
def tokenize(text):
    scan = master_re.scanner(text)
    return (Token(m.lastgroup, m.group())
            for m in iter(scan.match, None))

x = '+2**5+3+4**8-7'

required = []
for tok in tokenize(x):
    if tok.type == 'POW':
        coeff, exp = tok.value.split('**')
        if exp == '1':
            required.append(coeff)
    elif tok.type == 'NUM':
        required.append(tok.value)

print(required)

Output:

$ python lexer.py
['+3', '-7']
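The exp == '1' branch is what picks up an explicit power of 1, which the sample input never exercises. A compact sketch with the same token patterns but using finditer rather than the scanner; power_one_terms and the test string are hypothetical:

```python
import re

# Same three token kinds as lexer.py, without nested groups
TOKEN_RE = re.compile(
    r'(?P<POW>[+-]?\d+\*\*\d+)|(?P<NUM>[+-]?\d+)|(?P<TIMES>\*)'
)

def power_one_terms(text):
    found = []
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup == 'POW':
            coeff, exp = m.group().split('**')
            if exp == '1':          # an explicit power of 1 counts
                found.append(coeff)
        elif m.lastgroup == 'NUM':  # a bare number has an implicit power of 1
            found.append(m.group())
    return found

print(power_one_terms('+5**1+3+4**8-7'))  # ['+5', '+3', '-7']
```

One design difference: finditer silently skips characters that match no token, whereas the scanner loop in lexer.py stops at the first position where nothing matches.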

0

You can try this simple regular expression:

re.findall(r'[-\+]\d(?!\*\*)', search_data)
