1

I am trying to write a function that accepts a string that looks like a Python function call and returns the arguments to the function. Example:

"fun(1, bar(x+17, 1), arr = 's,y')"

should result in:

["1", "bar(x+17, 1)", "arr = 's,y'"]

The problem with using regular expressions is that I don't know whether it is possible to avoid splitting at commas inside parentheses or quotes. Thanks.

Edit: the question "Python: splitting a function and arguments" doesn't answer this correctly, since it doesn't handle commas inside parentheses or quotes.

As @Kevin said, regular expressions cannot solve this, since they can't handle nested parentheses.

6
  • possible duplicate of Regex question about parsing method signature Commented Jul 21, 2015 at 18:36
  • possible duplicate of Regular expression to return text between parenthesis Commented Jul 21, 2015 at 18:36
  • You should use a regular expression to parse the stuff in between parentheses, and then split that string to find your arguments. Commented Jul 21, 2015 at 18:36
  • None of these handle commas inside parentheses or quotes correctly, as in the example. Commented Jul 21, 2015 at 18:37
  • 1
    "vanilla" regexes can't parse nested parentheses. Maybe you can do it with more advanced features, but at some point it's going to be complex enough that you may as well just write a parser. Commented Jul 21, 2015 at 18:43

6 Answers

3

You can keep track of your own state fairly simply with something like:

def parse_arguments(s):
    openers = "{[\"'("
    closers = "}]\"')"
    state = []
    current = ""
    for c in s:
        if c == "," and not state:
           yield current
           current = ""
        else:
           current += c
           if c in openers:
              state.append(c)
           elif c in closers:
              assert state, "ERROR No Opener for %s"%c
              assert state[-1] == openers[closers.index(c)],"ERROR Mismatched %s %s"%(state[-1],c)
              state.pop(-1)
    assert not state, "ERROR Unexpected End, expected %s"%state[-1]
    yield current

print(list(parse_arguments("1, bar(x+17, 1), arr = 's,y'")))

2 Comments

Doesn't seem to work for parse_arguments("hello"), I think because " is both closing and opening? I think it also will try to match openers/closers within strings, which it shouldn't.
I tried my best at an enhancement: stackoverflow.com/a/61633243/2124834
2

Give this complex split function a try.

>>> import re
>>> s = "fun(1, bar(x+17, 1), arr = 's,y')"
>>> [i.strip() for i in re.split(r'''^\w+\(|\)$|((?:\([^()]*\)|'[^']*'|"[^"]*"|[^'"(),])*)''', s) if i and i !=',']
['1', 'bar(x+17, 1)', "arr = 's,y'"]

2 Comments

Good answer, but I'm pretty sure that for any regex there exists some counterexample (with nested parentheses) that will break it. +1 all the same, as I can't think of a good counterexample offhand, but debugging it when it doesn't work might be a little painful.
That said, I think there is a regex module on PyPI (import regex) that does support stack memory (and other advanced regex features), so you could come up with a perfect regex solution using that module.
1

It would be nice to do this with the ast (abstract syntax tree) standard library module, although it might be overkill:

>>> import ast
>>> parsed = ast.parse("fun(1, bar(x+17, 1), arr='s, y')")
>>> ast.dump(parsed.body[0].value)
"Call(func=Name(id='fun', ctx=Load()), args=[Num(n=1), 
Call(func=Name(id='bar', ctx=Load()), args=[BinOp(left=Name(id='x', 
ctx=Load()), op=Add(), right=Num(n=17)), Num(n=1)], keywords=[], 
starargs=None, kwargs=None)], keywords=[keyword(arg='arr', 
value=Str(s='s, y'))], starargs=None, kwargs=None)"

Unfortunately there's no standard library way to get those back to standard strings like "1", "bar(x+17, 1)" and "arr='s, y'". But https://pypi.python.org/pypi/astor can probably do that.
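
That was the situation when this answer was written. On Python 3.8 and later, `ast.get_source_segment` can slice the original text for a node, so a rough sketch of the ast approach might look like the following. The function name `call_arguments` is made up for illustration, and the sketch assumes the input is a single, syntactically valid call expression:

```python
import ast


def call_arguments(source):
    """Return the arguments of one call expression as source strings.

    Hypothetical helper; needs Python 3.8+ for ast.get_source_segment.
    """
    tree = ast.parse(source, mode="eval")
    call = tree.body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a function call")
    # Positional arguments: cut the original text using the node positions.
    args = [ast.get_source_segment(source, node) for node in call.args]
    # Keyword arguments: rebuilt as name=value, so the spacing around "="
    # in the input is not preserved.
    for kw in call.keywords:
        value = ast.get_source_segment(source, kw.value)
        args.append(f"{kw.arg}={value}" if kw.arg else f"**{value}")
    return args


print(call_arguments("fun(1, bar(x+17, 1), arr = 's,y')"))
```

For the example from the question this gives `['1', 'bar(x+17, 1)', "arr='s,y'"]`. Note that the keyword argument comes back normalized to `arr='s,y'`, and that anything that is not valid Python syntax raises a `SyntaxError` instead of being split.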

Comments

1
import re
x="fun(1, bar(x+17, 1), arr = 's,y')"
print(re.split(r",\s*(?![^\(]*\))(?![^']*'(?:[^']*'[^']*')*[^']*$)",re.findall(r"^.*?\((.*)\)",x)[0]))

You can try using re.

Output: ['1', 'bar(x+17, 1)', "arr = 's,y'"]

1 Comment

Like all the regex answers, this fails on nested parentheses. Try x = "fun(bar(1,foo(1)))"
0

Based on Joran Beasley's answer with hopefully better string handling? The only change is the new if-arm, allowing any characters when we're in a string, including an escaped quote.

def parse_arguments(s):
    openers = "{[\"'("
    closers = "}]\"')"
    state = []
    current = ""
    for c in s:
        if c == "," and not state:
            yield current
            current = ""
        else:
            current += c
            if state and state[-1] in "\"'":
                # current[-1] is c itself, so look one character back for the backslash
                if c == state[-1] and current[-2] != "\\":
                    state.pop(-1)
            else:
                if c in openers:
                    state.append(c)
                elif c in closers:
                    assert state, "ERROR No Opener for %s" % c
                    assert (
                        state[-1] == openers[closers.index(c)]
                    ), "ERROR Mismatched %s %s" % (state[-1], c)
                    state.pop(-1)
    assert not state, "ERROR Unexpected End, expected %s" % state[-1]
    yield current
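
For comparison, here is a minimal sketch of the same state-tracking idea that starts from the full call string in the question rather than only the part between the outer parentheses. The name `split_call_arguments` is hypothetical. It assumes the string ends with the call's closing parenthesis, tracks string literals separately from brackets, and does not treat triple-quoted strings specially:

```python
def split_call_arguments(call):
    """Split 'name(arg, ...)' into its top-level argument strings."""
    inner = call[call.index("(") + 1:call.rindex(")")]
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []       # brackets that are currently open
    quote = None     # quote character while inside a string literal
    escaped = False  # True right after a backslash inside a string
    args, current = [], []
    for ch in inner:
        if quote:
            # Inside a string: only an unescaped matching quote ends it.
            current.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
            current.append(ch)
        elif ch in "([{":
            stack.append(ch)
            current.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                raise ValueError("unbalanced %r" % ch)
            current.append(ch)
        elif ch == "," and not stack:
            # A comma outside all brackets and strings ends an argument.
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if quote or stack:
        raise ValueError("unexpected end of input")
    last = "".join(current).strip()
    if last:
        args.append(last)
    return args


print(split_call_arguments("fun(1, bar(x+17, 1), arr = 's,y')"))
```

Unlike the ast module, this accepts text that is not valid Python, but it only understands brackets and quotes, so it should be treated as a sketch rather than a full parser.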

Comments

0

The regex package supports recursive regular expressions, so the (rather nasty) regex below gets most of the way there. It doesn't support keyword arguments, and there is possibly something else I've missed.

import regex
arg_cap = r''' *(\w+) *'''
qote_cap = r''' *((?P<qseq>(?P<qch>["'])(?:(?P=qch){2})?).*?(?<!\\)(?P=qseq)) *'''
call_cap = r''' *\w+\((?: *(?&expr)( *,? *(?&expr))*)\)'''
num_cap = r''' *(\d*) *'''
seq_cap = r''' *(?:(?P<sd>\{)|(?P<lst>\[)|(?P<tp>\()) *(?P<kv>(?&expr) *(?:: *(?&expr))? *(?:, *(?&kv) *)*,? *)(?(lst)\])(?(tp)\))(?(sd)\}) *'''
fn_break = fr'''^\w+\((?:(?P<arg>(?P<expr>{arg_cap}|{qote_cap}|{call_cap}|{num_cap}|{seq_cap}))(?:, *(?&arg))*)\)$'''

fn_str = """fun(1, xyz, foo(a, b), (a, b, ((d, e,))), "xbc", '''trip quote''', {set, values}, {abc: def, ghi: jkl})"""
mat = regex.match(fn_break, fn_str)

assert mat is not None
print(mat.captures("arg"))

outputs

['1', 'xyz', 'foo(a, b)', '(a, b, ((d, e,)))', '"xbc"', "'''trip quote'''", '{set, values}', '{abc: def, ghi: jkl}']

That said, you're probably better off using an ast parser such as the standard library ast or pylint's astroid, because I can barely understand the above code, and I wrote it.

(I may have spent too much time writing regular expressions...)

Comments
