
I would like to use the excellent pyparsing package to parse a Python function call in its most general form. I found one earlier post that was somewhat useful, but it is still not general enough.

I would like to parse the following expression:

f(arg1,arg2,arg3,...,kw1=var1,kw2=var2,kw3=var3,...)

where

  1. arg1, arg2, arg3, ... are any kind of valid Python objects (integer, real, list, dict, function, variable name, ...)
  2. kw1, kw2, kw3, ... are valid Python keyword names
  3. var1, var2, var3, ... are valid Python objects

I was wondering whether a grammar could be defined for such a general template. Perhaps I am asking too much. Would you have any ideas?

Thank you very much for your help.

Eric

  • Sure it's possible, there is a full Python grammar implementation in the pyparsing examples (though it looks quite cryptic, but might help). Commented Jan 23, 2013 at 9:39
  • Depending on what you're doing with it - another option is to use Python's standard library to parse it: see ast.parse and related NodeVisitor and NodeTransformer classes. Commented Jan 23, 2013 at 10:32
  • That's not the most general form. The most general form is f(arg1, arg2, arg3, ..., kwarg1=val1, kwarg2=val2, ..., *args, **kwargs). Commented Jan 23, 2013 at 11:45
  • You're right, Bakuriu, thanks for the comment. Commented Jan 23, 2013 at 14:01
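
The ast.parse route mentioned in the comments above can be sketched as follows. This is only an illustration under stated assumptions: the helper name describe_call and the shape of the returned dict are made up for this example, and ast.unparse needs Python 3.9 or later. It handles the general form from the comments, including *args and **kwargs.

```python
import ast


def describe_call(source):
    """Split one function-call expression into its parts (hypothetical helper)."""
    node = ast.parse(source.strip(), mode='eval').body
    if not isinstance(node, ast.Call):
        raise ValueError('not a function call: %r' % source)

    positional, star_args = [], []
    for arg in node.args:
        # *args shows up among the positional arguments as ast.Starred
        if isinstance(arg, ast.Starred):
            star_args.append(ast.unparse(arg.value))
        else:
            positional.append(ast.unparse(arg))

    keywords, star_kwargs = {}, []
    for kw in node.keywords:
        # **kwargs shows up as a keyword whose name is None
        if kw.arg is None:
            star_kwargs.append(ast.unparse(kw.value))
        else:
            keywords[kw.arg] = ast.unparse(kw.value)

    return {
        'func': ast.unparse(node.func),
        'positional': positional,
        'keywords': keywords,
        'star_args': star_args,
        'star_kwargs': star_kwargs,
    }


info = describe_call("f(1, [1,2], kw=g(x), *rest, **opts)")
print(info)
```

Because the standard library does the parsing, every valid Python expression is accepted as an argument; the trade-off is that the input has to be syntactically valid Python.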

1 Answer

Is that all? Let's start with a simple informal BNF for this:

func_call ::= identifier '(' func_arg [',' func_arg]... ')'
func_arg ::= named_arg | arg_expr
named_arg ::= identifier '=' arg_expr
arg_expr ::= identifier | real | integer | dict_literal | list_literal | tuple_literal | func_call
identifier ::= (alpha|'_') (alpha|num|'_')*
alpha ::= some letter 'a'..'z' 'A'..'Z'
num ::= some digit '0'..'9'

Translating to pyparsing, work bottom-up:

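from pyparsing import (Word, alphas, alphanums, Forward, Group,
                       delimitedList, quotedString, unicodeString)

identifier = Word(alphas+'_', alphanums+'_')

# definitions of real, integer, dict_literal, list_literal, tuple_literal go here
# see further text below

# define a placeholder for func_call - we don't have it yet, but we need it now
func_call = Forward()

string = quotedString | unicodeString

# '|' takes the first alternative that matches, so func_call and string must be
# tried before identifier; otherwise identifier would consume just the name of a
# nested call (or the 'u' prefix of a unicode string) and the parse would stall
arg_expr = func_call | string | identifier | real | integer | dict_literal | list_literal | tuple_literal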

named_arg = identifier + '=' + arg_expr

# to define func_arg, must first see if it is a named_arg
# why do you think this is?
func_arg = named_arg | arg_expr

# now define func_call using '<<' instead of '=', to "inject" the definition 
# into the previously declared Forward
#
# Group each arg to keep its set of tokens separate, otherwise you just get one
# continuous list of parsed strings, which is almost as worthless as the original
# string
func_call << identifier + '(' + delimitedList(Group(func_arg)) + ')'
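
On the question posed in the comment above func_arg: here is a minimal sketch, using a cut-down grammar, of why named_arg has to be tried first. The integer definition below is a stand-in for illustration, not the one from parsePythonValue.py, and the names good and bad are made up for this example.

```python
from pyparsing import Word, alphas, alphanums, nums

identifier = Word(alphas + '_', alphanums + '_')
integer = Word(nums)  # simplified stand-in, assumed for this sketch

arg_expr = identifier | integer
named_arg = identifier + '=' + arg_expr

good = named_arg | arg_expr   # order used in the answer
bad = arg_expr | named_arg    # reversed order

# '|' stops at the first alternative that matches. With the reversed order,
# arg_expr matches the keyword name as a plain identifier and named_arg is
# never tried, so '=1' is left unparsed.
print(good.parseString('kw=1').asList())
print(bad.parseString('kw=1').asList())
```

With parseAll=True the reversed version raises a ParseException instead of silently stopping after the keyword name.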

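Those arg_expr elements could take a while to work through, but fortunately, you can get them from parsePythonValue.py on the pyparsing wiki's Examples page: http://pyparsing.wikispaces.com/file/view/parsePythonValue.py (the wiki has since moved; as noted in the comments, pyparsing is now hosted at github.com/pyparsing/pyparsing).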

from parsePythonValue import (integer, real, dictStr as dict_literal, 
                              listStr as list_literal, tupleStr as tuple_literal)

You still might get args passed using *list_of_args or **dict_of_named_args notation. Expand arg_expr to support these:

deref_list = '*' + (identifier | list_literal | tuple_literal)
deref_dict = '**' + (identifier | dict_literal)

# keep string in the list, and try func_call before identifier so that nested calls parse
arg_expr = func_call | string | identifier | real | integer | dict_literal | list_literal | tuple_literal | deref_list | deref_dict

Write yourself some test cases now - start simple and work your way up to complicated:

sin(30)
sin(a)
hypot(a,b)
len([1,2,3])
max(*list_of_vals)
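
A small harness for the test cases above might look like the following. It is a sketch under assumptions: real, integer and list_literal are simplified stand-ins rather than the definitions from parsePythonValue.py, dict and tuple literals are left out, the argument list is wrapped in Optional so that empty calls parse, and parse_call is a name invented for this example.

```python
from pyparsing import (Word, alphas, alphanums, Regex, Forward, Group,
                       Optional, delimitedList, quotedString)

identifier = Word(alphas + '_', alphanums + '_')

# Simplified stand-ins (assumptions) for the parsePythonValue.py literals
real = Regex(r'[+-]?\d+\.\d*')
integer = Regex(r'[+-]?\d+')

func_call = Forward()
list_literal = Forward()

# func_call is tried before identifier so nested calls are not cut short
arg_expr = func_call | quotedString | identifier | real | integer | list_literal
list_literal <<= Group('[' + Optional(delimitedList(arg_expr)) + ']')

deref_list = '*' + (identifier | list_literal)
deref_dict = '**' + identifier

named_arg = identifier + '=' + arg_expr
func_arg = named_arg | deref_dict | deref_list | arg_expr

func_call <<= identifier + '(' + Optional(delimitedList(Group(func_arg))) + ')'


def parse_call(text):
    """Parse one call, requiring the whole string to match."""
    return func_call.parseString(text, parseAll=True).asList()


for test in ['sin(30)', 'sin(a)', 'hypot(a,b)', 'len([1,2,3])',
             'max(*list_of_vals)', 'f(x, key=g(1), **kw)']:
    print(test, '->', parse_call(test))
```

Passing parseAll=True makes a partially matched string fail loudly, which is what you want while growing the grammar one test case at a time.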

Additional argument types that will need to be added to arg_expr (left as further exercise for the OP):

  • indexed arguments : dictval['a'], divmod(10,3)[0], range(10)[::2]

  • object attribute references : a.b.c

  • arithmetic expressions : sin(30), sin(a+2*b)

  • comparison expressions : sin(a+2*b) > 0.5, 10 < a < 20

  • boolean expressions : a or b and not (d or c and b)

  • lambda expression : lambda x : sin(x+math.pi/2)

  • list comprehension

  • generator expression
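
As a starting point for the attribute-reference item in the list above, one possible approach (a sketch, not the only way to do it) is to combine dot-separated identifiers into a single token. The name attr_ref is an assumption introduced here; it would have to be tried before plain identifier in arg_expr.

```python
from pyparsing import Word, alphas, alphanums, Combine, ZeroOrMore

identifier = Word(alphas + '_', alphanums + '_')

# Combine joins the matched pieces into one string and does not allow
# whitespace between them, so 'a.b.c' comes back as a single token
attr_ref = Combine(identifier + ZeroOrMore('.' + identifier))

print(attr_ref.parseString('a.b.c', parseAll=True).asList())
print(attr_ref.parseString('math', parseAll=True).asList())
```

The other items (indexing, arithmetic, comparisons, boolean logic) need a proper expression grammar and are not covered by this fragment.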

4 Comments

I tried some of the examples you gave and the defined grammar failed to parse the len([1,2,3]) string. It gives a TypeError: object of type 'int' has no len() because the [1,2,3] list is concatenated into the integer 123. I tried to figure out why. I thought that it might be due to the Optional(Suppress(",")) used for defining listStr, tupleStr and dictStr, but removing them still produces the same error.
I don't understand why [1,2,3] concatenates to an integer - listStr should parse and convert that to the 3-element list [1,2,3].
That's what puzzles me too, Paul. If I do type(eval(func_call.transformString('[1,2,3]'))) I get <type 'list'>, but when I do func_call.transformString('len([1,2,3])') I get len(123) and not len([1,2,3]) as expected. I will try to dig into this further. If in the meantime you can find the problem, you're welcome to!
Pyparsing is no longer hosted on wikispaces.com. Go to github.com/pyparsing/pyparsing
