9

Is there a graceful way to get the names of the named %s-style variables in a string object? For example:

string = '%(a)s and %(b)s are friends.'
names = get_names(string)  # ['a', 'b']

Known alternative ways:

  1. Parse names using regular expression, e.g.:

    import re
    names = re.findall(r'%\((\w+)\)[sdf]', string)  # ['a', 'b']
    
  2. Use .format()-compatible formatting and Formatter().parse(string).

    How to get the variable names from the string for the format() method
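
For reference, here is a minimal sketch of way no. 2, assuming the template already uses {}-style fields. The helper name format_field_names is made up for illustration; Formatter().parse() itself is the standard string module API and yields (literal_text, field_name, format_spec, conversion) tuples.

```python
from string import Formatter


def format_field_names(fmt):
    """Return the field names of a str.format()-style template, in order."""
    # parse() yields one tuple per literal/field pair; field_name is None
    # for a trailing chunk of literal text that has no field after it.
    return [field for _, field, _, _ in Formatter().parse(fmt)
            if field is not None]


names = format_field_names('{a} and {b} are friends.')
print(names)  # ['a', 'b']
```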

But what about a string with %s-like variables?

PS: Python 2.7

  • The method you're describing seems to work well. It returns ['a','b']. So what is missing now? Commented Jan 19, 2016 at 13:03
  • @AdiLevin Way no. 1 requires an additional import. Way no. 2 requires a different string format. I am just curious whether there is a way to get the same result using only the string object's own methods and properties or, maybe, some string module functions. Commented Jan 19, 2016 at 13:12
  • What is preventing you from using format() for formatting? This seems like one of those cases where it is simply more powerful. Commented Jan 19, 2016 at 13:15
  • If you're asking, "Does Python, in the course of performing percent-style formatting, ever produce an intermediary data structure that one could inspect and extract the named parameters from?", it does not. The formatting code is all C, so there's no native method you could invoke; and it basically operates directly on the final string object, so there's no intermediary object to look at. Commented Jan 19, 2016 at 13:34
  • The first alternative fails on '%%(a)s'. Commented Jan 19, 2016 at 15:40

4 Answers

4

In order to answer this question, you need to define "graceful". Several factors might be worth considering:

  1. Is the code short, easy to remember, easy to write, and self-explanatory?
  2. Does it reuse the underlying logic (i.e. follow the DRY principle)?
  3. Does it implement exactly the same parsing logic?

Unfortunately, the "%" formatting for strings is implemented in the C routine "PyString_Format" in stringobject.c. This routine does not provide an API or hooks that allow access to a parsed form of the format string. It simply builds up the result as it is parsing the format string. Thus any solution will need to duplicate the parsing logic from the C routine. This means DRY is not followed and exposes any solution to breaking if a change is made to the formatting specification.

The parsing algorithm in PyString_Format includes a fair bit of complexity, including handling nested parentheses in key names, so it cannot be fully implemented with a regular expression or with string "split()". Short of copying the C code from PyString_Format and converting it to Python code, I do not see any remotely easy way of correctly extracting the names of the mapping keys under all circumstances.
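
To illustrate the parenthesis counting mentioned above, the following is a rough Python sketch of a hand-written scanner. It is not a port of PyString_Format: the name iter_mapping_keys is hypothetical, the set of flag, width and length characters it skips is an assumption based on the documented format syntax, and malformed specifiers other than an unterminated key are not diagnosed. It avoids print so that it runs on both Python 2.7 and 3.

```python
def iter_mapping_keys(fmt):
    """Yield mapping keys of a %-style format string, in order.

    Keys may contain balanced nested parentheses. This is a simplified
    scanner (an illustration, not CPython's actual parser).
    """
    i, n = 0, len(fmt)
    while i < n:
        if fmt[i] != '%':
            i += 1
            continue
        i += 1  # step past the '%' that starts a specifier
        if i < n and fmt[i] == '(':
            depth = 1
            i += 1
            start = i
            # Count parentheses until the one opened after '%' is closed.
            while i < n and depth > 0:
                if fmt[i] == '(':
                    depth += 1
                elif fmt[i] == ')':
                    depth -= 1
                i += 1
            if depth > 0:
                raise ValueError('unterminated mapping key')
            yield fmt[start:i - 1]
        # Skip flags, width, precision and length modifier (assumed set).
        while i < n and fmt[i] in '-+ #0123456789.*hlL':
            i += 1
        i += 1  # skip the conversion character; this also consumes '%%'


keys = list(iter_mapping_keys('%(should_match(but does not))s'))
# keys == ['should_match(but does not)']
```

Because '%%' is consumed as a complete specifier, a string such as '%%(nomatch)i' yields no keys, while '%%%(three)s' yields 'three'. The sketch still duplicates the parsing rules by hand, so the DRY concern stands.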

So my conclusion is that there is no "graceful" way to obtain the names of the mapping keys for a Python 2.7 "%" format string.

The following code uses a regular expression to provide a partial solution that covers most common usage:

import re
class StringFormattingParser(object):
    __matcher = re.compile(r'(?<!%)%\(([^)]+)\)[-# +0-9.hlL]*[diouxXeEfFgGcrs]')
    @classmethod
    def getKeyNames(klass, formatString):
        return klass.__matcher.findall(formatString)

# Demonstration of use with some sample format strings
for value in [
    '%(a)s and %(b)s are friends.',
    '%%(nomatch)i',
    '%%',
    'Another %(matched)+4.5f%d%% example',
    '(%(should_match(but does not))s',
    ]:
    print StringFormattingParser.getKeyNames(value)

# Note the following prints out "really does match"!
print '%(should_match(but does not))s' % {'should_match(but does not)': 'really does match'}

P.S. DRY = Don't Repeat Yourself (https://en.wikipedia.org/wiki/Don%27t_repeat_yourself)


1

You can also reduce this %-style task to the Formatter solution.

>>> import re
>>> from string import Formatter
>>> 
>>> string = '%(a)s and %(b)s are friends.'
>>> 
>>> string = re.sub(r'((?<!%)%(\((\w+)\)s))', r'{\g<3>}', string)
>>> 
>>> tuple(fn[1] for fn in Formatter().parse(string) if fn[1] is not None)
('a', 'b')
>>> 

In this case you can use both variants of formatting, I suppose.

Which regular expression to use depends on what you want.

>>> re.sub(r'((?<!%)%(\((\w+)\)s))', r'{\g<3>}', '%(a)s and %(b)s are %(c)s friends.')
'{a} and {b} are {c} friends.'
>>> re.sub(r'((?<!%)%(\((\w+)\)s))', r'{\g<3>}', '%(a)s and %(b)s are %%(c)s friends.')
'{a} and {b} are %%(c)s friends.'
>>> re.sub(r'((?<!%)%(\((\w+)\)s))', r'{\g<3>}', '%(a)s and %(b)s are %%%(c)s friends.')
'{a} and {b} are %%%(c)s friends.'
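
Note that the last case leaves '%%%(c)s' untouched, although '%%' followed by '%(c)s' is an escaped percent sign and then a real field. One possible variant, sketched here under the assumption that only the 's' conversion with \w+ keys needs to be handled, matches '%%' as its own token so that it cannot be confused with the start of a field. The names percent_to_braces and field_names are hypothetical.

```python
import re
from string import Formatter

# '%%' is tried first, so an escaped percent sign is consumed as a unit.
# Literal braces are matched too, because str.format() requires doubling them.
_TOKEN = re.compile(r'%%|%\((\w+)\)s|[{}]')


def percent_to_braces(fmt):
    """Rewrite '%(name)s' fields as '{name}' (only the 's' conversion)."""
    def repl(match):
        text = match.group(0)
        if text == '%%':
            return '%'          # escaped percent becomes a plain percent
        if text in ('{', '}'):
            return text * 2     # escape literal braces for str.format()
        return '{' + match.group(1) + '}'
    return _TOKEN.sub(repl, fmt)


def field_names(fmt):
    """Return the mapping keys of a %-style string via Formatter().parse()."""
    converted = percent_to_braces(fmt)
    return [name for _, name, _, _ in Formatter().parse(converted)
            if name is not None]


converted = percent_to_braces('%(a)s and %(b)s are %%%(c)s friends.')
# converted == '{a} and {b} are %{c} friends.'
```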


0

You could also do this:

[y[0] for y in [x.split(')') for x in s.split('%(')] if len(y)>1]

  • Just like the regex in the question, this fails on '%%(a)s'.
  • What's the exact requirement then? Besides %(a)s, what other kinds of expressions do we need to be able to parse? %%(a)s? Anything else?
0

Don't know if this qualifies as graceful in your book, but here's a short function that parses out the names. No error checking, so it will fail for malformed format strings.

def get_names(s):
    i = s.find('%')
    while 0 <= i < len(s) - 3:
        if s[i+1] == '(':
            yield(s[i+2:s.find(')', i)])
        i = s.find('%', i+2)

string = 'abd %(one) %%(two) 99 %%%(three)'
list(get_names(string))  # => ['one', 'three']
