28

I have command-line arguments in a single string, and I need to split them into a list to pass to argparse.ArgumentParser.parse_args.

The documentation uses str.split() liberally for this. However, that does not work in more complex cases, such as

--foo "spaces in brakets"  --bar escaped\ spaces

Is there functionality in Python to do this?

(A similar question for Java was asked here).

2
  • What exactly should the output of argparse.ArgumentParser.parse_args be? Commented Jul 6, 2017 at 10:07
  • You need to show us a full program that demonstrates the specific problem you are having, with example input that triggers it. Commented Jul 6, 2017 at 10:08

3 Answers

35

This is what shlex.split was created for.
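As a minimal sketch of how this fits the question, the snippet below splits the example string from the question with shlex.split and hands the result to argparse. The parser and its two options (--foo and --bar) are assumptions made up for illustration; only the input string comes from the question.

```python
import argparse
import shlex

# Hypothetical parser, defined only to have something to feed the tokens to.
parser = argparse.ArgumentParser()
parser.add_argument("--foo")
parser.add_argument("--bar")

# The example string from the question, as a raw string so the
# backslash reaches shlex unchanged.
raw = r'--foo "spaces in brakets"  --bar escaped\ spaces'

# shlex.split applies POSIX shell-like quoting and escaping rules.
argv = shlex.split(raw)
args = parser.parse_args(argv)

print(argv)
print(args.foo)
print(args.bar)
```

Note that str.split() on the same string would yield seven tokens, with the quotes and the backslash left in place.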

6 Comments

Nice! And it's available since Python 2.3.
Does shlex.split have an issue with escaped quote marks? e.g --foo "bar\"baz"
@user1735003: Yes, though it would usually be the shell handling this for you (shlex follows mostly the same rules as sh shell rules). But if you have a constructed command line like that, it's fine with it, that's the whole point of shlex: shlex.split(r'--foo "bar\"baz"') produces ['--foo', 'bar"baz']. The argparse docs are being lazy when they use str.split instead of shlex.split (or explicit lists); they were going for brevity, but without the mental load of requiring shlex knowledge.
@Eric: Given there is no single Windows format (Windows executables receive the raw string and parse it themselves), and that the question was about parsing a string for argparse to work with (which has a fixed behavior regardless of OS), your comment doesn't seem particularly relevant to this case.
Even though Windows receives the raw string, most programs define a main(argc, argv), which ends up using the parsing provided by their C runtime. argparse has a fixed behavior regardless of OS, but that's because it takes a list of strings as input, typically sys.argv. How sys.argv gets populated is platform-dependent, and it's worth drawing attention to that. shlex.split matches the way sys.argv is populated on POSIX systems, but not how it is populated on Windows systems.
7

If you're parsing a Windows-style command line, then shlex.split doesn't work correctly: calling subprocess functions on the result will not behave the same as passing the string directly to the shell.

In that case, the most reliable way to split a string like the command-line arguments to python is... to pass command line arguments to python:

import os
import sys
import subprocess
import shlex
import json  # json is an easy way to send arbitrary ascii-safe lists of strings out of python

def shell_split(cmd):
    """
    Like `shlex.split`, but uses the Windows splitting syntax when run on Windows.

    On windows, this is the inverse of subprocess.list2cmdline
    """
    if os.name == 'posix':
        return shlex.split(cmd)
    else:
        # TODO: write a version of this that doesn't invoke a subprocess
        if not cmd:
            return []
        full_cmd = '{} {}'.format(
            subprocess.list2cmdline([
                sys.executable, '-c',
                'import sys, json; print(json.dumps(sys.argv[1:]))'
            ]), cmd
        )
        ret = subprocess.check_output(full_cmd).decode()
        return json.loads(ret)

One example of how these differ:

# windows does not treat all backslashes as escapes
>>> shell_split(r'C:\Users\me\some_file.txt "file with spaces"')
['C:\\Users\\me\\some_file.txt', 'file with spaces']

# posix does
>>> shlex.split(r'C:\Users\me\some_file.txt "file with spaces"')
['C:Usersmesome_file.txt', 'file with spaces']

# non-posix does not mean Windows - this produces extra quotes
>>> shlex.split(r'C:\Users\me\some_file.txt "file with spaces"', posix=False)
['C:\\Users\\me\\some_file.txt', '"file with spaces"']  
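The docstring above describes shell_split as the inverse of subprocess.list2cmdline on Windows. A rough sketch of that relationship from the other direction: list2cmdline is plain string manipulation, so it can be run on any platform to see the Windows-style quoting it produces. As far as I know it is not among the names subprocess exports publicly, so relying on it is an assumption.

```python
import shlex
import subprocess

# The argument list we would like to recover after splitting.
args = [r'C:\Users\me\some_file.txt', 'file with spaces']

# Build a command line using the Windows (MS C runtime) quoting rules.
cmdline = subprocess.list2cmdline(args)
print(cmdline)

# POSIX-mode shlex treats every backslash as an escape,
# so it does not round-trip the Windows path.
posix_tokens = shlex.split(cmdline)
print(posix_tokens)
print(posix_tokens == args)
```

On Windows, passing cmdline to the shell_split function above should give back the original args list.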

1

You could use the split_arg_string helper function from the click package:

import re

def split_arg_string(string):
    """Given an argument string this attempts to split it into small parts."""
    rv = []
    for match in re.finditer(r"('([^'\\]*(?:\\.[^'\\]*)*)'"
                             r'|"([^"\\]*(?:\\.[^"\\]*)*)"'
                             r'|\S+)\s*', string, re.S):
        arg = match.group().strip()
        if arg[:1] == arg[-1:] and arg[:1] in '"\'':
            arg = arg[1:-1].encode('ascii', 'backslashreplace') \
                .decode('unicode-escape')
        try:
            arg = type(string)(arg)
        except UnicodeError:
            pass
        rv.append(arg)
    return rv

For example:

>>> print(split_arg_string('"this is a test" 1 2 "1 \\" 2"'))
['this is a test', '1', '2', '1 " 2']

The click package is starting to dominate command-line argument parsing, but I don't think it supports parsing arguments from a string (only from argv). The helper function above is used only for bash completion.

Edit: I can only recommend using shlex.split() as suggested in the answer by @ShadowRanger. The only reason I'm not deleting this answer is that it splits a little faster than the full-blown pure-Python tokenizer used in shlex (around 3.5x faster for the example above, 5.9us vs 20.5us). However, this shouldn't be a reason to prefer it over shlex.
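For comparison, a short sketch showing that shlex.split produces the same tokens for the example input used above, which is why it is the safer default. Timings are left out here since they depend on the machine and Python version.

```python
import shlex

# Same input as the split_arg_string example above.
text = '"this is a test" 1 2 "1 \\" 2"'

# In POSIX mode, a backslash inside double quotes escapes the quote character.
tokens = shlex.split(text)
print(tokens)
```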
