I have an AWS S3 bucket filled with data parameterized by date. I'd like to extract that data one date at a time using the AWS CLI (reference), specifically the aws s3 sync command.

The following command does what I expect it to do:

aws s3 sync s3://my-bucket-1 . --exclude "*" --include "*2018-01-17*" --dryrun

Running this command from my command line generates a (dryrun) download for every file in my bucket containing the substring 2018-01-17.

Great! To simplify the necessary file operations, I've written a small CLI wrapper around this command. The wrapper is written in Python and uses subprocess.run to do its work. The entire operation boils down to the following call:

subprocess.run(['aws', 's3', 'sync', 's3://my-bucket-1', '.', '--exclude', '"*"', '--include', '"*2018-01-17*"', '--dryrun'])

The problem is that when I run this statement, I get a (dryrun) download back for every file in the bucket. That is, data is returned that corresponds with bucket entries from 01-18, 01-19, and so on. The --exclude/--include rules fail to apply, and the result is the same as if I had simply run aws s3 sync s3://my-bucket-1 .

Why does this occur?

  • I think that when using a list, it's unnecessary to quote your arguments, i.e. '"*2018-01-17*"' should perhaps be '*2018-01-17*'. See this question, which describes a solution that uses unquoted arguments in a list where quotes would otherwise be used in the string version of the command. Commented Jan 20, 2018 at 17:27
  • A quick test confirms that this is the correct answer. I'm mystified why though. Commented Jan 20, 2018 at 17:30
  • It's a design decision that was made. I guess the idea is that Python will do the right thing for you. Suppose you pass variables into commands that may or may not need quoting. In the string version, quoting identifies the text within the double quotes as part of the same argument. When you pass arguments in a list, it's already clear what's what, so the assumption is that a " inside a list item is to be interpreted literally. Hope that makes sense. Commented Jan 20, 2018 at 17:33

1 Answer

When using the list form of invocation, you should not add those extra double quotes. Normally, when your command is given as a single string to a shell, the quotes indicate that everything between them is part of a single argument (and keep the shell from expanding the *). The shell then strips the quotes before handing the argument to the program.

In the list form, no shell parses the command by default, so each list item is passed to the program as-is. If you put double quotes inside a list item, they are treated as literal characters and become part of the argument. Consequently, nothing matches your include and exclude parameters, because each pattern now begins and ends with a literal ".
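To make the difference concrete, here is a small sketch that never calls the AWS CLI. It uses shlex.split to approximate how a POSIX shell would tokenize the string form of the command, and fnmatch as a stand-in for the CLI's pattern matching (that stand-in is an assumption, as is the sample key name, which is made up for illustration).

```python
import fnmatch
import shlex

# The command as typed at the shell prompt.
command = 'aws s3 sync s3://my-bucket-1 . --exclude "*" --include "*2018-01-17*" --dryrun'

# shlex.split approximates POSIX shell tokenization: the quotes group the
# argument and are then removed.
shell_args = shlex.split(command)

# The list from the question: the quotes are ordinary characters here.
list_args = ['aws', 's3', 'sync', 's3://my-bucket-1', '.',
             '--exclude', '"*"', '--include', '"*2018-01-17*"', '--dryrun']

print(shell_args[6])  # *
print(list_args[6])   # "*"

# Hypothetical object key, used only to show the effect on matching.
key = 'data-2018-01-18.csv'
print(fnmatch.fnmatch(key, shell_args[6]))  # True: the key would be excluded
print(fnmatch.fnmatch(key, list_args[6]))   # False: the exclude rule never fires
```

Because the quoted exclude pattern matches nothing, every file stays in the sync set, which is the behavior described in the question.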

So, the following should be the corrected arguments.

subprocess.run(['aws', 's3', 'sync', 's3://my-bucket-1', '.', '--exclude', '*', '--include', '*2018-01-17*', '--dryrun'])
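Since the goal is to pull one date at a time, the argument list could be built by a small helper along these lines. This is only a sketch: build_sync_args and its parameters are hypothetical names, not part of the original wrapper, and the call that actually runs the AWS CLI is left commented out.

```python
import subprocess


def build_sync_args(bucket, date, dest='.', dry_run=True):
    """Build an argument list for `aws s3 sync` limited to one date.

    Hypothetical helper: patterns are passed without extra quotes because
    each list item already reaches the program as a single argument.
    """
    args = ['aws', 's3', 'sync', f's3://{bucket}', dest,
            '--exclude', '*',
            '--include', f'*{date}*']
    if dry_run:
        args.append('--dryrun')
    return args


args = build_sync_args('my-bucket-1', '2018-01-17')
print(args)

# Uncomment to run the command (requires the AWS CLI and credentials):
# subprocess.run(args, check=True)
```

Keeping the list construction separate from the subprocess.run call makes it easy to print or test the exact arguments before anything is downloaded.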