I have a Python function that downloads CSV files from an AWS S3 bucket. The folder I want to download from contains many CSV files with various naming conventions, and I only want to download the files whose names contain a certain substring.

The files I want to download are named as:

  • BANK_NIFTY_5MINs_2020-01-01.csv
  • BANK_NIFTY_5MINs_2020-01-02.csv
  • BANK_NIFTY_5MINs_2020-01-03.csv and so on.

I do not want to download every CSV file in the 2020 folder, only the ones whose names contain the substring. Can someone please help with how to do that?

The code below is how I call the function, but it does not download any data:

download_from_s3(s3_uri="s3://dir1/dir2/dir3/2020/BANK_NIFTY_5MINs*.csv", local_dir=os.path.join("2020Data"))

How can I specify the substring of the csv files I want to download?

  • The download_from_s3() method is not offered directly by boto3. Are you using a library of some sort to access S3 (e.g. Anaconda)? Commented Oct 25, 2022 at 20:31
  • Yes, I have a whole function written using boto3. "download_from_s3" is just the name of the function I've written. Commented Oct 25, 2022 at 20:41

1 Answer

There is no command in Amazon S3 to download objects via a wildcard. At some stage, your code would need to make an API call to S3 with the exact name (Key) of the object you want to download.

Therefore, your code would need to call list_objects_v2() to obtain a listing of objects in the S3 bucket. Then, you can use string comparison logic in Python to determine which objects you want to download, and call download_file() for each of them.
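
The approach described above can be sketched roughly as follows. This is only an illustration, not the asker's download_from_s3() (which is not shown): the helper names split_s3_uri, matching_keys and download_matching are hypothetical, and the sketch assumes the wildcard appears only in the file name part of the URI. It uses the boto3 list_objects_v2 paginator, because a single listing call returns at most 1000 keys, and Python's fnmatch module for the string comparison.

```python
import fnmatch
import os


def split_s3_uri(s3_uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"Not an S3 URI: {s3_uri}")
    bucket, _, key = s3_uri[len("s3://"):].partition("/")
    return bucket, key


def matching_keys(s3_client, bucket, prefix, pattern):
    """Yield keys under `prefix` whose file name matches the wildcard `pattern`."""
    # The paginator follows continuation tokens, so folders with more
    # than 1000 objects are listed completely.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page that has no objects.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if fnmatch.fnmatchcase(os.path.basename(key), pattern):
                yield key


def download_matching(s3_client, s3_uri, local_dir):
    """Download every object whose name matches the wildcard in `s3_uri`.

    Assumes the wildcard is only in the last path component.
    Returns the list of local file paths that were written.
    """
    bucket, key_pattern = split_s3_uri(s3_uri)
    prefix, _, pattern = key_pattern.rpartition("/")
    if prefix:
        prefix += "/"  # restrict the listing to that "folder"
    os.makedirs(local_dir, exist_ok=True)
    downloaded = []
    for key in matching_keys(s3_client, bucket, prefix, pattern):
        target = os.path.join(local_dir, os.path.basename(key))
        s3_client.download_file(bucket, key, target)
        downloaded.append(target)
    return downloaded


# Example call (requires boto3 and valid AWS credentials):
#   import boto3
#   download_matching(
#       boto3.client("s3"),
#       "s3://dir1/dir2/dir3/2020/BANK_NIFTY_5MINs*.csv",
#       "2020Data",
#   )
```

Passing the client in as an argument keeps the matching logic separate from the AWS connection. If the folder is very large, the literal part of the file name (here BANK_NIFTY_5MINs) could also be appended to the Prefix so that S3 returns fewer keys to filter.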
