1

I have a list of files in the s3 bucket as below. I want to load only the current date files, for example, I want to get all file_a.csv which are under 2020/06/09, similarly get all file_b.csv under same 2020/06/09

I tried out the syntax but this gets me all files from all dates select metadata$filename from @stage/snflk/ts/(file_format=>CSV_SKIP_HEADER,pattern=>'.*/file_a.*[.]csv')

snflk/ts/ts-prod/2020/06/08/file_a.csv
snflk/ts/ts-prod/2020/06/09/file_a.csv
snflk/ts/ts-prod3/2020/06/08/file_a.csv
snflk/ts/ts-prod3/2020/06/09/file_a.csv
snflk/ts/ts-prod2/2020/06/08/file_a.csv
snflk/ts/ts-prod2/2020/06/09/file_a.csv
snflk/ts/ts-prod/2020/06/08/file_b.csv
snflk/ts/ts-prod/2020/06/09/file_b.csv
snflk/ts/ts-prod3/2020/06/08/file_b.csv
snflk/ts/ts-prod3/2020/06/09/file_b.csv
snflk/ts/ts-prod2/2020/06/08/file_b.csv
snflk/ts/ts-prod2/2020/06/09/file_b.csv

1 Answer 1

3

I want to load only the current date files

.*/file_a.*[.]csv

this gets me all files from all dates

The pattern (.*) supplied at the beginning here is a wildcard that will match everything available.

If you need to limit the date to a constant value, specify the constant values inside the regular expression pattern (i.e. ts-prod.*/2020/06/09/file_a\.csv, ts-prod.*/2020/06/09/file_b\.csv, etc.):

select
  metadata$filename
from @stage/snflk/ts/
(pattern=>'ts-prod.*/2020/06/09/file_a\.csv');

To iteratively test and write your regex patterns, you can use web tools such as Regex101, RegExr, etc.

P.s. You can also construct the pattern to use the current date dynamically, if you are trying to automate this:

-- Produces 'ts-prod.*/2020/06/11/file_a\.csv'
SET curr_dt_a=(
  SELECT 
    'ts-prod.*/' || 
    TO_VARCHAR(CURRENT_DATE(), 'YYYY/MM/DD') || 
    '/file_a\\.csv'
)

-- Referenced as a SQL variable in the PATTERN option
SELECT
  metadata$filename
FROM @stage/snflk/ts/
(PATTERN=>$curr_dt_a);
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks, the above regex didn't work out .. Made slight changes and it worked select metadata$filename from @stage/snflk/ts/ (pattern=>'.*/2020/06/09/file_a.csv');
Your detailed explanation is worth a lot. Thanks
This solution only works for a specific date. What if the files to be loaded are between 2 different dates? I just posted this question and wonder if it is possible to do that. stackoverflow.com/questions/65622872/…

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.