
I am trying to extract the file date from the output of a gsutil ls -l command and compare it to other dates. However, the comparison does not appear to work.

lstr=$(gsutil ls -l gs://input-bucket/my_file_PAS_EXTRACT_INBOUND.xml)
prcs_date=$(echo ${lstr} | awk '{print $2}' | awk -F"T" '{print $1}')
date_frmt=$(date -d "$prcs_date" +"%Y-%m-%d")
cmpr_date=$(date -d "2024-05-01" +"%Y-%m-%d")
high_date=$(date -d "2025-07-09" +"%Y-%m-%d")
if [[ "${date_frmt}" -ge "${cmpr_date}" && "${date_frmt}" -lt "${high_date}" ]]
then
    fname=$(echo "$lstr" | awk '{print $3}')
    gsutil cp "${fname}" gs://output-dev/ 
    if [ $? -eq 0 ]
    then
       echo "1 File Successfully Copied to Target Location"
    else
       echo "Error in copying file - ${fname}"
    fi
else 
    echo "Skipping File ${fname} as it falls outside date range"
fi

The code fails to copy the file, printing "Skipping File gs://input-bucket/my_file_PAS_EXTRACT_INBOUND.xml as it falls outside date range", even though the file's date falls within the range given in the code above.

The output of the gsutil ls -l command is:

$ gsutil ls -l gs://input-bucket/my_file_PAS_EXTRACT_INBOUND.xml

13461 2024-10-11T07:45:15Z gs://input-bucket/my_file_PAS_EXTRACT_INBOUND.xml
TOTAL: 1 objects, 13461 bytes (13.35 KiB)

Can someone please point out where I am going wrong?

  • Didn't you want to use date .... +%s for your numeric comparisons -ge and -lt? Commented Jul 12 at 6:47
  • Paste the script to shellcheck.net to see if any lint issues found could be part of the trouble. Commented Jul 12 at 6:47
  • prcs_date=$(awk -F'[ :T]+' '{print $2}' <<< "$lstr") Commented Jul 12 at 6:52
  • The original problem is that your dates are in a format like "2024-05-01", but the -ge and -lt comparisons compare numbers, not general strings. When they get something like "2024-05-01", they'll treat it as an arithmetic expression (2024 minus 5 minus 1, which comes out to 2018), which isn't what you want at all. Commented Jul 12 at 9:38
  • The script is comparing UTC dates (2024-10-11T07:45:15Z) against local timezone dates, is that ok? Commented Jul 12 at 15:33

2 Answers


The failure you're asking about happens because -ge and -lt perform integer comparison, but you're using them to compare two non-numeric date strings.
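To see what the integer comparison actually does with these strings, here is a minimal sketch using the dates from the question. The variable names are only for illustration. Inside [[ ]], bash evaluates each operand of -ge as an arithmetic expression, so the dashes become subtractions. (A month or day of 08 or 09 would instead raise an error, because a leading zero marks an octal literal.)

```shell
#!/usr/bin/env bash
# The dashes are read as minus signs when the string is evaluated arithmetically.
file_val=$(( 2024-10-11 ))   # 2024 - 10 - 11
low_val=$(( 2024-05-01 ))    # 2024 - 05 - 01 (05 and 01 are octal literals)
echo "file=${file_val} low=${low_val}"

# Same evaluation happens implicitly for -ge inside [[ ]].
if [[ "2024-10-11" -ge "2024-05-01" ]]; then
    arith_result="in range"
else
    arith_result="skipped"
fi
echo "${arith_result}"
```

The file date evaluates to a smaller number than the lower bound, which is why the script takes the "Skipping File" branch.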

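That said, parsing the output of ls is generally a bad idea (see https://mywiki.wooledge.org/ParsingLs or https://unix.stackexchange.com/questions/128985/… or just google it - it's a famous anti-pattern). If you want the last modification time of a local file, use stat or similar. Note that stat cannot read a gs:// URL, so for a bucket object the timestamp still has to come from gsutil; the script below takes it from the second field of the gsutil ls -l line. It'll also be simpler to compare timestamps as seconds since the epoch rather than as YYYY-MM-DD strings.

Given that, you could write your script as something like (untested, assumes GNU date):

#!/usr/bin/env bash

file_path='gs://input-bucket/my_file_PAS_EXTRACT_INBOUND.xml'
beg_date='2024-05-01'
end_date='2025-07-09'

# stat only works on local paths, so read the object's timestamp
# (e.g. 2024-10-11T07:45:15Z) from the first line of gsutil ls -l.
obj_time=$( gsutil ls -l "$file_path" | awk 'NR==1 {print $2}' )
file_ts=$( date -d "$obj_time" +'%s' )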
beg_ts=$( date -d "$beg_date" +'%s' )
end_ts=$( date -d "$end_date" +'%s' )

if (( (beg_ts <= file_ts) && (file_ts < end_ts) )); then
    if gsutil cp -- "$file_path" 'gs://output-dev/'; then
        printf '1 File Successfully Copied to Target Location\n' >&2
    else
        printf 'Error in copying file - %s\n' "$file_path" >&2
    fi
else
    printf 'Skipping File %s as it falls outside date range\n' "$file_path" >&2
fi
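The epoch comparison can be tried without access to a bucket. The sketch below is an illustration under stated assumptions: it hard-codes the listing line shown in the question, assumes GNU date, and interprets the range bounds as UTC (via -u) so that they match the Z-suffixed object timestamp, which addresses the timezone concern raised in the comments. The variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Sample listing line copied from the question; a real script would get it from gsutil.
line='13461 2024-10-11T07:45:15Z gs://input-bucket/my_file_PAS_EXTRACT_INBOUND.xml'

# Split the line into size, timestamp and URL.
read -r _size obj_time obj_url <<< "$line"

# Convert everything to seconds since the epoch, all in UTC.
file_ts=$(date -u -d "$obj_time" +%s)
beg_ts=$(date -u -d '2024-05-01' +%s)
end_ts=$(date -u -d '2025-07-09' +%s)

# Half-open range: lower bound inclusive, upper bound exclusive.
if (( beg_ts <= file_ts && file_ts < end_ts )); then
    verdict="copy"
else
    verdict="skip"
fi
printf '%s %s\n' "$verdict" "$obj_url"
```

Dropping -u makes date read the bounds in the local timezone, which shifts them by the local UTC offset.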

By the way, when you find yourself piping sed/grep/awk to sed/grep/awk, that's also an anti-pattern (see https://porkmail.org/era/unix/award#grep), and there's almost always a better way. For example, echo ${lstr} | awk '{print $2}' | awk -F"T" '{print $1}' could have been written as just awk -F'[ T]' 'NR==1 {print $2}' <<< "${lstr}" (the NR==1 is needed because the quoted ${lstr} keeps the TOTAL line as a separate second line).
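A quick way to check that a single awk call gives the same date as the two-stage pipeline is to run both on the two-line listing from the question. This is a sketch with hypothetical variable names. It assumes the object line is the first line of the listing and has no leading spaces. The NR==1 guard restricts awk to that line; without it, the quoted here-string would also emit a field from the TOTAL line.

```shell
#!/usr/bin/env bash
# Two-line listing as shown in the question (object line, then TOTAL line).
lstr=$'13461 2024-10-11T07:45:15Z gs://input-bucket/my_file_PAS_EXTRACT_INBOUND.xml\nTOTAL: 1 objects, 13461 bytes (13.35 KiB)'

# Original approach: unquoted echo flattens the listing, then two awk calls.
piped=$(echo ${lstr} | awk '{print $2}' | awk -F"T" '{print $1}')

# Single awk: split on space or "T", and only look at the first line.
single=$(awk -F'[ T]' 'NR==1 {print $2}' <<< "${lstr}")

echo "piped=${piped} single=${single}"
```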


1 Comment

That hits all points quite well in about as succinct a manner as could be done. The UUoC link is a keeper. Thanks Ed!

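Removing the dashes from the dates makes the script work as expected, because each value then becomes a plain integer such as 20241011. The same format has to be used for cmpr_date and high_date as well as for date_frmt: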

date_frmt=$(date -d "$prcs_date" +"%Y%m%d")
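Applied to all three dates, the change could look like the following sketch. It assumes GNU date and uses a hard-coded prcs_date taken from the listing in the question; range_result is a hypothetical variable added only so the outcome can be inspected.

```shell
#!/usr/bin/env bash
prcs_date='2024-10-11'   # sample value from the listing in the question

# Format every date as YYYYMMDD so that -ge and -lt compare plain integers.
date_frmt=$(date -d "$prcs_date" +%Y%m%d)
cmpr_date=$(date -d '2024-05-01' +%Y%m%d)
high_date=$(date -d '2025-07-09' +%Y%m%d)

if [[ "$date_frmt" -ge "$cmpr_date" && "$date_frmt" -lt "$high_date" ]]; then
    range_result="within range"
else
    range_result="outside range"
fi
echo "${date_frmt}: ${range_result}"
```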

Alternative answer

Sorting the three dates should put the file's date on the second line when it lies within the range. If the second line is not the file's date, the test fails as expected.

# use the provided log as if the command was run
lstr='13461 2024-10-11T07:45:15Z gs://input-bucket/my_file_PAS_EXTRACT_INBOUND.xml TOTAL: 1 objects, 13461 bytes (13.35 KiB)'


prcs_date=$(echo ${lstr} | awk '{print $2}' | awk -F"T" '{print $1}')
date_frmt=$(date -d "$prcs_date" +"%Y-%m-%d")
cmpr_date=$(date -d "2024-05-01" +"%Y-%m-%d")
high_date=$(date -d "2025-07-09" +"%Y-%m-%d")

# sort dates, keep the second line
cmp_res=$(printf "%s\n" "${date_frmt}" "${cmpr_date}" "${high_date}" | sort -g | sed -n '2{p;q}')
if [[ "${date_frmt}" == "${cmp_res}" ]]
then
    echo "$lstr" | awk '{print $3}'
    #gsutil cp "${fname}" gs://output-dev/
    if [ $? -eq 0 ]
    then
       echo "1 File Successfully Copied to Target Location"
    else
       echo "Error in copying file - ${fname}"
    fi
else
    echo "Skipping File ${fname} as it falls outside date range"
fi

Result

gs://input-bucket/my_file_PAS_EXTRACT_INBOUND.xml
1 File Successfully Copied to Target Location
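Because YYYY-MM-DD strings sort in the same order as the dates they represent, the same check can also be sketched as a plain string comparison inside [[ ]], without sort or sed. This is an alternative to the approach above, not part of it, and in_range is a hypothetical helper name. Unlike the sort-based test, it treats the upper bound as exclusive, matching the -lt in the question.

```shell
#!/usr/bin/env bash
# Succeeds when low <= d < high, comparing ISO dates as strings.
in_range() {
    local d=$1 low=$2 high=$3
    [[ ! "$d" < "$low" && "$d" < "$high" ]]
}

if in_range '2024-10-11' '2024-05-01' '2025-07-09'; then
    echo "within range"
else
    echo "outside range"
fi
```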
