I have been struggling to find the best way to approach this problem in a bash script. I have a command that checks groups of servers for their uptime in minutes. I only want to continue on to the next group of reboots once all of the servers have been up for 5 minutes, but I also want to verify that they haven't been up for over an hour, in case the reboot doesn't take.
I was originally trying to set up a while loop that would keep issuing the command to check uptimes and send the output into an array. I am trying to figure out how to loop through an array until all elements of that array are greater than 5 and less than 60. I haven't even been successful with the first check of greater than 5. Is it even possible to continually write to an array and perform arithmetic checks against every value in it, so that a while loop only ends once all values are greater than X? The number of servers putting their current uptime into the array varies per group, so the array won't always hold the same number of values.
Is an array even the proper way to do this? I'd provide examples of what I have tried so far, but it's a huge mess, and I think starting from scratch and just asking for input is the best way to begin.
Output of the command I am running to pull uptimes looks similar to the following:
1
2
1
4
3
2
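For reference, the "every value must be greater than X and less than Y" test can be written as a small function that takes however many values the command returns. This is only a minimal sketch: `all_in_range` and `check_uptimes` are hypothetical names, and `check_uptimes` simply prints the sample output above instead of calling the real command. It assumes bash 4 or later for `mapfile`.

```shell
#!/bin/bash
# Return 0 only if every value is an integer strictly between min and max.
all_in_range() {
    local min=$1 max=$2 value
    shift 2
    (( $# > 0 )) || return 1                    # an empty list never counts as "all up"
    for value in "$@"; do
        [[ $value =~ ^[0-9]+$ ]] || return 1    # reject blank or non-numeric output
        (( 10#$value > min && 10#$value < max )) || return 1
    done
    return 0
}

# Hypothetical stand-in for the real uptime command:
# prints one uptime (in minutes) per line, like the sample output.
check_uptimes() { printf '%s\n' 1 2 1 4 3 2; }

mapfile -t uptimes < <(check_uptimes)           # array size follows the output
if all_in_range 5 60 "${uptimes[@]}"; then
    echo "all servers in range"
else
    echo "not yet"
fi
```

In a real script the `mapfile` line and the check would sit inside an `until all_in_range 5 60 "${uptimes[@]}"; do ...; sleep 60; done` style loop. Note that a server stuck above 60 minutes would keep such a loop running forever, so some kind of timeout is probably needed as well.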
Edit
Thanks to the help provided, I was able to get a functional proof of concept together for this, and I'm stoked. Here it is in case it helps anyone trying to do something similar in the future. The problem at hand was that we use AWS SSM for all of our Windows server patching, and many times when SSM tells servers to reboot after patching, the SSM Agent takes ages to check in. This slows down our entire process, which right now is fairly manual across dozens of patch groups. Many times we have to go and manually verify that a server did indeed reboot after we told it to from SSM, so that we know we can start the reboots for the next patch group. With this we will be able to run a single script that issues reboots for our patch groups in the proper order and verifies that the servers have properly rebooted before continuing on to the next group.
#!/bin/bash
# Purpose: automate the commands required to reboot groups of AWS Windows
# servers through SSM, verify each server's uptime, and only continue to the
# next group once the previous group has reached X minutes of uptime. This
# works around SSM Agents not properly checking in with SSM post-reboot.

patchGroups=(01 02 03) # array containing the values of the RebootGroup tag

for group in "${patchGroups[@]}"
do
    printf "Rebooting Patch Group %q\n" "$group"
    # the command substitution is left unquoted on purpose so that each instance ID becomes its own argument
    aws ec2 reboot-instances --instance-ids $(aws ec2 describe-instances --filters "Name=tag:RebootGroup,Values=$group" --query 'Reservations[].Instances[].InstanceId' --output text)
    sleep 2m

    unset passed failed serverList # wipe variables left over from the previous group
    declare -A passed failed # declare associative arrays (serverList is a plain string)
    serverList=$(aws ec2 describe-instances --filters "Name=tag:RebootGroup,Values=$group" --query 'Reservations[*].Instances[*].[InstanceId]' --output text)

    for server in ${serverList} # loop through list of servers
    do
        failed["${server}"]=0 # add to the failed[] array
    done

    while [[ "${#failed[@]}" -gt 0 ]] # loop while number of servers in the failed[] array is greater than 0
    do
        for server in "${!failed[@]}" # loop through servers in the failed[] array
        do
            ssmID=$(aws ssm send-command --document-name "AWS-RunPowerShellScript" --document-version "1" --targets "[{\"Key\":\"InstanceIds\",\"Values\":[\"$server\"]}]" --parameters '{"commands":["$wmi = Get-WmiObject -Class Win32_OperatingSystem ","$uptimeMinutes = ($wmi.ConvertToDateTime($wmi.LocalDateTime)-$wmi.ConvertToDateTime($wmi.LastBootUpTime) | select-object -expandproperty \"TotalMinutes\")","[int]$uptimeMinutes"],"workingDirectory":[""],"executionTimeout":["3600"]}' --timeout-seconds 600 --max-concurrency "50" --max-errors "0" --region us-west-2 --output text --query "Command.CommandId")
            sleep 5
            uptime=$(aws ssm list-command-invocations --command-id "$ssmID" --details --query 'CommandInvocations[].CommandPlugins[].Output' --output text | sed 's/\r$//')
            printf "Checking instance ID %q\n" "$server"
            printf "Value of uptime is = %q\n" "$uptime"

            # if uptime is a number within our 'success' window then move server to passed[] array
            if [[ "${uptime}" =~ ^[0-9]+$ && "${uptime}" -ge 3 && "${uptime}" -lt 60 ]]
            then
                passed["${server}"]="${uptime}" # add to passed[] array
                printf "Server with instance ID %q has successfully rebooted.\n" "$server"
                unset 'failed[$server]' # remove from failed[] array
            else
                failed["${server}"]="${uptime}" # keep the latest reading for the status display
            fi
        done

        # display current status (edit/remove as desired)
        printf "\n++++++++++++++ successful reboots\n"
        printf "%s\n" "${!passed[@]}" | sort -n
        printf "\n++++++++++++++ failed reboot\n"
        for server in "${!failed[@]}"
        do
            printf "%s - %s (mins)\n" "${server}" "${failed[${server}]}"
        done | sort -n
        printf "\n"

        sleep 60 # adjust as necessary
    done
done
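One thing the script above does not do yet is tell "still booting" apart from "never rebooted" (uptime of an hour or more), so a server whose reboot didn't take stays in failed[] forever. A possible way to separate the cases is sketched below. `classify_uptime` is a hypothetical helper, not part of the script, and its default window of 3 to 60 minutes is just copied from the `if` test above.

```shell
#!/bin/bash
# classify_uptime MINUTES [MIN] [MAX]
# Prints one of: pending, ok, stale, unknown.
classify_uptime() {
    local minutes=$1 min=${2:-3} max=${3:-60}
    if [[ ! $minutes =~ ^[0-9]+$ ]]; then
        echo unknown                 # blank or non-numeric command output
    elif (( 10#$minutes >= max )); then
        echo stale                   # up too long: the reboot probably didn't take
    elif (( 10#$minutes >= min )); then
        echo ok                      # inside the success window
    else
        echo pending                 # rebooted, but not up long enough yet
    fi
}

# Try it on a few sample readings, including an empty one.
for sample in 1 3 59 60 ""; do
    printf '%-3s -> %s\n' "${sample:-?}" "$(classify_uptime "$sample")"
done
```

Inside the inner loop, a `case "$(classify_uptime "$uptime")" in ...` could then move `ok` servers to passed[] and report `stale` servers for a manual look instead of polling them indefinitely.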
1 2 1 4 3 2), seconds? minutes? How are you managing the list of servers ... stored in an array? stored in a file? Again, how big is "huge mess", and can you post a minimal version of your code that represents your activities?
Thinking about this some more ... an associative array where the index is the server name and the value is the latest 'uptime' (uptime[server1]=3 (min)); assuming a main "while true" type of loop, the inner loop would loop through the array indices/values ... and break out of the main while loop when counter=0
... "out-of-range" array; as they pass the test, remove them from the "out-of-range" array and add them to the "in-range" array; when no more entries remain in the "out-of-range" array ... move on to the next set of servers ...
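The two-array idea from the comments can be tried without touching any real servers. The following is only a rough sketch under stated assumptions: `get_uptime` and `fake_clock` are hypothetical stand-ins that simulate uptimes growing by one minute per pass, the 5/60 window comes from the question, and the `passes` cap exists only to keep the demo from looping forever.

```shell
#!/bin/bash
# Simulated uptimes in minutes; a real script would query each server instead.
declare -A fake_clock=([server1]=3 [server2]=0 [server3]=4)
get_uptime() { echo "${fake_clock[$1]}"; }      # hypothetical stand-in

declare -A out_of_range in_range
for server in "${!fake_clock[@]}"; do
    out_of_range[$server]=0                     # everyone starts out of range
done

passes=0
while (( ${#out_of_range[@]} > 0 )); do
    for server in "${!out_of_range[@]}"; do
        minutes=$(get_uptime "$server")
        out_of_range[$server]=$minutes          # remember the latest reading
        if (( minutes > 5 && minutes < 60 )); then
            in_range[$server]=$minutes          # move to the in-range array
            unset 'out_of_range[$server]'
        fi
    done

    # Simulate one minute passing (this replaces "sleep 60" in the demo).
    for server in "${!fake_clock[@]}"; do
        (( fake_clock[$server] += 1 ))
    done

    (( passes += 1 ))
    (( passes < 20 )) || break                  # safety cap for the demo only
done

echo "in range: ${#in_range[@]}, still waiting: ${#out_of_range[@]}"
```

The list of keys in `"${!out_of_range[@]}"` is expanded before the inner loop starts, so removing entries while looping over them is safe here.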