Zak

Better, more efficient bash script (grep)

I have a script that I built, and it works fantastically, but it's going to take an estimated 4 days to run! I was wondering if there's a more efficient way of doing this.

Here's what the script does:

  1. It gets all files from imageserver and loads them into imageserver.txt
  2. It formats the file paths for grepping
  3. Loops through imageserver.txt and greps /var/www/html per line
  4. Writes the results to two formatted files (doesExist.txt and nonExists.txt) for later use
  5. Writes to a log file so that tail can track the script's progress


What I have is 2 files.

  1. imageserver.txt (about 250,000 lines)

    imageserver/icons/socialmedia/sqcolor_tumblr.png
    imageserver/icons/socialmedia/sqcolor_gaf.png
    imageserver/icons/socialmedia/sqcolor_yelp.png
    imageserver/icons/socialmedia/sqcolor_linkedin.png
    imageserver/icons/socialmedia/sqcolor_twitter.png
    imageserver/icons/socialmedia/sqcolor_angies.png
    imageserver/icons/socialmedia/sqcolor_houzz.png

  2. search.sh

    #!/bin/bash

    echo "\n\n Started ...\n\n"

    # Clear Runtime Files
    : > doesExist.txt
    : > nonExists.txt
    : > imgSearch.log

    echo "\n\n Building Image List ...\n\n"

    # write contents of imageserver to imageserver.txt
    find /var/www/imageserver/ -type f > imageserver.txt

    # Remove /var/www
    find ./imageserver.txt -type f -readable -writable -exec sed -i "s/\/var\/www\///g" {} \;

    echo "\n\n Finished Building Start Searching ...\n\n"

    linecount=$(wc -l < ./imageserver.txt)

    while IFS= read -r var
    do
        echo "$linecount\n\n"
        echo "\n ... Searching $var\n "

        # one recursive grep over /var/www/html per image path
        results=$(grep -rl "$var" /var/www/html)
        if [ $? -eq 0 ]; then
            echo "Image exists ...\n"
            echo "$var|||$results^^^" >> doesExist.txt
            echo "$linecount | YES | $var " >> imgSearch.log
        else
            echo "Image does not exist ... \n"
            echo "$var" >> nonExists.txt
            echo "$linecount | NO | $var " >> imgSearch.log
        fi

        linecount=$((linecount-1))
    done < ./imageserver.txt

    echo "\n\n -- FINISHED -- \n\n"

Basically, I am checking whether the images are used within ANY of the HTML in the /var/www/html directory.

With that said, each iteration of grep takes about 0.5 to 1 second. By my calculations, that's 3 to 4 days. While I SUPPOSE that's acceptable, is there a better (more efficient) way of accomplishing this?
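One direction that may be worth testing is a single recursive grep that reads every image path as a fixed-string pattern, instead of one grep per line. The sketch below is untested against the real data and makes several assumptions: the function name `split_image_list` and its four arguments are hypothetical, GNU grep is assumed, and unlike the loop above it does not record which HTML files reference each image.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): split an image
# list into "used" and "unused" with one pass over the HTML tree.
#   $1 = list of image paths, one per line (e.g. imageserver.txt)
#   $2 = directory to search (e.g. /var/www/html)
#   $3 = output file for paths found somewhere under $2
#   $4 = output file for paths never found
split_image_list() {
    local list=$1 html_dir=$2 used=$3 unused=$4

    # -r recurse, -h omit file names, -o print only the matched text,
    # -F treat patterns as fixed strings, -f read the patterns from $list.
    grep -rhoF -f "$list" "$html_dir" | sort -u > "$used"

    # Keep the lines of $list that are not exactly equal (-x) to any
    # line in $used; these are the paths that never matched.
    grep -vxF -f "$used" "$list" > "$unused"
}
```

It could be called as `split_image_list imageserver.txt /var/www/html doesExist.txt nonExists.txt`. One caveat: with `-o`, when one path is a prefix of another (for example `a.png` and `a.png.bak`), grep reports the longer match, so the shorter path could be wrongly listed as unused.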