Skip to main content
added 136 characters in body
Source Link
Gao
  • 1.2k
  • 9
  • 21
  • A lot of your code does cat $1 | grep Telegram | sed -e 's/"/ /g', which can be simplified to sed '/Telegram/!d; s/"/ /g' "$1", so you may want to save the result somewhere and extract from it when needed.

  • awk '{ print $9 }' | cut -c27-30 can be combined into awk '{print substr($9, 27, 4)}'.

  • In the command substitution that gets assigned to b, you have sed -e 's/;/ /g' | sed -e 's/=/ /g' which should really just be sed -e 's/;/ /g' -e 's/=/ /g' or even better just sed 's/[;=]/ /g'. You don't need the -e option if you're not combining expressions.

  • kk=`wc -l trdata | awk '{ print $1 }'` : not sure why kk=`wc -l < trdata` can'tcan do the job just fine.

  • cat grhostfinal | cut -c1-4 > grhostfinal1 is not as efficient as cut -c1-4 < grhostfinal > grhostfinal1

  •    sqlite3 test2.sqlite "select fecha from testxml4;" > data.csv
    
       cat data.csv | sort | uniq > data2.csv
    

You should wrap your cleanup command in a trap and put it at the beginning of the script so that it will always be executed unless the program is terminated by a SIGKILL:

trap 'rm -f \
rs{host,cname,ctype} \
ttstamp tservice tformat trdata{,2,3,4} \
grhost{,2,final{,{1..4}}} \
data{,2}.csv conjunto.csv \
{a..c} quitar' 'EXIT'
  • You should wrap your cleanup command in a trap and put it at the beginning of the script so that it will always be executed unless the program is terminated by a SIGKILL:

      trap 'rm -f \
      rs{host,cname,ctype} \
      ttstamp tservice tformat trdata{,2,3,4} \
      grhost{,2,final{,{1..4}}} \
      data{,2}.csv conjunto.csv \
      {a..c} quitar' \
      'EXIT'
    
  • Please don't rm -r if you're not deleting directories. This is a dangerous command if you're not careful. I used brace expansion to shorten the list of input files I need to type, but I'm not sure if you really need to create that many temporary files.

  • You don't need to touch the files. Redirection will create the files if they don't exist.

Please don't rm -r if you're not deleting directories. This is a dangerous command if you're not careful. I used brace expansion to shorten the list of input files I need to type, but I'm not sure if you really need to create that many temporary files. I can probably offer more suggestions if I could try out the script. I'll add to this answer if I think of anything, but this should be enough for now.

  • A lot of your code does cat $1 | grep Telegram | sed -e 's/"/ /g', which can be simplified to sed '/Telegram/!d; s/"/ /g' "$1", so you may want to save the result somewhere and extract from it when needed.

  • awk '{ print $9 }' | cut -c27-30 can be combined into awk '{print substr($9, 27, 4)}'.

  • In the command substitution that gets assigned to b, you have sed -e 's/;/ /g' | sed -e 's/=/ /g' which should really just be sed -e 's/;/ /g' -e 's/=/ /g' or even better just sed 's/[;=]/ /g'. You don't need the -e option if you're not combining expressions.

  • kk=`wc -l trdata | awk '{ print $1 }'` : not sure why kk=`wc -l trdata` can't do the job.

  • cat grhostfinal | cut -c1-4 > grhostfinal1 is not as efficient as cut -c1-4 < grhostfinal > grhostfinal1

  •    sqlite3 test2.sqlite "select fecha from testxml4;" > data.csv
    
       cat data.csv | sort | uniq > data2.csv
    

You should wrap your cleanup command in a trap and put it at the beginning of the script so that it will always be executed unless the program is terminated by a SIGKILL:

trap 'rm -f \
rs{host,cname,ctype} \
ttstamp tservice tformat trdata{,2,3,4} \
grhost{,2,final{,{1..4}}} \
data{,2}.csv conjunto.csv \
{a..c} quitar' 'EXIT'

Please don't rm -r if you're not deleting directories. This is a dangerous command if you're not careful. I used brace expansion to shorten the list of input files I need to type, but I'm not sure if you really need to create that many temporary files. I can probably offer more suggestions if I could try out the script. I'll add to this answer if I think of anything, but this should be enough for now.

  • A lot of your code does cat $1 | grep Telegram | sed -e 's/"/ /g', which can be simplified to sed '/Telegram/!d; s/"/ /g' "$1", so you may want to save the result somewhere and extract from it when needed.

  • awk '{ print $9 }' | cut -c27-30 can be combined into awk '{print substr($9, 27, 4)}'.

  • In the command substitution that gets assigned to b, you have sed -e 's/;/ /g' | sed -e 's/=/ /g' which should really just be sed -e 's/;/ /g' -e 's/=/ /g' or even better just sed 's/[;=]/ /g'. You don't need the -e option if you're not combining expressions.

  • kk=`wc -l trdata | awk '{ print $1 }'` : kk=`wc -l < trdata` can do the job just fine.

  • cat grhostfinal | cut -c1-4 > grhostfinal1 is not as efficient as cut -c1-4 < grhostfinal > grhostfinal1

  •    sqlite3 test2.sqlite "select fecha from testxml4;" > data.csv
    
       cat data.csv | sort | uniq > data2.csv
    
  • You should wrap your cleanup command in a trap and put it at the beginning of the script so that it will always be executed unless the program is terminated by a SIGKILL:

      trap 'rm -f \
      rs{host,cname,ctype} \
      ttstamp tservice tformat trdata{,2,3,4} \
      grhost{,2,final{,{1..4}}} \
      data{,2}.csv conjunto.csv \
      {a..c} quitar' \
      'EXIT'
    
  • Please don't rm -r if you're not deleting directories. This is a dangerous command if you're not careful. I used brace expansion to shorten the list of input files I need to type, but I'm not sure if you really need to create that many temporary files.

  • You don't need to touch the files. Redirection will create the files if they don't exist.

I can probably offer more suggestions if I could try out the script. I'll add to this answer if I think of anything, but this should be enough for now.

Source Link
Gao
  • 1.2k
  • 9
  • 21

Take-home message: pipelines in Bash are slow (compared to process substitution); loops in Bash are slow; Bash are slow.

Without a setup script and dummy data to test run it, I don't really understand exactly what your script is trying to achieve at each step, so I can only suggest the following improvements:

Avoid unnecessary pipelines

  • A lot of your code does cat $1 | grep Telegram | sed -e 's/"/ /g', which can be simplified to sed '/Telegram/!d; s/"/ /g' "$1", so you may want to save the result somewhere and extract from it when needed.

  • awk '{ print $9 }' | cut -c27-30 can be combined into awk '{print substr($9, 27, 4)}'.

  • In the command substitution that gets assigned to b, you have sed -e 's/;/ /g' | sed -e 's/=/ /g' which should really just be sed -e 's/;/ /g' -e 's/=/ /g' or even better just sed 's/[;=]/ /g'. You don't need the -e option if you're not combining expressions.

  • kk=`wc -l trdata | awk '{ print $1 }'` : not sure why kk=`wc -l trdata` can't do the job.

  • cat grhostfinal | cut -c1-4 > grhostfinal1 is not as efficient as cut -c1-4 < grhostfinal > grhostfinal1

  •    sqlite3 test2.sqlite "select fecha from testxml4;" > data.csv
    
       cat data.csv | sort | uniq > data2.csv
    

is probably best done entirely in SQL:

    sqlite3 test2.sqlite "SELECT DISTINCT fecha FROM testxml4 ORDER BY fecha;" > data.csv

Avoid loops if possible, or optimize for each iteration of the loop

  •    for i in `seq 1 $kk`
       do
           echo $a >> rsctype
       done
    

is much slower than

    printf "$a\n%.0s" `seq 1 $kk` >> rsctype

for small $kk, and much slower than

    yes "$a" | head -n "$kk" >> rsctype

for large $kk. See https://superuser.com/questions/86340/linux-command-to-repeat-a-string-n-times. (Not sure if you actually want to repeat the same strings, but that's what your code does.) * for k in cat data2.csv do grep "$k" conjunto.csv >> quitar done

    diff quitar conjunto.csv | grep ">" | sed 's/^> //g' > diferencia.csv

looks like it could be done with just one diff? Maybe (haven't tested it):

    diff --changed-group-format='%>' --unchanged-group-format='' data2.csv conjunto.csv\
     > diferencia.csv

Non-performance-related notes

You should wrap your cleanup command in a trap and put it at the beginning of the script so that it will always be executed unless the program is terminated by a SIGKILL:

trap 'rm -f \
rs{host,cname,ctype} \
ttstamp tservice tformat trdata{,2,3,4} \
grhost{,2,final{,{1..4}}} \
data{,2}.csv conjunto.csv \
{a..c} quitar' 'EXIT'

Please don't rm -r if you're not deleting directories. This is a dangerous command if you're not careful. I used brace expansion to shorten the list of input files I need to type, but I'm not sure if you really need to create that many temporary files. I can probably offer more suggestions if I could try out the script. I'll add to this answer if I think of anything, but this should be enough for now.