
I am new to shell scripting. I would like to know how to sort the contents of a file using a shell script.

Here is an example:

fap0089-josh.baker
fap00233-adrian.edwards
fap00293-bob.boyle
fap00293-bob.jones
fap002-brian.lopez
fap00293-colby.morris
fap00293-cole.mitchell
psf0354-SKOWALSKI
psf0354-SLEE
psf0382-SLOWE
psf0391-SNOMURA
psf0354-SPATEL
psf0364-SRICHARDS
psf0354-SSEIBERT
psf0354-SSIRAH
bsi0004-STRAN
bsi0894-STURBIC
unit054-SUNDERWOOD

Considering the data above (this is a small sample; the full file has about 565,000 records), I would like to sort it and report the following:

  1. The number of entries for each prefix (fap, psf, bsi, unit, etc.)
  2. The number of unique environments for each prefix. The digits after the prefix (0004, 0382, 054, etc.) identify the environment; for example, psf has 4 unique environments.
  3. The sum total
  • "Number of entries starting with fap,psf,bsi,unit etc" meaning the number of lines by each prefix type? "total number of environments for each type" the numbers from the first part broken down by sub-type as well? "The sum total" of what? Do you mean "sorting" or do you mean "analyzing"/"counting"? Have you tried anything? Commented Dec 2, 2014 at 15:23
  • Given this input, what output are you expecting to see? Commented Dec 2, 2014 at 15:33
  • Do you want to sort the file, or are you looking to simply get the totals you want. Would you consider an awk solution to your problem? awk is really its own programming language, but it's such an old part of Unix that many people consider using awk as a shell solution. Commented Dec 2, 2014 at 15:36
  • Hi @DavidW. Here is the original file: filedropper.com/all-dss-accounts. There are 565075 entries in this file. Most entries have the format <app><env>-<user>, where app is fap, sbl, unit, jde, etc. and env is a number (3, 4, or 5 digits). I want the results to show: a) the total number of entries in the file; b) each app name and its number of entries; c) each app name and its number of unique envs. The result needs to be in tabular format, stored in a .txt file, and sent by email. Commented Dec 3, 2014 at 12:53

1 Answer

Here's a Schwartzian transform (decorate, sort, undecorate) to sort by 1) leading letters, then 2) digits:

sed -r 's/^([[:alpha:]]+)([[:digit:]]+)/\1 \2 /' filename | 
sort -t ' ' -k 1,1 -k 2,2n | 
sed 's/ //; s/ //'

output:

bsi0004-STRAN
bsi0894-STURBIC
fap002-brian.lopez
fap0089-josh.baker
fap00233-adrian.edwards
fap00293-bob.boyle
fap00293-bob.jones
fap00293-colby.morris
fap00293-cole.mitchell
psf0354-SKOWALSKI
psf0354-SLEE
psf0354-SPATEL
psf0354-SSEIBERT
psf0354-SSIRAH
psf0364-SRICHARDS
psf0382-SLOWE
psf0391-SNOMURA
unit054-SUNDERWOOD
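
The three stages of that pipeline can be read separately: decorate each line with sortable keys, sort on those keys, then strip the decoration again. Here is a minimal sketch of the same commands wrapped in a function that reads standard input. The function name `sort_by_app_env` is hypothetical, `sed -E` is used in place of `-r` on the assumption that your sed accepts it, and `LC_ALL=C` is an added assumption to make the ordering of ties predictable.

```shell
# Hypothetical helper: the same decorate / sort / undecorate pipeline, reading stdin.
sort_by_app_env() {
    # Decorate: "fap0089-josh.baker" becomes "fap 0089 -josh.baker"
    sed -E 's/^([[:alpha:]]+)([[:digit:]]+)/\1 \2 /' |
    # Sort: field 1 as text, then field 2 as a number (so 002 < 0089 < 00233)
    LC_ALL=C sort -t ' ' -k 1,1 -k 2,2n |
    # Undecorate: remove the two spaces added by the first sed
    sed 's/ //; s/ //'
}

# Usage on a few lines from the question
printf '%s\n' fap0089-josh.baker fap00233-adrian.edwards fap002-brian.lopez bsi0894-STURBIC |
sort_by_app_env
```

The numeric key matters because a plain text sort would place 00233 before 002 and 0089.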

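To generate the metrics you mention, I'd use perl. The -n switch wraps the code in a while (<>) { ... } loop, so the lone } below closes that loop early, and the closing brace that -n supplies ends the END block: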

perl -nE '
    /^([[:alpha:]]+)(\d+)/ or next;
    $count{$1}++;
    $nenv{$1}{$2}=1;
    $total+=$2
} 
END {
    say "Counts:";
    say "$_ => $count{$_}" for sort keys %count;
    say "Number of environments";
    say "$_ => ", scalar keys %{$nenv{$_}} for sort keys %nenv;
    say "Total = $total";
' filename

output:

Counts:
bsi => 2
fap => 7
psf => 8
unit => 1
Number of environments
bsi => 2
fap => 4
psf => 4
unit => 1
Total = 5355

Without using perl, it's less efficient because you have to read the file multiple times.

echo Counts:
sed 's/[0-9].*//' filename | sort | uniq -c 
echo Number of environments:
sed -r 's/^([a-z]+)([0-9]*).*/\1 \2/' filename | sort -u | cut -d" " -f1 | uniq -c
echo Total:
{ printf "%d+" $(sed -r 's/^[a-z0]+([0-9]*).*/\1/' filename); echo 0; } | bc

output:

Counts:
      2 bsi
      7 fap
      8 psf
      1 unit
Number of environments:
      2 bsi
      4 fap
      4 psf
      1 unit
Total:
5355
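
One of the comments on the question suggests awk, which can gather the per-prefix counts in a single pass over the file, much like the perl version. This is only a sketch, assuming the <app><env>-<user> layout described in the comments. The function name `app_env_report` and the column layout are illustrative choices, and "total" here means the number of matching lines, which is how the asker later defined it, not the sum of the environment numbers shown above.

```shell
# Hypothetical helper: one pass over the data, then format the small summary.
app_env_report() {
    awk '
    # Count only lines shaped like <letters><digits>..., e.g. "fap00293-bob.boyle".
    match($0, /^[A-Za-z]+[0-9]+/) {
        key = substr($0, 1, RLENGTH)          # "fap00293"
        app = key
        sub(/[0-9]+$/, "", app)               # "fap"
        env = substr(key, length(app) + 1)    # "00293"
        entries[app]++
        if (!((app, env) in seen)) {          # first time this app/env pair appears
            seen[app, env] = 1
            envs[app]++
        }
    }
    END { for (app in entries) print app, entries[app], envs[app] }
    ' "$@" |
    LC_ALL=C sort |
    awk '
    BEGIN { printf "%-8s %8s %12s\n", "app", "entries", "unique_envs" }
          { printf "%-8s %8d %12d\n", $1, $2, $3; total += $2 }
    END   { printf "%-8s %8d\n", "total", total }
    '
}
```

Calling it as app_env_report filename > report.txt would store the table in a text file. Only the first awk reads the data; the sort and the second awk see one line per prefix.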

Comments

I am bad at perl. Does this perl script work when it's put into a shell script file? Can this be implemented in pure unix code?
There's no such thing as "pure unix code". Shell scripts are merely commands executed in some order, and perl is a command like grep or sed. Nevertheless, I've added a perl-free version.
Thanks a lot Glenn. I am using this script on this file, and it throws an error. File: filedropper.com/all-dss-accounts This file contains 565075 records which need to be sorted.
Moreover, instead of the number of environments, I need the count of only the unique environments. E.g. 'fap' has a total of 7 environments, but the number of unique environments is 4 (because 00293 is repeated in 4 entries).