Save output from 'stats' command in gnuplot

Question

I want to statistically analyse output files from a benchmark that runs on 600 nodes. In particular, I need the max, upper quartile, median, lower quartile, min and mean values. The output files are named testrun16-[1-600], and I process them with this code:

ListofFiles = system('dir testrun16-*')

set print 'MaxValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_max
}

set print 'upquValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_up_quartile
}

set print 'MedianValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_median
}

set print 'loquValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_lo_quartile
}

set print 'MinValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_min
}

set print 'MeanValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_mean
}

unset print
set term x11
set title 'CLAIX2016 distribution of OSnoise using FWQ'
set xlabel "Number of Nodes"
set ylabel "Runtime [ns]"
plot 'MaxValues.dat' using 1 title 'maximum value', 'upquValues.dat' title 'upper quartile', 'MedianValues.dat' using 1 title 'median value', 'loquValues.dat' title 'lower quartile', 'MinValues.dat' title 'minimum value', 'MeanValues.dat' using 1 title 'mean value';
set term png
set output 'noises.png'
replot

I get these values and can plot them. However, the tuples from each run get mixed up. The mean of testrun16-17.dat is plotted at x=317, and its min also ends up at a different position.

How can I save the output but keep the tuples together and plot each node at its actual position?

Does dir testrun16-* give filenames in the order you want? I.e., is testrun16-17.dat the 17th output of that command?
– user8153, Commented May 6, 2019 at 21:53
I just tested it by adding another sorting option: dir testrun16-* -v sorts them the way I want, at least in the console output. However, gnuplot keeps putting the 17th file at position 317.
– Luke, Commented May 7, 2019 at 10:13
Apparently I cannot edit comments? Anyhow, I also renamed the files with small numbers to the zero-padded format testrun16-001.dat etc. This now puts the 17th entry at position 65.
– Luke, Commented May 7, 2019 at 10:34

theozh · Accepted Answer · 2019-05-08 11:28:44Z

Windows (and Linux?) might sort (or not sort) a directory listing in some special way. To eliminate this uncertainty, you can loop over your files by number. However, this assumes that all numbers from 1 to the maximum (= FilesCount, in your case 600) actually exist. You tagged Linux, sorry, but I only know Windows, where the command to list only the filenames is 'dir /B testrun16-*'.
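The loop-by-number idea can be sketched as follows. This is a minimal sketch, not the code from the answer: it assumes the files are named testrun16-<n>.dat with n between 1 and 600, and it uses a hypothetical helper FileExists() that calls a POSIX shell test, so missing node files are skipped instead of aborting the script.

```gnuplot
### loop over node numbers instead of a directory listing (sketch)
# Assumptions: files are named testrun16-<n>.dat, n = 1..600,
# and a POSIX shell is available for the existence check.
FileRootName = 'testrun16'
FilesCount   = 600

# hypothetical helper: returns 1 if the file exists, 0 otherwise
FileExists(f) = int(system(sprintf('[ -f "%s" ] && echo 1 || echo 0', f)))

set print FileRootName.'_Statistics.dat'
    # header written as a comment line so that plot ignores it
    print "# File Max UpQ Med LoQ Min Mean"
    do for [i=1:FilesCount] {
        FILE = sprintf('%s-%d.dat', FileRootName, i)
        if (FileExists(FILE)) {
            stats FILE u 1 nooutput
            print sprintf("%d %g %g %g %g %g %g", i, \
                STATS_max, STATS_up_quartile, STATS_median, \
                STATS_lo_quartile, STATS_min, STATS_mean)
        }
    }
set print
### end of sketch
```

Because the node number i is written as the first column, each row keeps its statistics together and can be plotted at its real node position with u 1:2, u 1:3, and so on.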

Is there a special reason why you write the statistics into six different files? Why not into one file?

Something like this: (modified after OP comment)

### batch statistics
reset session

FileRootName = 'testrun16'
FileList = system('dir /B '.FileRootName.'-*')
FilesCount =  words(FileList)
print "Files found: ", FilesCount

# function for extracting the number from the filename 
GetFileNumber(s) = int(s[strstrt(s,"-")+1:strstrt(s,".dat")-1])

set print FileRootName.'_Statistics.dat'
    print "File Max UpQ Med LoQ Min Mean"
    do for [FILE in FileList] {
        stats FILE u 1 nooutput
        print sprintf("%d %g %g %g %g %g %g", \
        GetFileNumber(FILE), \
        STATS_max, STATS_up_quartile, STATS_median, \
        STATS_lo_quartile, STATS_min, STATS_mean)
    }
set print

plot FileRootName.'_Statistics.dat' \
       u 1:2 title 'maximum value', \
    '' u 1:3 title 'upper quartile', \
    '' u 1:4 title 'median value', \
    '' u 1:5 title 'lower quartile', \
    '' u 1:6 title 'minimum value', \
    '' u 1:7 title 'mean value'
### end of code
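On Linux, one likely explanation for the scrambled positions is that GNU dir prints names in columns sorted down each column, while gnuplot splits the returned string into words row by row. The following is a minimal sketch of an alternative file listing, assuming GNU coreutils ls, where -1 forces one name per line and -v sorts numbers inside names naturally (2 before 10). It only builds and prints the list; the stats loop above can then iterate over it unchanged.

```gnuplot
### natural-order file list on Linux (sketch, assumes GNU coreutils ls)
FileRootName = 'testrun16'

# -1: one filename per line, -v: natural sort of numbers within names
FileList = system('ls -1v '.FileRootName.'-*')
print "Files found: ", words(FileList)

# print the names in the order gnuplot will iterate over them
do for [FILE in FileList] {
    print FILE
}
### end of sketch
```

Since the statistics file stores the number extracted from each filename in its first column, the plot positions do not depend on the listing order; the natural sort mainly keeps the rows of the output file in node order.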

edited May 8, 2019 at 11:28

answered May 7, 2019 at 18:18

2 Comments

Luke Over a year ago

Thanks for your reply! Sadly, not all numbers actually exist, as some nodes have not been scheduled yet or are unavailable due to maintenance. I created separate files because my approach overwrote the file each time I tried to write a single one, so thanks for fixing that! This should also speed up the script drastically. I'll try your code with a sorting option when loading the files and without the explicit for loop. Or do you have a better idea? (Also upvoted, but it does not show up because of my reputation limit.)

theozh Over a year ago

OK, then you have to extract the number from the filename; see the modified code.
