I want to statistically analyse output files from a benchmark that runs on 600 nodes. In particular, I need the max, upper quartile, median, lower quartile, min and mean values. My output consists of the files testrun16-[1-600]

with the code:

ListofFiles = system('dir testrun16-*')

set print 'MaxValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_max
}

set print 'upquValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_up_quartile
}

set print 'MedianValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_median
}

set print 'loquValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_lo_quartile
}

set print 'MinValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_min
}

set print 'MeanValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_mean
}

unset print
set term x11
set title 'CLAIX2016 distribution of OSnoise using FWQ'
set xlabel "Number of Nodes"
set ylabel "Runtime [ns]"
plot 'MaxValues.dat' using 1 title 'maximum value', 'upquValues.dat' title 'upper quartile', 'MedianValues.dat' using 1 title 'median value', 'loquValues.dat' title 'lower quartile', 'MinValues.dat' title 'minimum value', 'MeanValues.dat' using 1 title 'mean value';
set term png
set output 'noises.png'
replot

I get these values and can plot them. However, the tuples from each run get mixed up. The mean of testrun16-17.dat is plotted at x=317, and its min also ends up in another place.

How can I save the output but keep the tuples together and plot each node at its actual place?

  • Does dir testrun16-* give filenames in the order you want? I.e., is testrun16-17.dat the 17th output of that command? Commented May 6, 2019 at 21:53
  • I just tested it by adding another sorting option: dir testrun16-* -v sorts them the way I want, at least in the console output. However, gnuplot keeps putting the 17th file at place 317. Commented May 7, 2019 at 10:13
  • Apparently I cannot edit comments? Anyhow, I also renamed the files with numbers smaller than 10 to the format testrun16-001.dat etc. This now puts the 17th entry at place 65. Commented May 7, 2019 at 10:34

1 Answer

Windows (and Linux?) may sort, or not sort, a directory listing in its own special way. To eliminate this uncertainty, you can loop over your files by number. However, this assumes that every number from 1 to the maximum (= FilesCount, 600 in your case) actually exists. You tagged Linux; sorry, but I only know Windows, where the command to list only the filenames is 'dir /B testrun16-*'.
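Looping by number while skipping missing files could look roughly like the sketch below. It is only an illustration under assumptions: the files are taken to be named testrun16-<N>.dat without zero padding, a POSIX shell providing `test -f` is assumed to be available, and FileExists and MaxNode are hypothetical names introduced here.

```gnuplot
### sketch: loop over node numbers, skipping files that do not exist
FileRootName = 'testrun16'
MaxNode = 600    # assumed highest node number

# hypothetical helper: 1 if the file exists, 0 otherwise (relies on POSIX `test -f`)
FileExists(f) = int(system(sprintf('test -f "%s" && echo 1 || echo 0', f)))

do for [i=1:MaxNode] {
    # assumed naming scheme; use '%s-%03d.dat' for zero-padded names
    FILE = sprintf('%s-%d.dat', FileRootName, i)
    if (FileExists(FILE)) {
        stats FILE using 1 nooutput
        # the node number comes from the loop counter, not from the listing order
        print i, STATS_min, STATS_mean
    }
}
### end of sketch
```

Because the node number is taken from the loop counter, the order in which the operating system lists the directory no longer matters.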

Is there a special reason why you write the statistics into six different files? Why not into one file?

Something like this: (modified after OP comment)

### batch statistics
reset session

FileRootName = 'testrun16'
FileList = system('dir /B '.FileRootName.'-*')
FilesCount =  words(FileList)
print "Files found: ", FilesCount

# function for extracting the number from the filename 
GetFileNumber(s) = int(s[strstrt(s,"-")+1:strstrt(s,".dat")-1])

set print FileRootName.'_Statistics.dat'
    print "File Max UpQ Med LoQ Min Mean"
    do for [FILE in FileList] {
        stats FILE u 1 nooutput
        print sprintf("%d %g %g %g %g %g %g", \
        GetFileNumber(FILE), \
        STATS_max, STATS_up_quartile, STATS_median, \
        STATS_lo_quartile, STATS_min, STATS_mean)
    }
set print

plot FileRootName.'_Statistics.dat' \
       u 1:2 title 'maximum value', \
    '' u 1:3 title 'upper quartile', \
    '' u 1:4 title 'median value', \
    '' u 1:5 title 'lower quartile', \
    '' u 1:6 title 'minimum value', \
    '' u 1:7 title 'mean value'
### end of code
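
Since the question is tagged Linux, the listing step might be adapted along the following lines. This is a sketch, not tested against the original setup: it assumes GNU ls (whose -v option sorts numbers inside names naturally and -1 prints one name per line) and that the files carry a .dat extension.

```gnuplot
### sketch: Linux variant of the file listing (GNU ls assumed)
FileRootName = 'testrun16'
# -1: one name per line, -v: natural sort of the numbers within the names
FileList = system('ls -1v '.FileRootName.'-*.dat')
FilesCount = words(FileList)
print "Files found: ", FilesCount
### end of sketch
```

As the file number is written into column 1 and the plot uses it as the x value, the listing order only affects the row order in the statistics file, not where each node appears in the plot.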

2 Comments

Thanks for your reply! Sadly, not all numbers actually exist, as some nodes have not been scheduled yet or are unavailable due to maintenance. I created six files because my approach overwrote the file each time I tried to write a single one, so thanks for fixing that! This should also reduce the script's runtime drastically. I'll try your code with a sorting option when loading the files and without the explicit for loop. Or do you have a better idea? (Also upvoted, but it does not show up because of my reputation limit.)
OK, then you have to extract the number from the filename; see the modified code.
