How do I group strings and their data using Gnuplot?

Question

I'm brand new to Gnuplot and want to be able to graph a huge amount of data that looks like this:

Description violFine state
"Red Light Violation" $75.00 MD
"No Stop/Park Handicap" $502.00 MD
"Red Light Violation" $75.00 MD
"No Stop/Park Handicap" $502.00 MD
"Red Light Violation" $75.00 MD
"Red Light Violation" $75.00 MD
"Red Light Violation" $75.00 VA
"All Other Stopping or Parking Violations" $32.00 MD
"Red Light Violation" $75.00 MD
"Red Light Violation" $75.00 MD

As you can see, the top line holds the column names, and the "Description" column contains many duplicate string values. What I want to do is add up all the "violFine" values for each unique "Description" and plot the result, with the "Description" on the x-axis and the total "violFine" on the y-axis. I've made a graph to illustrate what I'm talking about, accessible at this link: https://i.sstatic.net/6qcBG.jpg
(Sorry, I would've made it available on this page if I had enough reputation points).

Any help with going about this would be awesome! Thanks!

mgilson · Accepted Answer · 2013-03-10 00:44:25Z

This sort of data processing task isn't well suited to gnuplot. Luckily, gnuplot is happy to let you use other tools to process the data and then pipe the result in. Here, I would use Python:

from collections import defaultdict
import csv
import sys

# Map each description to the list of its fines.
d = defaultdict(list)
with open(sys.argv[1]) as fin:
    next(fin)  # skip the first line, which doesn't contain data
    reader = csv.reader(fin, delimiter=' ', quotechar='"')
    for row in reader:
        d[row[0]].append(float(row[1][1:]))  # drop the leading '$'

# Print one line per description: the quoted label, then the total.
for k, v in d.items():
    print('"{0}"'.format(k), sum(v))

Now in gnuplot, you can plot this as:

plot '< python script.py datafilename' using (column(0)):2:xtic(1) with lines
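To see what gnuplot would receive through the pipe, here is a minimal sketch that runs the same aggregation on the sample rows from the question. The function name total_fines and the inline SAMPLE string are only for illustration; they are not part of the script above, which reads the file named on the command line.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the question (header line included).
SAMPLE = '''Description violFine state
"Red Light Violation" $75.00 MD
"No Stop/Park Handicap" $502.00 MD
"Red Light Violation" $75.00 MD
"No Stop/Park Handicap" $502.00 MD
"Red Light Violation" $75.00 MD
"Red Light Violation" $75.00 MD
"Red Light Violation" $75.00 VA
"All Other Stopping or Parking Violations" $32.00 MD
"Red Light Violation" $75.00 MD
"Red Light Violation" $75.00 MD
'''


def total_fines(lines):
    """Sum the fine column per description (hypothetical helper)."""
    totals = defaultdict(float)
    reader = csv.reader(lines, delimiter=' ', quotechar='"')
    next(reader)  # skip the header row
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        totals[row[0]] += float(row[1].lstrip('$'))
    return dict(totals)


totals = total_fines(io.StringIO(SAMPLE))
for description, total in totals.items():
    # Same layout as the script: quoted label, then the sum.
    print('"{0}" {1}'.format(description, total))
```

Each printed line has the quoted description in column 1 and the total in column 2, which is the layout the plot command reads with 2:xtic(1).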


theozh · Accepted Answer · 2023-01-10 13:08:45Z

You can also do this in gnuplot alone, without external tools:

Define a function inList() that determines whether an item is already in the list.
Create a list of unique items.
Define a function that returns the index (i.e. the x-value) of an item in the unique list.
Sum up the second column (after removing the $) for equal x-values via smooth freq.
Skip the first (header) line with every ::1.
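For readers who find the logic easier to follow in a general-purpose language, the steps above can be sketched roughly as follows. This is a Python analogy, not gnuplot code. The names rows, unique, get_index and sums are hypothetical, and the final loop only mimics what smooth freq does (adding up y-values that share an x-value).

```python
# (description, fine) pairs taken from the sample data in the question.
rows = [
    ("Red Light Violation", "$75.00"),
    ("No Stop/Park Handicap", "$502.00"),
    ("Red Light Violation", "$75.00"),
    ("No Stop/Park Handicap", "$502.00"),
    ("Red Light Violation", "$75.00"),
    ("Red Light Violation", "$75.00"),
    ("Red Light Violation", "$75.00"),
    ("All Other Stopping or Parking Violations", "$32.00"),
    ("Red Light Violation", "$75.00"),
    ("Red Light Violation", "$75.00"),
]

# Build the list of unique items in first-seen order.
unique = []
for description, _ in rows:
    if description not in unique:  # plays the role of inList()
        unique.append(description)


def get_index(items, description):
    """Return the 1-based position of description, or 0 if it is absent."""
    return items.index(description) + 1 if description in items else 0


# Add up the fines per x-value, similar in spirit to smooth freq.
sums = {}
for description, fine in rows:
    x = get_index(unique, description)
    sums[x] = sums.get(x, 0.0) + float(fine[1:])  # fine[1:] drops the '$'

for x in sorted(sums):
    print(x, unique[x - 1], sums[x])
```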

For gnuplot>=5.0.0 you could also implement inList() with sum and word(). That does not work in gnuplot 4.x, however, because word() there does not treat a double-quoted string as a single word: word('"abc def" ghi',2) returns ghi in gnuplot 5.x, but def" in gnuplot 4.x. Hence, for 4.x there is another approach, using strstrt() and an index number appended to each item, which also works in 5.x.
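The difference between the two word() behaviours can be illustrated outside gnuplot. The sketch below is only a Python analogy: shlex.split() stands in for quote-aware splitting (as described for gnuplot 5.x) and str.split() for plain whitespace splitting (as described for 4.x). Note that Python indexes from 0, whereas gnuplot's word() counts from 1.

```python
import shlex

text = '"abc def" ghi'

# Quote-aware splitting keeps "abc def" together as one word.
quote_aware = shlex.split(text)

# Plain whitespace splitting breaks the quoted string apart.
naive = text.split()

print(quote_aware[1])  # second word when quotes are honoured
print(naive[1])        # second word when quotes are ignored
```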

Script: (works for gnuplot>=4.6.0, March 2012)

### sum up values depending on keyword
reset

FILE = "SO/SO15316764.dat"

# create list of unique elements
c    = 0
uniq = ''
inList(list,s) = strstrt(list,'"'.s.'"')
stats FILE u (uniq=uniq.(inList(uniq,strcol(1)) ? '' : sprintf('"%s" %d ',strcol(1),c=c+1))) every ::1 nooutput

getIndex(list,s) = (_n=inList(list,s)) ? int(word(list[_n+2+strlen(s):],1)) : 0

set boxwidth 0.8
set style fill solid 0.4
set key noautotitle
set xrange[0.5:c+0.5]

plot FILE u (getIndex(uniq,strcol(1))):(real(strcol(2)[2:])):xtic(1) every ::1 smooth freq w boxes
### end of script
