I'm brand new to Gnuplot and want to be able to graph a huge amount of data that looks like this:

Description violFine state
"Red Light Violation" $75.00 MD
"No Stop/Park Handicap" $502.00 MD
"Red Light Violation" $75.00 MD
"No Stop/Park Handicap" $502.00 MD
"Red Light Violation" $75.00 MD
"Red Light Violation" $75.00 MD
"Red Light Violation" $75.00 VA
"All Other Stopping or Parking Violations" $32.00 MD
"Red Light Violation" $75.00 MD
"Red Light Violation" $75.00 MD

As you can see, the first line holds the column names, and the "Description" column contains many duplicate strings. I want to add up all the "violFine" values for each unique "Description" and plot the result with "Description" on the x-axis and the total of the "violFine" values on the y-axis. I've made a graph to illustrate what I mean, available at this link: https://i.sstatic.net/6qcBG.jpg
(Sorry, I would've made it available on this page if I had enough reputation points).

Any help with going about this would be awesome! Thanks!

2 Answers

This sort of data processing task isn't well suited to gnuplot. Luckily, gnuplot is happy to let you process the data with other tools and pipe the result in. Here, I would use Python:

from collections import defaultdict
import csv
import sys

d = defaultdict(list)
with open(sys.argv[1]) as fin:
    next(fin)  # skip the header line, which contains no data
    reader = csv.reader(fin, delimiter=' ', quotechar='"')
    for row in reader:
        # row[1] looks like "$75.00"; drop the leading "$" before converting
        d[row[0]].append(float(row[1][1:]))

# print one line per unique description: "<description>" <total>
for k, v in d.items():
    print('"{0}"'.format(k), sum(v))

Now in gnuplot, you can plot this as:

plot '< python script.py datafilename' using (column(0)):2:xtic(1) with lines
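
As a quick sanity check of the aggregation step, here is a minimal sketch that applies the same logic to the ten sample rows from the question, held in memory rather than read from a file. The names SAMPLE and total_fines are made up for this illustration and are not part of the script above.

```python
import csv
import io
from collections import defaultdict

# The sample rows from the question, including the header line.
SAMPLE = '''Description violFine state
"Red Light Violation" $75.00 MD
"No Stop/Park Handicap" $502.00 MD
"Red Light Violation" $75.00 MD
"No Stop/Park Handicap" $502.00 MD
"Red Light Violation" $75.00 MD
"Red Light Violation" $75.00 MD
"Red Light Violation" $75.00 VA
"All Other Stopping or Parking Violations" $32.00 MD
"Red Light Violation" $75.00 MD
"Red Light Violation" $75.00 MD
'''

def total_fines(text):
    """Sum the violFine column per unique Description (hypothetical helper)."""
    totals = defaultdict(float)
    lines = io.StringIO(text)
    next(lines)  # skip the header line
    for row in csv.reader(lines, delimiter=' ', quotechar='"'):
        if row:  # ignore blank lines
            totals[row[0]] += float(row[1].lstrip('$'))
    return dict(totals)

# Same output shape as the script above: "<description>" <total>
for name, total in total_fines(SAMPLE).items():
    print('"{0}" {1}'.format(name, total))
```

For this sample, the totals should come out as 525 for "Red Light Violation" (seven rows at $75), 1004 for "No Stop/Park Handicap" and 32 for "All Other Stopping or Parking Violations".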

You can also do this in gnuplot alone, without external tools.

  • define a function inList(), which determines if an item is already in the list
  • create a list of unique items
  • define a function to get the index (i.e. x-value) of an item in the unique list
  • sum up the second column (after removing $) for equal x-values via smooth freq
  • skip the first (header) line with every ::1
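
To make the steps above easier to follow, here is a rough Python sketch of the same idea: a quoted "unique list" string with an index number after each item, a lookup of that index, and a per-index sum. It is only an illustration of the logic, not gnuplot code. The names in_list, build_unique, get_index and sum_by_index are made up here, and ROWS holds just a few rows taken from the question.

```python
# A few (description, fine) pairs taken from the question's sample data.
ROWS = [
    ("Red Light Violation", "$75.00"),
    ("No Stop/Park Handicap", "$502.00"),
    ("Red Light Violation", "$75.00"),
    ("All Other Stopping or Parking Violations", "$32.00"),
]

def in_list(uniq, s):
    """1-based position of the quoted item in uniq, or 0 if it is absent."""
    return uniq.find('"' + s + '"') + 1

def build_unique(descriptions):
    """Build a string like '"item A" 1 "item B" 2 ' from unique descriptions."""
    uniq, count = "", 0
    for s in descriptions:
        if not in_list(uniq, s):
            count += 1
            uniq += '"%s" %d ' % (s, count)
    return uniq, count

def get_index(uniq, s):
    """Return the index number stored right after the quoted item, or 0."""
    n = in_list(uniq, s)
    if not n:
        return 0
    # Skip the opening quote, the item itself and the closing quote.
    rest = uniq[n - 1 + len(s) + 2:]
    return int(rest.split()[0])

def sum_by_index(rows):
    """Add up the fines (after removing '$') for equal index values."""
    uniq, _ = build_unique(desc for desc, _ in rows)
    totals = {}
    for desc, fine in rows:
        i = get_index(uniq, desc)
        totals[i] = totals.get(i, 0.0) + float(fine[1:])
    return totals

print(sum_by_index(ROWS))
```

The per-index sum plays the role that smooth freq plays in the gnuplot approach: rows that map to the same x-value are added together.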

For gnuplot>=5.0.0 you could also implement inList() with sum and word(). That does not work in gnuplot 4.x, however, because word() there ignores matching double quotes: word('"abc def" ghi',2) returns ghi in gnuplot 5.x but def" in gnuplot 4.x. Hence, for 4.x, the script below takes a different approach that uses strstrt() and appends an index number to each item; it also works in 5.x.
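
The difference between quote-aware and plain whitespace splitting can be illustrated outside gnuplot. The following is only an analogy in Python, assuming that shlex.split stands in for the quote-aware behaviour and str.split for the quote-ignoring one; it does not call gnuplot's word() at all.

```python
import shlex

text = '"abc def" ghi'

# Quote-aware splitting: "abc def" counts as a single item.
quote_aware = shlex.split(text)

# Plain whitespace splitting: the double quotes are ignored.
whitespace_only = text.split()

print(quote_aware[1])      # second item when quotes are honoured
print(whitespace_only[1])  # second item when quotes are ignored
```

The first print gives ghi and the second gives def", which mirrors the two word() results described above.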

Script: (works for gnuplot>=4.6.0, March 2012)

### sum up values depending on keyword
reset

FILE = "SO/SO15316764.dat"

# create list of unique elements
c    = 0
uniq = ''
inList(list,s) = strstrt(list,'"'.s.'"')
stats FILE u (uniq=uniq.(inList(uniq,strcol(1)) ? '' : sprintf('"%s" %d ',strcol(1),c=c+1))) every ::1 nooutput

getIndex(list,s) = (_n=inList(list,s)) ? int(word(list[_n+2+strlen(s):],1)) : 0

set boxwidth 0.8
set style fill solid 0.4
set key noautotitle
set xrange[0.5:c+0.5]

plot FILE u (getIndex(uniq,strcol(1))):(real(strcol(2)[2:])):xtic(1) every ::1 smooth freq w boxes
### end of script

