Python: count values within defined intervals

Question

I import data from a CSV which looks like this:

3.13
3.51
3.51
4.01
2.13
1.13
1.13
1.13
1.63
1.88

What I would like to do now is to COUNT the values within those intervals: 0-1, 1-2, 2-3, >3

So the result would be

0-1: 0
1-2: 5
2-3: 1
>3: 4

Apart from this main task, I would like to express each count as a percentage of the total number of values (e.g. 0-1: 0%, 1-2: 50%, ...)

I am quite new to Python, so I got stuck in my attempts at solving this. Maybe there is a predefined function for this that I don't know of?

Thanks a lot for your help!!!

+++ UPDATE: +++

Thanks for all the replies. I have tested a bunch of them, but I guess I am doing something wrong when reading the CSV file. In the code snippets that use a, b, c, d for the different intervals, these variables always stay '0' for me.

Here is my actual code:

import csv

a=b=c=0
with open('winter.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        if row in range(0,1):
            a += 1
        elif row in range (1,2):
            b += 1

print a,b

I also converted all values in the CSV to integers without success. The CSV has just one single column. Any ideas what I am doing wrong???

I'm a little confused on what you're looking for... So you want to go through each number and give it a value based on which range it falls under? If so, what are you doing with these values?
– Maxwell Hayes, Commented Sep 19, 2014 at 9:11
There is a function called filter and another called len which you should look up. Try them and see if you can find a solution. docs.python.org/2/library/functions.html
– ssm, Commented Sep 19, 2014 at 9:12
When I print out the variable spamreader I just see this: <_csv.reader object at 0x1004e1130>
– user1812071, Commented Sep 19, 2014 at 10:20

mhawke · Accepted Answer · 2014-09-19 12:49:47Z

Here's how to do it in a very concise way with numpy:

import sys
import csv
import numpy as np

with open('winter.csv') as csvfile:
    field = 0    # (zero-based) field/column number containing the required values
    float_list = [float(row[field]) for row in csv.reader(csvfile)]

#float_list = [3.13, 3.51, 3.51, 4.01, 2.13, 1.13, 1.13, 1.13, 1.63, 1.88]

hist, bins = np.histogram(float_list, bins=[0,1,2,3,sys.maxint])
bin_counts = zip(bins, bins[1:], hist)  # [(bin_start, bin_end, count), ... ]

for bin_start, bin_end, count in bin_counts[:-1]:
    print '{}-{}: {}'.format(bin_start, bin_end, count)

# different output required for last bin
bin_start, bin_end, count = bin_counts[-1]
print '>{}: {}'.format(bin_start, count)

Which outputs:

0-1: 0
1-2: 5
2-3: 1
>3: 4

Most of the effort is in massaging the data for output.

It's also quite flexible as it is easy to use different intervals by changing the bins argument to np.histogram(), e.g. add another interval by changing bins:

hist, bins = np.histogram(float_list, bins=[0,1,2,3,4,sys.maxint])

outputs:

0-1: 0
1-2: 5
2-3: 1
3-4: 3
>4: 1
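For comparison, the same binning can be sketched with only the standard library, which also covers the percentages asked for in the question. This is a minimal sketch, not part of the numpy answer above: the helper name count_in_intervals and the edges list are made up for illustration, each interval is treated as closed on the left and open on the right, and the last interval is open-ended.

```python
from bisect import bisect_right


def count_in_intervals(values, edges):
    """Count values per interval [edges[i], edges[i+1]).

    The last interval has no upper limit; values below edges[0] are ignored.
    (Hypothetical helper, written for this example.)
    """
    counts = [0] * len(edges)
    for v in values:
        i = bisect_right(edges, v) - 1  # index of the last edge <= v
        if i >= 0:
            counts[i] += 1
    return counts


values = [3.13, 3.51, 3.51, 4.01, 2.13, 1.13, 1.13, 1.13, 1.63, 1.88]
edges = [0, 1, 2, 3]
counts = count_in_intervals(values, edges)
total = sum(counts)

for i, count in enumerate(counts):
    if i + 1 < len(edges):
        label = "{}-{}".format(edges[i], edges[i + 1])
    else:
        label = ">{}".format(edges[i])
    # guard against an empty input so the division cannot fail
    percent = 100.0 * count / total if total else 0.0
    print("{}: {} ({:.0f}%)".format(label, count, percent))
```

With the sample data from the question this prints 0-1: 0 (0%), 1-2: 5 (50%), 2-3: 1 (10%) and >3: 4 (40%). Note that a value of exactly 3 lands in the last interval, the same as with the numpy bins.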

jotrocken · Answer · 2014-09-19 09:34:12Z

0

This should do, provided the data from the CSV is in a list called values:

from collections import defaultdict

# compute a histogram
histogram = defaultdict(int)
interval = 1.
max_bin = 3  # everything from max_bin * interval upwards goes into the last bin
for v in values:
    k = min(int(v / interval), max_bin)
    histogram[k] += 1

# output
total = sum(histogram.values())
for k in range(max_bin + 1):  # also lists bins that stayed empty
    count = histogram[k]
    share = 100. * count / total
    if k >= max_bin:
        print "{}+ : {}, {}%".format(k * interval, count, share)
    else:
        print "{}-{}: {}, {}%".format(k * interval, (k + 1) * interval, count, share)

edited Sep 19, 2014 at 9:34

answered Sep 19, 2014 at 9:21


Kasravnd · Answer · 2014-09-19 14:45:31Z

0

import csv

a = b = c = d = 0
with open('cf.csv', 'r') as csvfile:
    spamreader = csv.reader(csvfile)
    for row in spamreader:
        value = float(row[0])  # each row is a list of strings
        if 0 <= value < 1:
            a += 1
        elif 1 <= value < 2:
            b += 1
        elif 2 <= value < 3:
            c += 1
        elif value >= 3:  # exactly 3 is counted in the last interval
            d += 1

print "0-1:{} \n 1-2:{} \n 2-3:{} \n >3:{}".format(a, b, c, d)

Output:

0-1:0 
 1-2:5 
 2-3:1 
 >3:4

Because each row is a list, we use index [0] to access the value and convert the string to a float with float().
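To see what csv.reader actually yields, here is a small sketch in Python 3 syntax. It reads from an in-memory io.StringIO object instead of a real file; the sample contents are an assumption that only mimics the one-column CSV from the question.

```python
import csv
import io

# Stand-in for the one-column CSV file (made-up sample data)
sample = io.StringIO("3.13\n1.13\n0.50\n")

rows = list(csv.reader(sample))
print(rows)  # [['3.13'], ['1.13'], ['0.50']] -> every row is a list of strings

# Take the first field of each row and convert it; skip blank lines
values = [float(row[0]) for row in rows if row]
print(values)
```

This is also why a test such as 'if row>2' misbehaves: it compares a whole list with a number. Python 2 silently orders the two different types (a list always compares greater than an int), while Python 3 raises a TypeError.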

edited Sep 19, 2014 at 14:45

answered Sep 19, 2014 at 9:16


4 Comments

user1812071 Over a year ago

Using this method I am always getting this error: print: "0-1:{} \n 1-2:{} \n 2-3:{} \n <3:{}".format(a,b,c,d) ^ SyntaxError: invalid syntax When I change this to "print a" it is always '0' - same with b,c,d

Kasravnd Over a year ago

Yes, sorry, it's a syntax error: you must remove the : after print. I edited the answer!

user1812071 Over a year ago

Seems like it doesn't recognize the value from the CSV. If the CSV value is '1' and I only put a single if statement like 'if row>2: a+=1', it increments a by 1 EVEN THOUGH 1 < 2 and therefore shouldn't be counted. Does this code really work for you??

Kasravnd Over a year ago

I edited the answer and ran it myself! It works fine!

Maxwell Hayes · Answer · 2014-09-19 18:14:57Z

0

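After you get the entries into a list of floats (called values here):

# names cannot start with a digit, so the counters get an n_ prefix
n_0_to_1 = 0
n_1_to_2 = 0
n_2_to_3 = 0
n_over_3 = 0

for v in values:
    # range() only contains integers, so compare the float directly
    if 0 <= v < 1:
        n_0_to_1 += 1
    elif 1 <= v < 2:
        n_1_to_2 += 1
    elif 2 <= v < 3:
        n_2_to_3 += 1
    elif v >= 3:
        n_over_3 += 1

And to find the breakdown:

total_values = n_0_to_1 + n_1_to_2 + n_2_to_3 + n_over_3

# count / total; 100.0 forces float division in Python 2
perc_0_to_1 = 100.0 * n_0_to_1 / total_values
perc_1_to_2 = 100.0 * n_1_to_2 / total_values
perc_2_to_3 = 100.0 * n_2_to_3 / total_values
perc_over_3 = 100.0 * n_over_3 / total_values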

+++++ Response to Update +++++++

import csv

a=b=c=0
with open('winter.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        for i in row:
            i = float(i.strip()) # .strip() removes blank spaces before converting it to float
            if 0 <= i < 1:  # test the converted value, not the row; range() only holds integers
                a += 1
            elif 1 <= i < 2:
                b += 1
            # add more elif statements here as desired.

Hope that works. Side note, I like that a=b=c=0 thing. Didn't realize you could do that after all this time haha.
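A note on the 'row in range(0,1)' test from the question: range(0, 1) contains only the integer 0, so a fractional value can never be "in" it. A short sketch (the variable names are only for illustration):

```python
value = 0.5

print(list(range(0, 1)))      # [0]: the stop value 1 is excluded
print(value in range(0, 1))   # False: 0.5 is not one of the integers in the range
print(0 <= value < 1)         # True: a chained comparison tests the interval

in_range = value in range(0, 1)
in_interval = 0 <= value < 1
```

The chained comparison is the form to use for interval checks on floats.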

edited Sep 19, 2014 at 18:14

answered Sep 19, 2014 at 9:29


2 Comments

user1812071 Over a year ago

I implemented parts of your code into mine (as it's the easiest to understand for me). But somehow I can't get it to work - pls see my update above. thx!

Maxwell Hayes Over a year ago

See my update. (I like to keep things simple. None of this import 4 things to sort numbers, lol)
