
I have a file with the following input data:

       IN   OUT
data1  2.3  1.3
data2  0.1  2.1
data3  1.5  2.8
dataX  ...  ...

There are thousands of such files, and each contains the same rows: data1, data2, data3, ..., dataX. I'd like to count how many times each value occurs for each row and column across all files. Example:

In file 'data1-IN' (filename)

2.3 - 50    (times)
0.1 - 233   (times)
... - ...   (times)

In file 'data1-OUT' (filename)

2.1 - 1024 (times)
2.8 - 120  (times)
... - ...  (times)

In file 'data2-IN' (filename)

0.4 - 312    (times)
0.3 - 202   (times)
... - ...   (times)

In file 'data2-OUT' (filename)

1.1 - 124 (times)
3.8 - 451  (times)
... - ...  (times)

In file 'data3-IN' ...

Which Python data structure would be best for counting such data? I wanted to use a multidimensional dictionary, but I keep running into KeyErrors and similar problems.

2 Answers


You really want to use collections.Counter, perhaps contained in a collections.defaultdict:

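import collections
import glob

# Maps 'data1-IN', 'data1-OUT', ... to a Counter of the values seen.
counts = collections.defaultdict(collections.Counter)

files = glob.glob('*.txt')  # assumption: adjust the pattern to match your files

for filename in files:
    with open(filename) as f:
        next(f, None)  # skip the 'IN  OUT' header row
        for line in f:
            fields = line.split()  # columns are whitespace-separated
            if len(fields) != 3:
                continue  # ignore blank or malformed lines
            name, value_in, value_out = fields
            counts[name + '-IN'][value_in] += 1
            counts[name + '-OUT'][value_out] += 1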
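
To get from the counters to the 'value - count' layout shown in the question, something along these lines might work. This is only a sketch: sample_rows and format_report are made-up names for illustration, the rows stand in for lines read from the real files, and the counters are keyed by the data name from the first column.

```python
import collections

# Hypothetical sample rows standing in for lines read from the input files.
sample_rows = [
    "data1  2.3  1.3",
    "data2  0.1  2.1",
    "data1  2.3  2.1",
    "data1  0.1  1.3",
]

counts = collections.defaultdict(collections.Counter)
for row in sample_rows:
    name, value_in, value_out = row.split()
    counts[name + "-IN"][value_in] += 1
    counts[name + "-OUT"][value_out] += 1


def format_report(counter):
    """Return 'value - count' lines, most frequent value first."""
    return ["%s - %d" % (value, n) for value, n in counter.most_common()]


for key in sorted(counts):
    print("In file '%s'" % key)
    for report_line in format_report(counts[key]):
        print(report_line)
```

Writing each report to a file named after its key instead of printing it would be a small change to the final loop.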

6 Comments

Python 2.6.4 (r264:75706, Apr 2 2012, 20:24:27) [C] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import collections
>>> counts = collections.defaultdict(collections.Counter)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'Counter'
From stackoverflow.com/questions/5079790/… : dictionary = collections.defaultdict(lambda: collections.defaultdict(int)). What is the difference between these two definitions?
And docs.python.org/library/collections.html says that defaultdict has been available since 2.5.
@przemol: A Counter offers more functionality than a defaultdict with an int value, such as retrieving the top counts, and combining multiple counters in various ways. Read the linked documentation for more details.
Which version of Python do I need to be able to use collections.Counter?

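
For reference, collections.Counter was added in Python 2.7 (and 3.1), which is why the 2.6.4 interpreter above raises AttributeError. On older interpreters the nested defaultdict from the linked question counts the same way. The snippet below is a rough comparison of the two; the keys and numbers are invented purely for illustration.

```python
import collections

# Works on Python 2.5+: missing outer keys get a new inner dict,
# and missing inner keys start at 0.
nested = collections.defaultdict(lambda: collections.defaultdict(int))
nested["data1-IN"]["2.3"] += 1
nested["data1-IN"]["2.3"] += 1
nested["data1-OUT"]["1.3"] += 1

# Needs Python 2.7+ / 3.1+: Counter counts the same way but also
# supports combining counters and retrieving the top counts.
a = collections.Counter({"2.3": 2, "0.1": 1})
b = collections.Counter({"2.3": 1, "1.5": 4})
combined = a + b                # adds the counts key by key
top = combined.most_common(1)   # the single most frequent value

print(dict(nested["data1-IN"]))
print(combined)
print(top)
```

If upgrading is not an option, the nested defaultdict is enough for plain counting. The extras such as most_common would have to be written by hand, for example by sorting the items by count.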

I have recently started using the Pandas data frame. It has a CSV reader and makes slicing and dicing data very simple.
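
As a rough illustration of how that could look for this problem, the sketch below reads two made-up in-memory files, stacks them, and counts values per data name. It assumes pandas is installed. The sample data and the column name "name" are invented for the example, and values are kept as strings so they are counted exactly as written.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for two of the input files.
file_a = io.StringIO("       IN   OUT\ndata1  2.3  1.3\ndata2  0.1  2.1\n")
file_b = io.StringIO("       IN   OUT\ndata1  2.3  2.1\ndata2  0.4  2.1\n")

# Skip the two-column header and name all three columns explicitly.
frames = [
    pd.read_csv(f, sep=r"\s+", skiprows=1, header=None,
                names=["name", "IN", "OUT"], dtype=str)
    for f in (file_a, file_b)
]
data = pd.concat(frames, ignore_index=True)

# One count per (data name, value) pair for each column.
in_counts = data.groupby("name")["IN"].value_counts()
out_counts = data.groupby("name")["OUT"].value_counts()

print(in_counts)
print(out_counts)
```

With thousands of files it may be lighter on memory to count file by file and add up the results rather than concatenating everything first.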
