I have spent the entire day searching for how to generate a pivot table in Python. I am very new to Python, so please bear with me.

What I want is to take a CSV file, extract the first column, generate a pivot table from the count (frequency) of each number in that column, and sort it in descending order.

import pandas
import numpy
from numpy import recfromtxt

# Read only the first column of the CSV, skipping the header row
a = recfromtxt('1.csv', skiprows=1, usecols=0, delimiter=',')

print(a)

^ What I get here is the first column as an array: [2 2 2 6 7]

What I need is an export of two columns, the value and its count:

2--3
6--1
7--1

  • The Python collections.Counter class makes this very easy if you have access to it (2.7 or later) and don't specifically need your count array to be a numpy array. You can generate one from collections.Counter(np.nditer(a)). If you need numpy output and your data is nonnegative integers, it looks like bincount would be a start: docs.scipy.org/doc/numpy/reference/generated/… Commented Oct 16, 2013 at 22:30
  • @PeterDeGlopper In numpy, if your items are not nonnegative integers, you would do something like unq, _ = np.unique(a, reverse_index=True); cnts = np.bincount(_), and now unq and cnts are your two columns. Commented Oct 16, 2013 at 22:47
  • @Jaime - return_inverse rather than reverse_index, right? And you might need return_index as well to make it easier to match the counts up against the array entries. Still, that's clever. Commented Oct 16, 2013 at 23:03
  • @PeterDeGlopper Yes, exactly, return_inverse... I have written somewhere around here that that will eventually become a standard feature of np.bincount, because I find myself writing those two lines of code much too often. Commented Oct 16, 2013 at 23:08
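
The np.unique / np.bincount idea from these comments can be sketched as follows. This is only a minimal sketch, assuming the first column is already loaded into an integer array `a` (the sample values are the ones from the question); the names `inv`, `order`, and `table` are illustrative and not from the original comments.

```python
import numpy as np

# Sample data from the question; in practice `a` would come from the CSV.
a = np.array([2, 2, 2, 6, 7])

# return_inverse=True maps every element of `a` to the index of its unique
# value, so np.bincount on those indices yields one count per unique value.
unq, inv = np.unique(a, return_inverse=True)
cnts = np.bincount(inv)

# Sort by count, descending; the stable sort keeps ties in value order.
order = np.argsort(-cnts, kind="stable")
table = np.column_stack((unq[order], cnts[order]))
print(table)
```

Because the counts are matched to `unq` by position, this also works when the values are negative or not integers, which is the case plain np.bincount(a) cannot handle.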

1 Answer

Have you had a look here?

https://pypi.python.org/pypi/pivottable

Otherwise, going by your example, you could just use a list comprehension:

>>> l = [2, 2, 2, 6, 7]
>>> [(i, l.count(i)) for i in set(l)]
[(2, 3), (6, 1), (7, 1)]

Or even a dictionary comprehension, depending on what you need:

>>> l = [2, 2, 2, 6, 7]
>>> {i: l.count(i) for i in set(l)}
{2: 3, 6: 1, 7: 1}

edit (suggestions from @Peter DeGlopper)

Another, more efficient way is to use collections.Counter (see the comments below):

>>> from collections import Counter
>>> l = [2,2,2,6,7]
>>> Counter(l)

Counter({2: 3, 6: 1, 7: 1})
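
Since the question also asks for a descending sort and a two-column export, here is a minimal sketch of how that might look with Counter.most_common(), which returns (value, count) pairs ordered from the highest count down. The header names and the in-memory buffer are assumptions for illustration; you would pass a real file opened for writing instead of io.StringIO.

```python
import csv
import io
from collections import Counter

l = [2, 2, 2, 6, 7]

# most_common() returns (value, count) pairs, highest count first.
rows = Counter(l).most_common()

# Write the two columns as CSV; io.StringIO stands in for a real file here.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["value", "count"])  # hypothetical header names
writer.writerows(rows)
output = buf.getvalue()
print(output)
```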

2 Comments

That's going to have quadratic performance with the list length, since count has to traverse the whole list each time - you can do much better using collections.Counter or a defaultdict that accumulates the running total in a single pass of the list.
That's true, though for the sake of simplicity (for a Python beginner), if the list is not really huge, I think most modern machines can handle it without any trouble. If performance is critical, then you're perfectly right: there are many ways to optimize this.
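
For reference, the single-pass defaultdict approach mentioned in the first comment might look like the sketch below. It is only an illustration on the sample list from the answer; the names `counts` and `pairs` are made up for this example.

```python
from collections import defaultdict

l = [2, 2, 2, 6, 7]

# Accumulate a running total per value in one pass over the list,
# instead of calling l.count() once for every distinct value.
counts = defaultdict(int)
for item in l:
    counts[item] += 1

# Sort the (value, count) pairs by count, descending.
pairs = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
print(pairs)
```

Each element is visited once, so the counting step scales linearly with the list length rather than quadratically.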