Generating Pivot Tables in Python - Pandas? Numpy? Xlrd? from csv

Question

I have been searching for hours, literally the entire day on how to generate a pivot table in Python. I am very new to python so please bear with me.

What I want is to take a csv file, extract the first column and generate a pivot table using the count or frequency of the numbers in that column, and sort descending

import pandas

import numpy 


from numpy import recfromtxt
a = recfromtxt('1.csv', skiprows=1, usecols=0, delimiter=',')


print a

^ what i get here is a list of the first column [2 2 2 6 7]

What i need is an export of 2 columns

2--3

6--1

7--1

see these: pandas.pydata.org/pandas-docs/dev/10min.html, pandas.pydata.org/pandas-docs/dev/io.html, pandas.pydata.org/pandas-docs/dev/reshaping.html — Jeff
– Jeff, Commented Oct 16, 2013 at 22:24
The Python collections.Counter class makes this very easy if you have access to it (2.7 or later) and don't specifically need your count array to be a numpy array. You can generate one from collections.Counter(np.nditer(a)). If you need numpy output and your data is nonnegative integers, it looks like bincount would be a start: docs.scipy.org/doc/numpy/reference/generated/… — Peter DeGlopper
– Peter DeGlopper, Commented Oct 16, 2013 at 22:30
@PeterDeGlopper In numpy, if your items are not nonnegative integers, you would do something like unq, _ = np.unique(a, reverse_index=True); cnts = np.bincount(_), and now unq and cnts are your two columns. — Jaime
– Jaime, Commented Oct 16, 2013 at 22:47
@Jaime - return_inverse rather than reverse_index, right? And you might need return_index as well to make it easier to match the counts up against the array entries. Still, that's clever. — Peter DeGlopper
– Peter DeGlopper, Commented Oct 16, 2013 at 23:03
@PeterDeGlopper Yes, exactly, return_inverse... I have written somewhere around here that that will eventually become a standard feature of np.bincount, because I find myself writing those do lines of code much too often. — Jaime
– Jaime, Commented Oct 16, 2013 at 23:08

cblab · Accepted Answer · 2013-10-16 22:49:03Z

1

Have you had a look here?

https://pypi.python.org/pypi/pivottable

Otherwise, from you example, you might just use list comprehensions:

>>> l = [2,2,2,6,7]
>>> [(i, l.count(i)) for i in set(l)]

[
    (2,3),
    (6,1),
    (7,1)
]

Or even dictionary comprehensions, depending on what you need:

>>> l = [2,2,2,6,7]
>>> {i:l.count(i) for i in set(l)}

{
    2: 3,
    6: 1,
    7: 1
}

edit (suggestions from @Peter DeGlopper)

Another more efficient way using collections.Counter (read comments below):

>>> from collections import Counter
>>> l = [2,2,2,6,7]
>>> Counter(l)

Counter({2: 3, 6: 1, 7: 1})

edited Oct 16, 2013 at 22:49

answered Oct 16, 2013 at 22:26

cblab

1,95515 silver badges25 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Peter DeGlopper Over a year ago

That's going to have quadratic performance with the list length, since count has to traverse the whole list each time - you can do much better using collections.Counter or a defaultdict that accumulates the running total in a single pass of the list.

cblab Over a year ago

It's true, though for the sake of simplicity (for a python beginner), if the list is not really huge, I think most of modern machines can handle that without any troubles. If the perf is a critical issue, then indeed, you're perfectly right, there are many way of optimizing this.

Collectives™ on Stack Overflow

Generating Pivot Tables in Python - Pandas? Numpy? Xlrd? from csv

1 Answer 1

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related