I am trying to write a simple function that gives me a count of each unique value in a specific column of a pandas DataFrame. I would like to pass the column name as the function parameter. However, the parameter is not treated as a string inside the function.

Here is what I am trying to convert to a function where c_type is a column name.

c_type_count = data.groupby('c_type').c_type.count()

Here is the function. I use parameter column to pass the column name:

def uniques(column):
    count = data.groupby(column).column.count()
    print(count)

The groupby(column) part works as intended, but the second reference, .column, stays as the literal .column, and I get an error because there is no column by that name in the df.

I understand what is happening there, but since I am new to Python I don't know how to change the syntax.

2 Answers

2

I think you're simply looking for value_counts()

data['c_type'].value_counts()

Gives exactly what you describe you're looking for.

Example:

>>> data
  b_type c_type
0      d      b
1      d      a
2      d      a
3      c      a
4      c      a
5      d      b
6      c      a
7      d      b
8      c      b
9      c      a

>>> data['c_type'].value_counts()
a    6
b    4

How to fix your custom function

If you want to keep using your custom function, use standard indexing rather than attribute access: that is, use square brackets instead of dot notation to select the column. See the pandas documentation on indexing for more information.

def uniques(column):
    count = data.groupby(column)[column].count()
    # Alternatively:
    # count = data.groupby(column).size()
    print(count)

This works as you want:

>>> uniques('c_type')
c_type
a    6
b    4
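For anyone who wants to run this end to end, here is a minimal sketch using the example data above. Passing the DataFrame in as a df parameter and returning the result instead of printing it are my own assumptions, not part of the original function. It shows that bracket indexing, size() and value_counts() give the same counts here.

```python
import pandas as pd

# Same rows as the example DataFrame above.
data = pd.DataFrame({
    "b_type": list("dddccdcdcc"),
    "c_type": list("baaaababba"),
})


def uniques(df, column):
    # Assumption: df is passed in explicitly rather than read from a global.
    # Square brackets use the *value* of `column` (a string) to pick the
    # column, whereas `.column` would look for an attribute literally
    # named "column".
    return df.groupby(column)[column].count()


counts = uniques(data, "c_type")           # groupby + bracket indexing
sizes = data.groupby("c_type").size()      # rows per group
vc = data["c_type"].value_counts()         # counts, most frequent first

print(counts)
```

In general count() skips missing values in the selected column while size() counts rows. When you group by and count the same column the two agree, because groupby drops missing keys by default.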

5 Comments

Thank you. I am curious to know, though, how I can make sure that the parameter gets picked up as a string for .column. Is there a different syntax that I need to use?
Awesome. Thank you again.
You can use size rather than count here.
@Wen, true, I'll add that. Is there an advantage to size in this case, though?
Then you can just do data.groupby(column).size()
1

This is by design. In your example, .column asks the GroupBy object for an attribute literally named column; Python never looks up the value of the column variable in the current scope. What you are looking for is the built-in function getattr(), which retrieves an object's attribute or method by its string name.

def uniques(column):
    count = getattr(data.groupby(column), column).count()
    print(count)

1 Comment

With a Pandas dataframe, the columns are dict items. They're also available as attribute names as a convenience, when they happen to be valid identifier names and don't conflict with standard attributes, but that doesn't mean you should look them up as attributes rather than indices when you don't need that convenience. In other words, don't do getattr(df, colname); just do df[colname].
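A small sketch of the conflict mentioned in this comment. The column name "count" is a hypothetical one I picked because it collides with the existing DataFrame.count method; it does not come from the question.

```python
import pandas as pd

# Hypothetical frame whose column name clashes with a DataFrame method.
df = pd.DataFrame({"count": ["x", "y", "x"]})

by_index = df["count"]            # always the column
by_attr = getattr(df, "count")    # the DataFrame.count method, not the column

print(type(by_index).__name__)    # Series
print(callable(by_attr))          # True: a bound method was returned
```

So df[colname] stays unambiguous whatever the column is called, whereas attribute lookup silently returns something else when the name is already taken.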
