For Loop to Return Unique Values in DataFrame

Question

I'm working through some beginner ML code, and to count the number of unique values in a column, the author uses this function:

def unique_vals(rows, col):
    """Find the unique values for a column in a dataset."""
    return set([row[col] for row in rows])

I am working with a DataFrame, however, and for me this code returns single letters: 'm', 'l', etc. I tried altering it to:

set(row[row[col] for row in rows)

But then it returns:

KeyError: "None of [Index(['Apple', 'Banana', 'Grape'   dtype='object', length=2318)] are in the [columns]"

Thanks for your time!

gmds · Accepted Answer · 2019-04-22 13:49:35Z

5

In general, you don't need to do such things yourself because pandas already does them for you.

In this case, what you want is the unique method. You can call it directly on a Series (pd.Series is the abstraction that represents, among other things, a column), and it returns a NumPy array containing the unique values in that Series.
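As a minimal sketch of the point above: the DataFrame below is made-up data, and its column names ("fruit", "size") are assumptions for illustration only. It also shows a likely reason the original function returned single letters: iterating over a DataFrame yields column labels, so indexing each "row" with an integer picks one character out of a label.

```python
import pandas as pd

# Hypothetical data; column names are assumptions for illustration.
df = pd.DataFrame({
    "fruit": ["Apple", "Banana", "Apple", "Grape"],
    "size": [3, 1, 3, 2],
})

# Iterating over a DataFrame yields its column labels, not its rows,
# so row[0] is the first character of each label.
letters = set(row[0] for row in df)

def unique_vals(df, col):
    """Return the unique values in one column of a DataFrame as a set."""
    return set(df[col].unique())

fruits = unique_vals(df, "fruit")
print(letters)
print(fruits)
```

Wrapping the result in set() is optional; it only keeps the return type the same as in the tutorial's function.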

If you want the unique values for multiple columns, you can do something like this:

which_columns = ... # specify the columns whose unique values you want here

uniques = {col: df[col].unique() for col in which_columns}

edited Apr 22, 2019 at 13:49

answered Apr 22, 2019 at 13:30

gmds


2 Comments

piRSquared Over a year ago

You can also leverage the fact that NumPy operates on an entire array at once and do {*np.unique(df[which_columns].values)}

gmds Over a year ago

@piRSquared I believe that would only work if the columns were homogeneous...?
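To make the exchange above concrete, here is a small sketch on made-up data (the column names and values are assumptions). Note that this approach pools the values of all selected columns into one set instead of keeping them per column. np.unique sorts its input, so columns that mix types that cannot be compared, such as strings and integers, may fail; a plain set over the flattened values only needs hashing and is shown as one possible fallback.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "pets": ["cat", "dog", "cat"],
    "owner": ["Champ", "Ron", "Brick"],
    "age": [3, 5, 3],
})

# Homogeneous case: both columns hold strings, so np.unique can sort them.
text_columns = ["pets", "owner"]
pooled = {*np.unique(df[text_columns].values)}

# Mixed case: strings and integers together. Building a set from the
# flattened values avoids sorting, so it does not need comparable types.
mixed = set(df[["pets", "age"]].values.ravel())

print(pooled)
print(mixed)
```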

Suhas_Pote · Accepted Answer · 2019-04-22 16:33:18Z

If you are working with categorical columns, the following code is very useful.

It prints not only the unique values but also the count of each unique value.

categorical_columns = ['col1', 'col2', 'col3']  # ... and so on, up to 'coln'

# Print the frequency of each category, one column at a time
for col in categorical_columns:
    print('\nFrequency of Categories for variable %s' % col)
    print(df[col].value_counts())

Example:

df

     pets     location     owner
0     cat    San_Diego     Champ
1     dog     New_York       Ron
2     cat     New_York     Brick
3  monkey    San_Diego     Champ
4     dog    San_Diego  Veronica
5     dog     New_York       Ron


categorical_columns = ['pets', 'owner', 'location']
# Print the frequency of each category, one column at a time
for col in categorical_columns:
    print('\nFrequency of Categories for variable %s' % col)
    print(df[col].value_counts())

Output:

# Frequency of Categories for variable pets
# dog       3
# cat       2
# monkey    1
# Name: pets, dtype: int64

# Frequency of Categories for variable owner
# Champ       2
# Ron         2
# Brick       1
# Veronica    1
# Name: owner, dtype: int64

# Frequency of Categories for variable location
# New_York     3
# San_Diego    3
# Name: location, dtype: int64
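
If the counts are needed for further processing and not only for printing, one option is to collect them into a dictionary. This is a sketch built on the example table above; the name frequencies is an assumption introduced here.

```python
import pandas as pd

# Rebuild the example table from this answer.
df = pd.DataFrame({
    "pets": ["cat", "dog", "cat", "monkey", "dog", "dog"],
    "location": ["San_Diego", "New_York", "New_York",
                 "San_Diego", "San_Diego", "New_York"],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "Ron"],
})

categorical_columns = ["pets", "owner", "location"]

# Map each column name to a {value: count} dictionary.
frequencies = {
    col: df[col].value_counts().to_dict() for col in categorical_columns
}

# The keys of each inner dictionary are that column's unique values.
print(frequencies["pets"])
```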
