92

I have a dataframe (df) and want to print the unique values from each column in the dataframe.

I need to substitute the variable i (the column name) into the print statement:

column_list = df.columns.values.tolist()
for column_name in column_list:
    print(df."[column_name]".unique()

Update

When I use the code below, I get "Unexpected EOF Parsing" with no extra details:

column_list = sorted_data.columns.values.tolist()
for column_name in column_list:
      print(sorted_data[column_name].unique()

What is the difference between your syntax, YS-L (above), and the one below?

for column_name in sorted_data:
      print(column_name)
      s = sorted_data[column_name].unique()
      for i in s:
        print(str(i))
    You are missing a closing parenthesis in your print statement, that's what causes the error. Commented Dec 2, 2014 at 3:33

15 Answers

135

It can be written more concisely like this:

for col in df:
    print(df[col].unique())

Generally, you can access a column of the DataFrame through indexing with the [] operator (e.g. df['col']), or through attribute access (e.g. df.col).

Attribute access makes the code a bit more concise when the target column name is known beforehand, but it has several caveats -- for example, it does not work when the column name is not a valid Python identifier (e.g. df.123), or when it clashes with a built-in DataFrame attribute (e.g. df.index). The [] notation, on the other hand, always works.
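A small sketch of those caveats (the frame and column names here are made up for illustration):

```python
import pandas as pd

# Toy frame; the column named "index" is chosen to show the clash caveat.
df = pd.DataFrame({"col": [1, 2, 2], "index": ["a", "a", "b"]})

print(df["col"].unique())    # [] indexing always works
print(df.col.unique())       # attribute access works for a valid identifier
print(df["index"].unique())  # the 'index' column, as expected
print(type(df.index))        # but df.index is the row index, not the column
```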


6 Comments

Ah fantastic, so when the column name is stored in a variable you use indexing (df[column_name]) rather than dot notation (df.col_name)?!
@yoshiserry: Generally you either access a column using dot notation: df.my_col, or indexing notation, with the column name as a string: df['my_col']. You seem to be mixing up the two a bit.
Thanks Marius, when I use indexing notation I am getting a syntax error (see edit).
YS-L, I get a syntax error when I write this, which just says "invalid syntax" and no specifics (see edit). However, does this mean the same principle could be used to split a large dataframe into multiple smaller dataframes (one for every month of the year)? month = df.month.unique().tolist() for item in month: [item] = df[df[month]==[item]]
For the error, please add it into your question so we may help. For the second part, you are probably looking for something like groupby (please open another question for that if necessary instead of discussing here :-).
28

The most upvoted answer is a loop solution, so here is a one-line solution using the pandas apply() method and a lambda function:

print(df.apply(lambda col: col.unique()))
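For what it's worth, a quick sketch of what this returns (toy data): since the columns usually have different numbers of unique values, apply collects the arrays into a Series keyed by column name rather than a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "y", "z"]})

# Each column is passed to the lambda as a Series; with ragged result
# lengths, apply returns a Series of arrays keyed by column name.
out = df.apply(lambda col: col.unique())
print(out)
```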

4 Comments

The question is asking for the unique values, not the number of unique values, so just a matter of changing the applied function: print(df.apply(lambda col: col.unique()))
When I do this I get the error message "arrays must be the same length". All columns have the same number of rows, so why would I get that message?
And to get the values printed out a bit nicer (at least in my opinion) could add something like df.apply(lambda col: ', '.join(map(str, col.unique())))
Absolute game-changer, as it solved my issue of needing a way to make the result more readable. I was looking up how to tabulate it because I didn't know this would just take care of things.
13

This will get the unique values in a proper format:

pd.Series({col:df[col].unique() for col in df})

1 Comment

Short and precise.
6

We can make this even more concise:

df.describe(include='all').loc['unique', :]

Pandas describe gives a few key statistics about each column, but we can just grab the 'unique' statistic and leave it at that.

Note that for numeric columns describe reports NaN in the 'unique' row; if you want to include those columns as well, cast everything to object first:

df.astype('object').describe(include='all').loc['unique', :]
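A minimal sketch of what the 'unique' row looks like (toy data; note it holds counts of distinct values, not the values themselves):

```python
import pandas as pd

df = pd.DataFrame({"num": [1, 1, 2], "cat": ["a", "b", "b"]})

# After the object cast, describe() reports a 'unique' (count) row
# for every column, numeric ones included.
counts = df.astype("object").describe(include="all").loc["unique", :]
print(counts)
```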

1 Comment

This also gives the number of unique values, not the unique values themselves. Btw, the number of unique values is even easier to get with df.nunique()
5

If you're trying to create multiple separate dataframes as mentioned in your comments, create a dictionary of dataframes:

df_dict = {col: pd.DataFrame(df[col].unique(), columns=[col]) for col in df.columns}

Then you can access any dataframe easily using the name of the column:

df_dict['column_name']
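A small end-to-end sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"month": [1, 1, 2], "sales": [10, 10, 30]})

# One single-column dataframe of unique values per original column.
df_dict = {col: pd.DataFrame(df[col].unique(), columns=[col]) for col in df.columns}

print(df_dict["month"])  # rows 1 and 2
print(df_dict["sales"])  # rows 10 and 30
```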

Comments

4

I was looking for a solution to this problem as well, and the code below proved to be the most helpful in my situation:

for col in df:
    print(col)
    print(df[col].unique())
    print('\n')

It gives output like this:

Fuel_Type
['Diesel' 'Petrol' 'CNG']


HP
[ 90 192  69 110  97  71 116  98  86  72 107  73]


Met_Color
[1 0]

1 Comment

this was helpful in my case as well. Thanks!
2

The code below prints a list of unique values for each field; I find it very useful when you want to take a deeper look at the data frame:

for col in list(df):
    print(col)
    print(df[col].unique())

You can also sort the unique values if you want them to be sorted:

import numpy as np
for col in list(df):
    print(col)
    print(np.sort(df[col].unique()))

Comments

1
cu = []  # unique values for each column
i = []   # matching column names
for cn in card.columns[:7]:  # only the first seven columns here
    cu.append(card[cn].unique())
    i.append(cn)

# One row per column; transpose so each column name heads its unique values
pd.DataFrame(cu, index=i).T

Comments

1

Simply do this:

for i in df.columns:
    print(df[i].unique())

Comments

1

Use the pandas apply() method and pass a callable, np.unique in this case:

df.apply(np.unique)

Because you are calling apply on the entire dataframe (vs. an individual column/Series), each column is passed to the callable in turn.
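A toy example (made-up data); note that np.unique also sorts each result, unlike Series.unique():

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 1], "b": [5, 5, 5]})

# np.unique receives each column as a Series and returns a sorted array.
out = df.apply(np.unique)
print(out)
```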

Comments

0

Or, in short, for a single column it can be written as:

for val in df['column_name'].unique():
    print(val)

Comments

0

Even better, here's code to view all the unique values as a dataframe, transposed column-wise:

columns = [*df.columns]
unique_values = {}
for col in columns:
    unique_values[col] = df[col].unique()
# Wrap each array in a Series so ragged lengths are padded with NaN
unique = pd.DataFrame({k: pd.Series(v) for k, v in unique_values.items()})
print(unique.fillna('').T)

Comments

0

This solution constructs a dataframe of unique values with some stats and gracefully handles any unhashable column types.

Resulting dataframe columns are: col, unique_len, df_len, perc_unique, unique_values

df_len = len(df)
unique_cols_list = []
for col in df:
    try:
        unique_values = df[col].unique()
        unique_len = len(unique_values)
    except TypeError:  # not all cols are hashable
        unique_values = ""
        unique_len = -1
    perc_unique = unique_len*100/df_len
    unique_cols_list.append((col, unique_len, df_len, perc_unique, unique_values))
df_unique_cols = pd.DataFrame(unique_cols_list, columns=["col", "unique_len", "df_len", "perc_unique", "unique_values"])
df_unique_cols = df_unique_cols[df_unique_cols["unique_len"] > 0].sort_values("unique_len", ascending=False)
print(df_unique_cols)

Comments

0

Leveraging numpy:

np.unique(df.values)

It gives the unique elements pooled across the whole dataframe (not per column); iterate over the array or just print it. Note that this can fail on mixed-type columns, since the pooled array must be sortable.
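A quick sketch with made-up data showing the pooled behaviour:

```python
import numpy as np
import pandas as pd

# 2 appears in both columns but shows up only once in the pooled result.
df = pd.DataFrame({"a": [1, 2], "b": [2, 3]})

print(np.unique(df.values))  # one flat, sorted array for the whole frame
```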

Comments

-3

The best way to do that:

Series.unique()

For example, students.age.unique() will output the different values that occur in the age column of the students dataframe.

To get only the number of different values:

Series.nunique()

1 Comment

This does not answer how to get the unique values from each column
