92

I have a dataframe (df) and want to print the unique values from each column in the dataframe.

I need to substitute the variable i (the column name) into the print statement:

column_list = df.columns.values.tolist()
for column_name in column_list:
    print(df."[column_name]".unique()

Update

When I use the code below, I get "Unexpected EOF Parsing" with no extra details:

column_list = sorted_data.columns.values.tolist()
for column_name in column_list:
      print(sorted_data[column_name].unique()

What is the difference between your syntax, YS-L (above), and the one below?

for column_name in sorted_data:
      print(column_name)
      s = sorted_data[column_name].unique()
      for i in s:
        print(str(i))
    You are missing a closing parenthesis in your print statement, that's what causes the error. Commented Dec 2, 2014 at 3:33

15 Answers

135

It can be written more concisely like this:

for col in df:
    print(df[col].unique())

Generally, you can access a column of the DataFrame through indexing with the [] operator (e.g. df['col']), or through attribute access (e.g. df.col).

Attribute access makes the code a bit more concise when the target column name is known beforehand, but it has several caveats -- for example, it does not work when the column name is not a valid Python identifier (e.g. df.123), or when it clashes with a built-in DataFrame attribute (e.g. df.index). The [] notation, on the other hand, always works.
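A small sketch of those caveats (the frame and column names here are made up for illustration):

```python
import pandas as pd

# Toy frame; the column named "index" is chosen to show the clash caveat.
df = pd.DataFrame({"col": [1, 2, 2], "index": ["a", "a", "b"]})

print(df["col"].unique())    # [] indexing always works
print(df.col.unique())       # attribute access works for a valid identifier
print(df["index"].unique())  # the 'index' column, as expected
print(type(df.index))        # but df.index is the row index, not the column
```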


6 Comments

Ah fantastic, so when the column name is stored in a variable you use indexing (df[column_name]) rather than dot notation (df.col_name)?!
@yoshiserry: Generally you either access a column using dot notation: df.my_col, or indexing notation, with the column name as a string: df['my_col']. You seem to be mixing up the two a bit.
Thanks Marius, when I use indexing notation I am getting a syntax error (see edit).
YS-L, I get a syntax error when I write this, which just says "invalid syntax" and no specifics (see edit). However, does this mean the same principle could be used to split a large dataframe into multiple smaller dataframes (one for every month of the year)? month = df.month.unique().tolist() for item in month: [item] = df[df[month]==[item]]
For the error, please add it into your question so we may help. For the second part, you are probably looking for something like groupby (please open another question for that if necessary instead of discussing here :-).
28

The most upvoted answer is a loop solution, so here is a one-line solution using the pandas apply() method and a lambda function:

print(df.apply(lambda col: col.unique()))
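For what it's worth, a quick sketch of what this returns (toy data): since the columns usually have different numbers of unique values, apply collects the arrays into a Series keyed by column name rather than a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "y", "z"]})

# Each column is passed to the lambda as a Series; with ragged result
# lengths, apply returns a Series of arrays keyed by column name.
out = df.apply(lambda col: col.unique())
print(out)
```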

4 Comments

The question is asking for the unique values, not the number of unique values, so just a matter of changing the applied function: print(df.apply(lambda col: col.unique()))
When I do this I get the error message "arrays must be the same length". All columns have the same number of rows, so why would I get that message?
And to get the values printed out a bit nicer (at least in my opinion) could add something like df.apply(lambda col: ', '.join(map(str, col.unique())))
Absolute game-changer, as it solved my issue of needing a way to make the result more readable. I was looking up how to tabulate it because I didn't know this would just take care of things.
13

This will get the unique values in a proper format:

pd.Series({col:df[col].unique() for col in df})

1 Comment

Short and precise.
6

We can make this even more concise:

df.describe(include='all').loc['unique', :]

Pandas describe gives a few key statistics about each column, but we can just grab the 'unique' statistic and leave it at that.

Note that for numeric columns describe reports NaN in the 'unique' row; if you want to include those columns as well, cast everything to object first:

df.astype('object').describe(include='all').loc['unique', :]
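A minimal sketch of what the 'unique' row looks like (toy data; note it holds counts of distinct values, not the values themselves):

```python
import pandas as pd

df = pd.DataFrame({"num": [1, 1, 2], "cat": ["a", "b", "b"]})

# After the object cast, describe() reports a 'unique' (count) row
# for every column, numeric ones included.
counts = df.astype("object").describe(include="all").loc["unique", :]
print(counts)
```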

1 Comment

This also gives the number of unique values, not the unique values themselves. Btw, the number of unique values is even easier to get with df.nunique()
5

If you're trying to create multiple separate dataframes as mentioned in your comments, create a dictionary of dataframes:

df_dict = {col: pd.DataFrame(df[col].unique(), columns=[col]) for col in df.columns}

Then you can access any dataframe easily using the name of the column:

df_dict['column_name']
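A small end-to-end sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"month": [1, 1, 2], "sales": [10, 10, 30]})

# One single-column dataframe of unique values per original column.
df_dict = {col: pd.DataFrame(df[col].unique(), columns=[col]) for col in df.columns}

print(df_dict["month"])  # rows 1 and 2
print(df_dict["sales"])  # rows 10 and 30
```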

Comments

4

I was looking for a solution to this problem as well, and the code below proved to be the most helpful in my situation:

for col in df:
    print(col)
    print(df[col].unique())
    print('\n')

It gives output like this:

Fuel_Type
['Diesel' 'Petrol' 'CNG']


HP
[ 90 192  69 110  97  71 116  98  86  72 107  73]


Met_Color
[1 0]

1 Comment

this was helpful in my case as well. Thanks!
2

The code below prints a list of unique values for each field; I find it very useful when you want to take a deeper look at the data frame:

for col in list(df):
    print(col)
    print(df[col].unique())

You can also sort the unique values if you want them to be sorted:

import numpy as np
for col in list(df):
    print(col)
    print(np.sort(df[col].unique()))

Comments

1
cu = []  # unique values for each column
i = []   # matching column names
for cn in card.columns[:7]:  # only the first seven columns here
    cu.append(card[cn].unique())
    i.append(cn)

# One row per column; transpose so each column name heads its unique values
pd.DataFrame(cu, index=i).T

Comments

1

Simply do this:

for i in df.columns:
    print(df[i].unique())

Comments

1

Use the pandas apply() method and pass a callable, np.unique in this case:

df.apply(np.unique)

Because you are calling apply on the entire dataframe (vs. an individual column/Series), each column is passed to the callable in turn.
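A toy example (made-up data); note that np.unique also sorts each result, unlike Series.unique():

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 1], "b": [5, 5, 5]})

# np.unique receives each column as a Series and returns a sorted array.
out = df.apply(np.unique)
print(out)
```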

Comments

0

Or, in short, for a single column it can be written as:

for val in df['column_name'].unique():
    print(val)

Comments

0

Even better, here's code to view all the unique values as a dataframe, transposed column-wise:

columns = [*df.columns]
unique_values = {}
for col in columns:
    unique_values[col] = df[col].unique()
# Wrap each array in a Series so ragged lengths are padded with NaN
unique = pd.DataFrame({k: pd.Series(v) for k, v in unique_values.items()})
print(unique.fillna('').T)

Comments

0

This solution constructs a dataframe of unique values with some stats and gracefully handles any unhashable column types.

Resulting dataframe columns are: col, unique_len, df_len, perc_unique, unique_values

df_len = len(df)
unique_cols_list = []
for col in df:
    try:
        unique_values = df[col].unique()
        unique_len = len(unique_values)
    except TypeError:  # not all cols are hashable
        unique_values = ""
        unique_len = -1
    perc_unique = unique_len*100/df_len
    unique_cols_list.append((col, unique_len, df_len, perc_unique, unique_values))
df_unique_cols = pd.DataFrame(unique_cols_list, columns=["col", "unique_len", "df_len", "perc_unique", "unique_values"])
df_unique_cols = df_unique_cols[df_unique_cols["unique_len"] > 0].sort_values("unique_len", ascending=False)
print(df_unique_cols)

Comments

0

Leveraging numpy:

np.unique(df.values)

It gives the unique elements pooled across the whole dataframe (not per column); iterate over the array or just print it. Note that this can fail on mixed-type columns, since the pooled array must be sortable.
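A quick sketch with made-up data showing the pooled behaviour:

```python
import numpy as np
import pandas as pd

# 2 appears in both columns but shows up only once in the pooled result.
df = pd.DataFrame({"a": [1, 2], "b": [2, 3]})

print(np.unique(df.values))  # one flat, sorted array for the whole frame
```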

Comments

-3

The best way to do that:

Series.unique()

For example, students.age.unique() will output the different values that occur in the age column of the students dataframe.

To get only the number of different values:

Series.nunique()

1 Comment

This does not answer how to get the unique values from each column
