4

I'd like a simple way to run unique() (as in pandas.DataFrame.someColumnName.unique()) on every column of a pandas.DataFrame. I expected the following to work:

df.apply(func=unique, axis=0)  # error NameError: name 'unique' is not defined

Is there a trick I'm overlooking to get this working, or is there an alternative solution? For comparison, a similar call that applies type() to each column of the DataFrame does work:

df.apply(func=lambda x: type(x[0]), axis=0)

Note that I have been able to make the following loop work, but there doesn't seem to be a way in Python to write a single-line for loop, and I find the apply call to be a more self-documenting implementation.

for col in df.columns: 
    df[col].unique()
2
  • The motivation: when doing exploratory data analysis (EDA) on a new dataset, I want to output not only the type of each column but also a listing of the unique values in each column. That determines the next steps for the data-wrangling code that deals with holes/NaN values and garbage values. Commented Jan 23, 2018 at 19:57
  • 1
    Yeah, pandas does not like it when the function passed to apply returns results of different sizes for different columns of the DataFrame, which I assume is very likely happening with your data. Commented Jan 23, 2018 at 19:59

2 Answers

9

unique is not a function defined in the global namespace, which is why you get the NameError. You can use set for this purpose:

df.apply(set)

Or, if you want to use unique, reference it from pandas (assuming pandas is imported as pd). It's also better to convert the result to a list, as there is no guarantee that all columns contain the same number of unique elements:

df.apply(lambda x: pd.unique(x).tolist())
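
As a minimal sketch of the second form, here is how it behaves on a small made-up DataFrame (the column names and values are hypothetical, chosen so that the columns have different numbers of unique values). In that case apply hands back a Series holding one list per column:

```python
import pandas as pd

# Hypothetical sample data: "color" has 3 unique values, "size" has 2.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": [1, 1, 2, 2],
})

# pd.unique returns an array of the distinct values in first-seen order;
# .tolist() converts it to a plain Python list.
uniques = df.apply(lambda col: pd.unique(col).tolist())

# Print each column's unique values on its own line.
for name in df.columns:
    print(name, uniques[name])
```

Note that the shape of the result depends on the data: if every column happens to yield a list of the same length, apply may assemble the lists into a DataFrame instead of a Series of lists.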

3

If you need a one-line loop, you can use a dict comprehension:

{e: df[e].unique() for e in df.columns}
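
A short sketch of how the resulting dict might be used, with made-up sample data (the DataFrame contents and the name unique_by_col are only for illustration). Keeping the dict in a variable lets you print each column's values on a separate line:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 1, 2]})

# Map each column name to the array of its unique values.
unique_by_col = {col: df[col].unique() for col in df.columns}

# One line of output per column.
for col, values in unique_by_col.items():
    print(f"{col}: {values.tolist()}")
```

Because each column gets its own entry, it does not matter that the columns have different numbers of unique values.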

1 Comment

This approach prints the df['colname'].unique() output for all columns merged together, rather than each output separated by a CRLF. With both the one-line loop syntax "for col in df.columns: df[col].unique()" and the two-line syntax with the loop body indented, I find that I have to execute another line of Python after the loop before the loop itself runs. I found I could use the no-op "pass" statement to satisfy that need. Is that well-known, expected behavior for for loops?
