
I have a dataframe that has 2 columns:

Text           Categories
"Hi Hello"     [F35, B3, C98]
"Where is"     [G58, F35, C17]
"Is she?!"     [T92, F35, B3]

The Categories field is an array of categories.

I want to find how many distinct categories I have.

I tried this code, but it did not work:

print(len(sorted(set(df['Categories']))))

I also tried this, but it only covers one record:

print(len(sorted(set(df['Categories'][0]))))

How can I do this for all categories in the dataframe?

2 Comments

  • df['Categories'].explode().value_counts() (Commented Jul 12, 2020 at 10:47)
  • Are you searching for unique values within each array, or are you searching for unique values for each Category across all arrays? (Commented Jul 12, 2020 at 10:48)

1 Answer


This should give you unique categories.

In [127]: import pandas as pd

In [128]: df = pd.DataFrame({
     ...:     'Text': ["Hi Hello", "Where is", "Is she?!"],
     ...:     'Categories': [["F35", "B3", "C98"], ["G58", "F35", "C17"], ["G58", "F35", "C17"]]
     ...: })

In [131]: set(df["Categories"].explode())
Out[131]: {'B3', 'C17', 'C98', 'F35', 'G58'}

Credit to @DanielGeffen: you can also use df["Categories"].explode().unique()
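Since the question asks for a count rather than the set itself, here is a minimal sketch that uses the three rows from the question (not the sample data above). It assumes pandas 0.25 or later, where Series.explode() is available. The first attempt in the question most likely failed because set() has to hash every element, and each element of the column is a list, which is unhashable.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Text": ["Hi Hello", "Where is", "Is she?!"],
    "Categories": [
        ["F35", "B3", "C98"],
        ["G58", "F35", "C17"],
        ["T92", "F35", "B3"],
    ],
})

# set() over the raw column tries to hash whole lists and raises TypeError.
try:
    set(df["Categories"])
except TypeError as exc:
    print(f"set(df['Categories']) fails: {exc}")

# explode() gives every list element its own row, so the values become
# plain strings that can be counted.
flat = df["Categories"].explode()

n_distinct = flat.nunique()
print(n_distinct)  # 6
```

len(set(flat)) gives the same number; nunique() just keeps the whole computation inside pandas.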


1 Comment

You can also use pandas unique function instead of set: df["Categories"].explode().unique()
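One practical difference between the two approaches: a Python set has no defined order, while unique() returns a NumPy array in order of first appearance. A small sketch with the question's data, where the Series name `categories` is only illustrative:

```python
import pandas as pd

# The Categories column from the question, as a standalone Series.
categories = pd.Series([
    ["F35", "B3", "C98"],
    ["G58", "F35", "C17"],
    ["T92", "F35", "B3"],
])

# unique() keeps values in the order they first appear after exploding.
unique_values = categories.explode().unique()
print(list(unique_values))  # ['F35', 'B3', 'C98', 'G58', 'C17', 'T92']
```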
