Retrieve unique values from a column with lists of values

Question

I have a df that has one column in which the values are lists of values.

My intent is to split this column using some technique from here: Pandas split column of lists into multiple columns

However, for the column names I want to use each unique value from those lists of values.

To retrieve the unique values I have tried three different methods. Each one has failed with a different reason.

Is there a way to get Series.unique() when the values are a list of values?

My three attempts, with associated tracebacks:

1)
unique_vals = splitted_interests.unique()

Traceback (most recent call last):
  File "C:/Users/Mark/PycharmProjects/main/main.py", line 68, in <module>
    unique_vals = splitted_interests.unique()
  File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\series.py", line 1991, in unique
    result = super().unique()
  File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\base.py", line 1405, in unique
    result = unique1d(values)
  File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\algorithms.py", line 405, in unique
    uniques = table.unique(values)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1767, in pandas._libs.hashtable.PyObjectHashTable.unique
  File "pandas/_libs/hashtable_class_helper.pxi", line 1718, in pandas._libs.hashtable.PyObjectHashTable._unique
TypeError: unhashable type: 'list'


2)
unique_vals = splitted_interests.apply(lambda x: x.unique())

Traceback (most recent call last):
  File "C:/Users/Mark/PycharmProjects/main/main.py", line 68, in <module>
    unique_vals = splitted_interests.apply(lambda x: x.unique())
  File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\series.py", line 4045, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2228, in pandas._libs.lib.map_infer
  File "C:/Users/Mark/PycharmProjects/main/main.py", line 68, in <lambda>
    unique_vals = splitted_interests.apply(lambda x: x.unique())
AttributeError: 'list' object has no attribute 'unique'

3)
unique_vals = splitted_interests.apply(lambda x: [y.unique() for y in x])

Traceback (most recent call last):
  File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\series.py", line 4045, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2228, in pandas._libs.lib.map_infer
  File "C:/Users/Mark/PycharmProjects/main/main.py", line 68, in <lambda>
    unique_vals = splitted_interests.apply(lambda x: [y.unique() for y in x])
  File "C:/Users/Mark/PycharmProjects/main/main.py", line 68, in <listcomp>
    unique_vals = splitted_interests.apply(lambda x: [y.unique() for y in x])
AttributeError: 'str' object has no attribute 'unique'

At run time, the column with lists looks like this:

Babak · Accepted Answer · 2021-07-06 21:56:36Z

3

"To retrieve the unique values I have tried three different methods. Each one has failed with a different reason."

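You may want to try astype('str'), which converts each list to a hashable string so that unique() no longer raises a TypeError. Note that this returns the distinct lists (as strings), not the distinct elements inside them: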

df.<column>.astype('str').unique()
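
To make the behaviour of this approach concrete, here is a minimal sketch on made-up data (the column name and values are assumptions borrowed from the other answers, not the asker's real data). It shows that the string cast deduplicates whole lists, and that converting each list to a tuple is another way to make the values hashable:

```python
import pandas as pd

# Illustrative data only: rows 0 and 2 hold the same list.
df = pd.DataFrame({'JobRoleInterest': [['aa', 'ss'], ['dd', 'ff'], ['aa', 'ss']]})

# Each list becomes its string representation, which is hashable,
# so unique() works. The result has one entry per distinct list.
unique_lists = df['JobRoleInterest'].astype('str').unique().tolist()
print(unique_lists)

# Tuples are hashable too, and keep the elements usable afterwards.
unique_tuples = df['JobRoleInterest'].apply(tuple).unique().tolist()
print(unique_tuples)
```

If the goal is one entry per distinct element rather than per distinct list, the values still have to be flattened afterwards.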

edited Jul 6, 2021 at 21:56

answered Jul 4, 2021 at 21:38


1 Comment

smerllo Over a year ago

This answer should get more upvotes

jezrael · Accepted Answer · 2020-03-17 13:42:54Z

2

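To keep the values in their original order, build a dictionary from them and extract its keys. This works in Python 3.6+, where dictionaries preserve insertion order (guaranteed by the language from 3.7):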

import pandas as pd

df = pd.DataFrame({'JobRoleInterest': ['aa,ss,ss', 'dd,ff', 'k,dd,dd,dd', 'j,gg']})
splitted_interests = df['JobRoleInterest'].str.split(',')

# Flatten the lists; dict.fromkeys drops duplicates and keeps first-seen order
unique_vals = list(dict.fromkeys(y for x in splitted_interests for y in x))
print(unique_vals)
['aa', 'ss', 'dd', 'ff', 'k', 'j', 'gg']
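
One case the snippet above does not cover is a row with a missing value: str.split leaves it as a missing value rather than a list, and iterating over it raises a TypeError. The following is only a sketch of one way to handle that, assuming missing rows should simply be skipped; unique_in_order is a hypothetical helper name, not part of pandas:

```python
from itertools import chain

import pandas as pd


def unique_in_order(series):
    """Hypothetical helper: distinct elements across all lists, first-seen order."""
    # dropna() skips rows where str.split produced a missing value
    # instead of a list; chain.from_iterable flattens the rest.
    return list(dict.fromkeys(chain.from_iterable(series.dropna())))


# Illustrative data: the second row is missing on purpose.
df = pd.DataFrame({'JobRoleInterest': ['aa,ss,ss', None, 'k,dd,dd,dd', 'j,gg']})
splitted_interests = df['JobRoleInterest'].str.split(',')

unique_vals = unique_in_order(splitted_interests)
print(unique_vals)
```

A plain set(...) would also remove duplicates, but it does not keep the order in which the values first appear.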

edited Mar 17, 2020 at 13:42

answered Mar 17, 2020 at 11:29


6 Comments

MarkS Over a year ago

@jezrael, close, but not really what I was after. In your answers above I still see items duplicated. dd appears in 1 and 2. It also appears twice in 2, which for a set I thought was impossible. All I am after is a list of the unique values so I can create column names from it.

MarkS Over a year ago

No. Just a list of every unique value from all of the lists in the column. I will use that list to create column names. Then I plan to split the one column with list of values into multiple columns (one per unique value) with a single value per one of the techniques in the link I mentioned in my post.

MarkS Over a year ago

@jezrael [aa, ss, dd, ff, k, j, gg] Single list (or set). One of each value.

MarkS Over a year ago

@jezrael If other rows had matching values, I don't want\need them. Just a single entry per value anywhere in the entire column.

MarkS Over a year ago

@jezrael I was at a location where my browser had not refreshed. I had not seen your updated answer. I just ran it and I got what I was after. Thanks!


Scott Boston · Accepted Answer · 2020-03-17 18:27:29Z

2

I think you need pd.Series.unique.

Using @jezrael's data:

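import pandas as pd

df = pd.DataFrame({'JobRoleInterest': ['aa,ss,ss', 'dd,ff', 'k,dd,dd,dd', 'j,gg']})

# Split into one column per position, stack into a single Series, then deduplicate
df['JobRoleInterest'].str.split(',', expand=True).stack().unique().tolist()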

Output:

['aa', 'ss', 'dd', 'ff', 'k', 'j', 'gg']

Update, using list data per @MarkS's comments below (Series.explode requires pandas 0.25 or later):

df = pd.DataFrame({'JobRoleInterest':[['aa','ss','ss'],['dd','ff'],['k','dd','dd','dd'],['j','gg']]})
df['JobRoleInterest'].explode().unique().tolist()

Output:

['aa', 'ss', 'dd', 'ff', 'k', 'j', 'gg']
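
Since the stated goal is to use these unique values as column names, here is a rough sketch of one possible follow-up step. It assumes that "one column per unique value" means a 0/1 indicator of whether the row's list contains that value; that interpretation, and the name indicators, are assumptions rather than something confirmed in the question:

```python
import pandas as pd

df = pd.DataFrame({'JobRoleInterest': [['aa', 'ss', 'ss'], ['dd', 'ff'],
                                       ['k', 'dd', 'dd', 'dd'], ['j', 'gg']]})

# One entry per distinct element, in first-seen order.
unique_vals = df['JobRoleInterest'].explode().unique().tolist()

# Assumed target layout: one column per unique value, holding 1 if the
# row's list contains that value and 0 otherwise.
indicators = pd.DataFrame(
    {val: df['JobRoleInterest'].apply(lambda items, val=val: int(val in items))
     for val in unique_vals},
    index=df.index,
)
print(indicators)
```

The per-value apply is easy to follow but makes one pass over the column for each unique value, so it may be slow on large frames.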

edited Mar 17, 2020 at 18:27

answered Mar 17, 2020 at 13:39


2 Comments

MarkS Over a year ago

@ScottBoston I tried yours as well but I am getting back an empty list. Note the data in my lists in the column are not structured the way @jezrael structured his. Look above at the screen shot. Using @jezrael's data, it would be: ['aa', 'ss', 'ss', 'dd', 'ff', 'k', 'dd', 'dd', 'dd', 'j', 'gg']

MarkS Over a year ago

@ScottBoston That worked as well, and it's easier (for me, at least) to understand. Much obliged.
