How to assign unique value based on two columns combinations in Python?

Question

I have this fossil data and want to create a new column with a unique value for each unique occurrence in these two columns:

GENUS = (['Microtherium', 'Bachitherium', 'Coelodonta', ..., 'Murina',
   'Boopsis', None], dtype=object)
SPECIES = (['Microtherium', 'Bachitherium', 'Coelodonta', ..., 'Murina',
   'Boopsis', None], dtype=object)

#dropping the duplicates
dffossil[['GENUS', 'SPECIES']].drop_duplicates

Now I want a new column with a unique number for each unique GENUS and SPECIES combination.

Do you want a unique number (i.e. an integer) for each combination, or just a unique identifier? If you want a unique identifier, you could try hash(Genus_String + SPECIES_str) to create a hash value of each combination within the df.
– itprorh66, Commented Jul 23, 2022 at 14:10
Is this pseudocode, just to show what your columns are? Displaying an actual (small) DataFrame would be more helpful, and make this a minimal reproducible example. (Also, don't forget the parentheses when calling drop_duplicates()...)
– CrazyChucky, Commented Jul 23, 2022 at 15:03
Also relevant: Pandas-specific advice for minimal reproducible examples
– CrazyChucky, Commented Jul 23, 2022 at 15:06
Does this answer your question? How to create a unique identifier based on multiple columns?
– JonasV, Commented Sep 1, 2022 at 13:03

itprorh66 · Accepted Answer · 2022-07-23 14:57:26Z

If you simply want a unique identifier for each combination of GENUS and SPECIES, you can do the following.
Note: I have assumed that either GENUS or SPECIES can contain a None value, which complicates the process slightly.

Given a DataFrame of the form:

    GENUS           SPECIES
0   Murina          Coelodonta
1   Murina          Microtherium
2   Microtherium    Murina
3   Bachitherium    Microtherium
4   Coelodonta      None
5   Coelodonta      Coelodonta
6   Microtherium    Coelodonta
7   Microtherium    Murina
8   Microtherium    Bachitherium
9   Murina          Microtherium

Add a column which uniquely identifies each combination of GENUS and SPECIES. We call this Column 'ID'.

Define a function to create a hash of entries, taking into account the possibility of a None entry.

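def hashValues(g, s):
    # Treat a missing value as the literal string "None" so it can be concatenated.
    if g is None:
        g = "None"
    if s is None:
        s = "None"
    # hash() returns the same integer for equal strings within one Python process.
    return hash(g + s)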
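
A side note on hash(): for strings, Python salts the built-in hash per process by default, so the integers it returns generally differ from one run to the next, and plain concatenation maps ("ab", "c") and ("a", "bc") to the same key. If the IDs need to be reproducible, something along these lines might work instead. This is only a sketch: the name stable_id, the choice of SHA-256, the 8-byte truncation, and the "\x1f" separator (assumed never to occur in a genus or species name) are illustrative choices, not part of the original answer.

```python
import hashlib

def stable_id(genus, species):
    """Return a signed 64-bit integer ID that is the same in every Python run."""
    # Map None to a placeholder string, mirroring hashValues.
    g = "None" if genus is None else genus
    s = "None" if species is None else species
    # Join with a separator (assumed absent from the names) so that
    # ("ab", "c") and ("a", "bc") produce different keys.
    key = f"{g}\x1f{s}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep the first 8 bytes as a signed integer so the value fits an int64 column.
    return int.from_bytes(digest[:8], "big", signed=True)
```

stable_id takes the same two arguments as hashValues, so it could be swapped in when building the column. The steps that follow continue with hashValues.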

To add the column use the following:

df['ID'] = [hashValues(g, s) for g, s in zip(df['GENUS'], df['SPECIES'])]

which yields:

    GENUS           SPECIES         ID
0   Murina          Coelodonta      -6583287505830614713
1   Murina          Microtherium    6019734726691011903
2   Microtherium    Murina          -2318069015748475190
3   Bachitherium    Microtherium    5795352218934423262
4   Coelodonta      None            4851538573581845777
5   Coelodonta      Coelodonta      -5115794138222494493
6   Microtherium    Coelodonta      2603682196287415014
7   Microtherium    Murina          -2318069015748475190
8   Microtherium    Bachitherium    -2746445536675711990
9   Murina          Microtherium    6019734726691011903
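
If a small integer per combination is preferred over a hash value (the "unique number" the question mentions), one possible alternative is to let pandas number the groups. The sketch below rebuilds the example DataFrame from this answer and assumes that replacing missing values with the placeholder string "None" is acceptable; the variable name keys is illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame shown in this answer.
df = pd.DataFrame({
    "GENUS": ["Murina", "Murina", "Microtherium", "Bachitherium", "Coelodonta",
              "Coelodonta", "Microtherium", "Microtherium", "Microtherium", "Murina"],
    "SPECIES": ["Coelodonta", "Microtherium", "Murina", "Microtherium", None,
                "Coelodonta", "Coelodonta", "Murina", "Bachitherium", "Microtherium"],
})

# Replace missing values with a placeholder so they form their own group
# (assumption: "None" is not a real genus or species name).
keys = df[["GENUS", "SPECIES"]].fillna("None")

# ngroup() assigns one integer per distinct (GENUS, SPECIES) pair,
# running from 0 to the number of distinct pairs minus 1.
df["ID"] = keys.groupby(["GENUS", "SPECIES"], sort=False).ngroup()

print(df)
```

Rows that share a combination, such as rows 2 and 7, receive the same ID. Unlike hash values, these numbers depend only on the data, so they stay the same between runs as long as the rows do not change.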
