I have a pandas dataframe looking like this:
ner_id  art_id  ner
0       0       emmanuel macron
1       0       paris
2       0       france
3       1       paris
4       0       france
I would like to change the 'ner_id' column.
For example, paris appears in the articles with ids 0 and 1 (see the art_id column).
I only want to change the ner_id column, so that paris gets a single id instead of a different id for each occurrence.
I want this to apply every time a word repeats in the ner column: the repeated word should get the same id.
How can I do it?
Expected output:
ner_id  art_id  ner
0       0       emmanuel macron
1       0       paris
2       0       france
1       1       paris
2       0       france
I would like to reuse the first id given to a term every time that term is repeated in later rows.
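Group by ner and assign ids by these groups. If your data frame is called df, you could try:

    df['ner_id'] = df.groupby('ner').ngroup()

Note that ngroup() numbers the groups in sorted key order by default, so france would get a lower id than paris. Passing sort=False to groupby numbers the groups in order of first appearance, which matches the expected output in the question.

Could the ner column itself serve as the unique identifier?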
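To check the groupby suggestion against the sample data from the question, here is a minimal sketch. The frame is rebuilt by hand from the table shown in the question. The variable names (first_seen_ids, sorted_ids, factorized_ids) are only illustrative, and pd.factorize is included as an alternative that the answer did not mention.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "ner_id": [0, 1, 2, 3, 4],
    "art_id": [0, 0, 0, 1, 0],
    "ner": ["emmanuel macron", "paris", "france", "paris", "france"],
})

# Default groupby sorts the keys, so ids follow alphabetical order of ner.
sorted_ids = df.groupby("ner").ngroup()

# sort=False keeps groups in order of first appearance, so a repeated
# term reuses the id assigned at its first occurrence.
first_seen_ids = df.groupby("ner", sort=False).ngroup()

# pd.factorize also encodes values in order of first appearance.
factorized_ids, uniques = pd.factorize(df["ner"])

df["ner_id"] = first_seen_ids
print(df)
```

If the ids only need to be consistent and their order does not matter, the default groupby('ner').ngroup() is enough. The sort=False variant and pd.factorize are the ones that reproduce the numbering asked for in the question.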