I have a pandas DataFrame like this:
>>> import numpy as np
>>> import pandas as pd
>>> data = {
...     'hotel_code': [1, 1, 1, 1, 1],
...     'feed': [1, 1, 1, 1, 2],
...     'price_euro': [100, 200, 250, 120, 130],
...     'client_nationality': ['fr', 'us', 'ru,de', 'gb', 'cn,us,br,il,fr,gb,de,ie,pk,pl']
... }
>>> df = pd.DataFrame(data)
>>> df
   hotel_code  feed  price_euro             client_nationality
0           1     1         100                             fr
1           1     1         200                             us
2           1     1         250                          ru,de
3           1     1         120                             gb
4           1     2         130  cn,us,br,il,fr,gb,de,ie,pk,pl
And here is the expected output:
>>> data = {
...     'hotel_code': [1, 1],
...     'feed': [1, 2],
...     'cluster1': ['fr', 'cn,us,br,il,fr,gb,de,ie,pk,pl'],
...     'cluster2': ['us', np.nan],
...     'cluster3': ['ru,de', np.nan],
...     'cluster4': ['gb', np.nan],
... }
>>> df = pd.DataFrame(data)
>>> df
   hotel_code  feed                       cluster1 cluster2 cluster3 cluster4
0           1     1                             fr       us    ru,de       gb
1           1     2  cn,us,br,il,fr,gb,de,ie,pk,pl      NaN      NaN      NaN
I want to create one cluster column per row within each unique (hotel_code, feed) pair, but I don't know how to do it. The number of clusters is not fixed; it depends on how many rows each pair has. Any ideas? Thanks in advance.
For example, if there were another row with ru as client_nationality for hotel_code=1 and feed=2, it would be ru in cluster2 for this row.
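To make the requirement concrete, here is a rough sketch of one possible direction, not a verified solution. It assumes clusters are numbered by row order within each (hotel_code, feed) group, and the names keys, cluster_no and wide are only illustrative. It numbers the rows per group with groupby(...).cumcount() and then spreads those numbers into columns with unstack.

```python
import pandas as pd

df = pd.DataFrame({
    'hotel_code': [1, 1, 1, 1, 1],
    'feed': [1, 1, 1, 1, 2],
    'price_euro': [100, 200, 250, 120, 130],
    'client_nationality': ['fr', 'us', 'ru,de', 'gb',
                           'cn,us,br,il,fr,gb,de,ie,pk,pl'],
})

# Assumption: the grouping keys are exactly these two columns.
keys = ['hotel_code', 'feed']

# Number the rows inside each (hotel_code, feed) group: 1, 2, 3, ...
cluster_no = df.groupby(keys).cumcount() + 1

wide = (
    df.assign(cluster=cluster_no)
      .set_index(keys + ['cluster'])['client_nationality']
      .unstack('cluster')          # one column per cluster number
      .add_prefix('cluster')       # 1 -> 'cluster1', 2 -> 'cluster2', ...
      .rename_axis(columns=None)   # drop the leftover 'cluster' axis name
      .reset_index()
)

print(wide)
```

Because the cluster numbers stay integers until add_prefix is applied, the columns should stay in numeric order even with more than nine clusters. Groups with fewer rows are padded with NaN, and price_euro is dropped because it does not appear in the expected output.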