I am trying to cluster data manually using if-else logic on two columns of a DataFrame, and I want to create a new column from the value the function returns for each row.
How should I pass the data to the following custom function?
```python
def cluster(Data):
    # Map each (WMCI_range, Store_Format) combination to a cluster label
    if Data.WMCI_range == "Low" and Data.Store_Format == "Small":
        return "Low_Small"
    elif Data.WMCI_range == "Medium" and Data.Store_Format == "Medium":
        return "Medium_Medium"
    elif Data.WMCI_range == "High" and Data.Store_Format == "Large":
        return "High_Large"
    elif Data.WMCI_range == "Low" and Data.Store_Format == "Medium":
        return "Low_Medium"
    elif Data.WMCI_range == "Low" and Data.Store_Format == "Large":
        return "Low_Large"
    elif Data.WMCI_range == "Medium" and Data.Store_Format == "Small":
        return "Medium_Small"
    elif Data.WMCI_range == "Medium" and Data.Store_Format == "Large":
        return "Medium_Large"
    elif Data.WMCI_range == "High" and Data.Store_Format == "Small":
        return "High_Small"
    elif Data.WMCI_range == "High" and Data.Store_Format == "Medium":
        return "High_Medium"
```
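The if/elif chain above is essentially a lookup from a (WMCI_range, Store_Format) pair to a label. A minimal sketch of the same idea as a dictionary, fed one pair at a time by zipping the two columns, might look like this. `CLUSTER_LABELS`, `cluster_pair`, and the small demo frame are hypothetical, and the labels assume a consistent `<WMCI_range>_<Store_Format>` naming pattern.

```python
import pandas as pd

# Hypothetical lookup table: one entry per (WMCI_range, Store_Format) pair.
# Labels assume a "<WMCI_range>_<Store_Format>" naming pattern.
CLUSTER_LABELS = {
    ("Low", "Small"): "Low_Small",
    ("Low", "Medium"): "Low_Medium",
    ("Low", "Large"): "Low_Large",
    ("Medium", "Small"): "Medium_Small",
    ("Medium", "Medium"): "Medium_Medium",
    ("Medium", "Large"): "Medium_Large",
    ("High", "Small"): "High_Small",
    ("High", "Medium"): "High_Medium",
    ("High", "Large"): "High_Large",
}


def cluster_pair(wmci_range, store_format):
    """Return the label for one pair, or None if the pair is not in the table."""
    return CLUSTER_LABELS.get((wmci_range, store_format))


# Small hypothetical frame standing in for the real data
Data = pd.DataFrame({
    "WMCI_range": ["Low", "Medium", "High"],
    "Store_Format": ["Small", "Large", "Medium"],
})

# zip() yields one (WMCI_range, Store_Format) tuple per row, so the function
# always receives plain strings rather than whole columns.
Data["Clusters"] = [
    cluster_pair(w, s) for w, s in zip(Data["WMCI_range"], Data["Store_Format"])
]
print(Data)
```

Adding a new combination then means adding one dictionary entry instead of another `elif` branch.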
I have tried the following ways of passing the data, but none of them worked:
```python
Data['Clusters'] = cluster(Data[['WMCI_range', 'Store_Format']])
Data['Clusters'] = [cluster(i) for i in len(Data[['WMCI_range', 'Store_Format']])]
```
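For reference, here is a hedged sketch of why these attempts fail and one common way around it. Passing the whole two-column frame makes `Data.WMCI_range == "Low"` a boolean Series, and `and` then needs a single True/False, so pandas raises `ValueError` ("The truth value of a Series is ambiguous"). The list comprehension fails earlier because `len(...)` returns an integer, which is not iterable. `DataFrame.apply(func, axis=1)` instead calls the function once per row, passing that row as a Series indexed by column name, so `row.WMCI_range` is a single string. The `cluster` below is an abbreviated, hypothetical version with only two branches, and the demo frame is made up.

```python
import pandas as pd


def cluster(row):
    # `row` is one row (a Series), so each comparison yields a plain bool
    if row.WMCI_range == "Low" and row.Store_Format == "Small":
        return "Low_Small"
    elif row.WMCI_range == "High" and row.Store_Format == "Large":
        return "High_Large"
    # ... remaining branches omitted in this sketch ...
    return None  # no branch matched


Data = pd.DataFrame({
    "WMCI_range": ["Low", "High", "Medium"],
    "Store_Format": ["Small", "Large", "Small"],
})

# Passing whole columns fails: `and` cannot reduce a boolean Series to one value.
try:
    cluster(Data[["WMCI_range", "Store_Format"]])
except ValueError as exc:
    print("whole frame:", exc)

# apply(axis=1) calls cluster once per row and collects the return values.
Data["Clusters"] = Data.apply(cluster, axis=1)
print(Data)
```

Row-wise `apply` runs Python code for every row, so on large frames it is noticeably slower than vectorized column operations, though for a few thousand rows it is usually fine.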
Please help me find a solution.
You can use this code to mock the data as I did:
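```python
import random

import pandas as pd

# Possible store formats, sampled with unequal weights
formats = ["Small", "Medium", "Large"]
store_Format = random.choices(formats, weights=[6, 8, 5], k=4500)

# One random WMCI score between 1 and 9 (inclusive) per store
WMCI = [random.randint(1, 9) for _ in range(4500)]

df = pd.DataFrame({"Store_Format": store_Format, "WMCI": WMCI})
```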
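The mock data has a numeric `WMCI` column, while `cluster` reads a categorical `WMCI_range`. A minimal sketch of bridging that gap with `pd.cut`, assuming hypothetical thresholds of 1 to 3 for Low, 4 to 6 for Medium, and 7 to 9 for High, is shown below. If every label really follows the `<WMCI_range>_<Store_Format>` pattern, the cluster column can then be built with vectorized string concatenation instead of a per-row function.

```python
import random

import pandas as pd

random.seed(0)  # reproducible mock data

formats = ["Small", "Medium", "Large"]
df = pd.DataFrame({
    "Store_Format": random.choices(formats, weights=[6, 8, 5], k=4500),
    "WMCI": [random.randint(1, 9) for _ in range(4500)],
})

# Hypothetical binning; intervals are right-inclusive: (0, 3], (3, 6], (6, 9]
df["WMCI_range"] = pd.cut(
    df["WMCI"], bins=[0, 3, 6, 9], labels=["Low", "Medium", "High"]
)

# pd.cut returns a categorical column, so convert it to str before joining
df["Clusters"] = df["WMCI_range"].astype(str) + "_" + df["Store_Format"]
print(df.head())
```

This shortcut only holds when the label is a pure function of the two values; if some combinations should share a label, a lookup table or row-wise function is the safer choice.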
