How to apply if-else custom function on two columns of a dataframe?

Question

My dataframe looks like this:

I am trying to cluster data manually using if-else logic on two columns of a data frame and want to create a new column dynamically based on the return value of the function.

How should I pass the data to the following custom function:

def cluster(Data):
    if Data.WMCI_range == "Low" and Data.Store_Format == "Small":
        return "Low_Small"

    elif Data.WMCI_range == "Medium" and Data.Store_Format == "Medium":
        return "Medium_Medium"

    elif Data.WMCI_range == "High" and Data.Store_Format == "Large":
        return "High_Large"

    elif Data.WMCI_range == "Low" and Data.Store_Format == "Medium":
        return "Low_Medium"

    elif Data.WMCI_range == "Low" and Data.Store_Format == "Large":
        return "Low_Large"

    elif Data.WMCI_range == "Medium" and Data.Store_Format == "Small":
        return "Medium_Small"

    elif Data.WMCI_range == "Medium" and Data.Store_Format == "Large":
        return "Medium_Large"

    elif Data.WMCI_range == "High" and Data.Store_Format == "Small":
        return "High_Small"

    elif Data.WMCI_range == "High" and Data.Store_Format == "Medium":
        return "High_Medium"

I have tried the following ways of passing the data, but none of them worked:

Data['Clusters'] = cluster(Data[['WMCI_range', 'Store_Format']])
Data['Clusters'] = [cluster(i) for i in len(Data[['WMCI_range', 'Store_Format']])]

Please help me find a solution.

You can use this code to mock the data as I did:

import random

import pandas as pd

columns = ["Small", "Medium", "Large"]
store_Format = random.choices(columns, weights=[6, 8, 5], k=4500)
WMCI = []
for i in range(1, 4500 + 1):
    n = random.randint(1, 9)
    WMCI.append(n)

df = pd.DataFrame({"Store_Format": store_Format, "WMCI": WMCI})

Laurent · Accepted Answer · 2021-12-05 11:01:46Z

So, given the following dataframe:

import random

import pandas as pd

columns = ["Small", "Medium", "Large"]
store_Format = random.choices(columns, weights=[6, 8, 5], k=4500)
WMCI = []
for i in range(1, 4500 + 1):
    n = random.randint(1, 9)
    WMCI.append(n)

df = pd.DataFrame({"WMCI": WMCI, "Store_Format": store_Format})
print(df)
# Outputs
      WMCI Store_Format
0        6       Medium
1        1        Large
2        6       Medium
...    ...          ...
4497     6       Medium
4498     1       Medium
4499     7       Medium

Instead of using a custom helper function, I would suggest a simpler and more efficient way to compute the clusters, using boolean masks on "WMCI" and string concatenation with "Store_Format":

df.loc[df["WMCI"] <= 3, "Clusters"] = (
    "Small_" + df.loc[df["WMCI"] <= 3, "Store_Format"]
)

df.loc[(df["WMCI"] > 3) & (df["WMCI"] <= 6), "Clusters"] = (
    "Medium_" + df.loc[(df["WMCI"] > 3) & (df["WMCI"] <= 6), "Store_Format"]
)

df.loc[(df["WMCI"] > 6) & (df["WMCI"] <= 9), "Clusters"] = (
    "High_" + df.loc[(df["WMCI"] > 6) & (df["WMCI"] <= 9), "Store_Format"]
)

Which gives you:

print(df)
# Outputs
      WMCI Store_Format       Clusters
0        6       Medium  Medium_Medium
1        1        Large    Small_Large
2        6       Medium  Medium_Medium
3        7       Medium    High_Medium
4        9        Large     High_Large
...    ...          ...            ...
4495     9        Large     High_Large
4496     3        Small    Small_Small
4497     6       Medium  Medium_Medium
4498     1       Medium   Small_Medium
4499     7       Medium    High_Medium
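
If you would still rather keep the if-else logic in a custom function, one common way to pass the data is row by row with `DataFrame.apply` and `axis=1`, so that the function receives a single row at a time instead of the whole dataframe. The following is only a minimal sketch: the helper name `wmci_range`, the four-row sample frame, and the reuse of the same thresholds and labels as above ("Small" for WMCI <= 3, and so on) are assumptions made for illustration.

```python
import pandas as pd


def wmci_range(wmci):
    """Map a numeric WMCI score to a label (hypothetical helper).

    Thresholds and labels mirror the masks used above: <= 3, 4 to 6, 7 to 9.
    """
    if wmci <= 3:
        return "Small"
    elif wmci <= 6:
        return "Medium"
    return "High"


def cluster(row):
    """Build the cluster label for one row (a pandas Series)."""
    return f"{wmci_range(row['WMCI'])}_{row['Store_Format']}"


# Small illustrative frame, not the 4500-row mock data.
df = pd.DataFrame(
    {
        "WMCI": [6, 1, 7, 3],
        "Store_Format": ["Medium", "Large", "Medium", "Small"],
    }
)

# axis=1 calls cluster() once per row.
df["Clusters"] = df.apply(cluster, axis=1)
print(df)
```

Row-wise `apply` is usually noticeably slower than the mask-based assignments on large frames, so it is mainly worth using when the branching logic is too irregular to express with masks.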
