
I have a column with a list of strings in every row (the number of strings varies from row to row). I have created a few categories based on the strings in the column, and now I want to put a 1 in a category's column whenever that category appears in the row.

The list cusine_type I am using is:

['north indian','chinese','south indian','continental','cafe','fast food','beverages','italian','american','desserts','rest_cuisines']

I have written code that is basically two for loops with a few if statements for the logic, but it is quite slow. I need a faster solution.

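# assumes temp already has one 0/1 column per entry in cusine_type
for i in temp.index:
    # .at looks up by index label, so this works even if the index is not 0..n-1
    for string in temp.at[i, 'cuisines'].split(','):
        string = string.strip()
        if string in cusine_type:
            temp.at[i, string] = 1  # known category
        else:
            temp.at[i, 'rest_cuisines'] = 1  # any cuisine not in the list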

I want the output to look like this table:

[Image: expected output table]

  • It would be great if someone could also help format the output as a table in this question. Commented Jul 15, 2019 at 14:39
  • We don't even know what your table looks like. In this case, it would be helpful to include a picture of the data. Commented Jul 15, 2019 at 14:56

1 Answer


I believe you need str.get_dummies. For your sample:

new_df = df1.cuisines.str.get_dummies(sep=', ')

gives:

   cafe  chinese  italian  mexican  north indian  south indian  thai
0     0        1        0        0             1             0     0
1     0        1        0        0             1             0     1
2     1        0        1        1             0             0     0
3     0        0        0        0             1             1     0
4     0        0        0        0             1             0     0
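The question's data was only posted as an image, so here is a self-contained sketch that rebuilds a hypothetical df1 from the output table above and runs the same call. The exact cuisine strings are an assumption; only the resulting 0/1 matrix comes from the table.

```python
import pandas as pd

# Hypothetical sample data reconstructed from the output table above;
# the asker's real df1 was not posted.
df1 = pd.DataFrame({
    "cuisines": [
        "north indian, chinese",
        "north indian, chinese, thai",
        "cafe, italian, mexican",
        "north indian, south indian",
        "north indian",
    ]
})

# One 0/1 column per distinct cuisine; sep is matched literally
new_df = df1.cuisines.str.get_dummies(sep=", ")
print(new_df)
```

get_dummies sorts the new columns alphabetically. Because sep is matched literally, a row written as "north indian,chinese" (no space after the comma) would produce a single column named "north indian,chinese", so it is worth checking that the separator is consistent across rows.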

To merge all cuisines that are not in your list into a single rest_cuisines column:

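# get the names of the dummy columns that are not in your list
# (cuisine_list is your category list, called cusine_type in the question)
not_in_list = [col for col in new_df.columns if col not in cuisine_list]

# merge them into rest_cuisines: 1 if any of them is 1 in that row
new_df['rest_cuisines'] = new_df[not_in_list].max(axis=1)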
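A small runnable check of the rest_cuisines step, on hypothetical rows where thai and mexican are the only cuisines outside the list. Two additions are not part of the answer above and are only a suggestion: a guard for the case where every cuisine is already in the list (an empty not_in_list), and dropping the individual "other" columns once they are folded in.

```python
import pandas as pd

# The asker's category list (cusine_type in the question)
cuisine_list = ['north indian', 'chinese', 'south indian', 'continental', 'cafe',
                'fast food', 'beverages', 'italian', 'american', 'desserts',
                'rest_cuisines']

# Hypothetical sample rows
df1 = pd.DataFrame({"cuisines": ["north indian, chinese, thai",
                                 "cafe, mexican",
                                 "south indian"]})
new_df = df1.cuisines.str.get_dummies(sep=", ")

# Dummy columns that are not one of the named categories
not_in_list = [col for col in new_df.columns if col not in cuisine_list]

# 1 if the row contains any cuisine outside the list, else 0.
# Guard: with no unknown cuisines, just fill the column with 0.
if not_in_list:
    new_df['rest_cuisines'] = new_df[not_in_list].max(axis=1)
else:
    new_df['rest_cuisines'] = 0

# Optional: drop the individual "other" columns now that they are merged
new_df = new_df.drop(columns=not_in_list)
print(new_df)
```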

If you want a column for every category in your list, filled with 0 where a category never occurs, you can do:

new_df = new_df.reindex(cuisine_list, axis=1, fill_value=0)

and then attach to the original dataframe:

df = pd.concat((df, new_df), axis=1)
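Putting the steps together, including the commenter's fix of keeping rest_cuisines in the category list, a sketch might look like the following. encode_cuisines is a hypothetical helper name, and the separator clean-up (a regex that turns "a , b" or "a,b" into "a,b") is an assumption about messy input, not something stated in the question.

```python
import pandas as pd


def encode_cuisines(df, categories, col="cuisines", rest="rest_cuisines"):
    """Hypothetical helper: one 0/1 column per category, plus `rest` for all others."""
    # Assumption: entries may be separated by "," with or without spaces,
    # so normalise every separator to a bare "," before splitting.
    cleaned = df[col].str.replace(r"\s*,\s*", ",", regex=True).str.strip()
    dummies = cleaned.str.get_dummies(sep=",")

    # Named categories (everything except the catch-all column)
    known = [c for c in categories if c != rest]
    # Cuisines that appear in the data but are not named categories
    other = [c for c in dummies.columns if c not in known]

    # reindex keeps every named category in list order, filling absent ones with 0
    out = dummies.reindex(known, axis=1, fill_value=0)
    # 1 if the row has any cuisine outside the list; 0 if there are none at all
    out[rest] = dummies[other].max(axis=1) if other else 0
    return pd.concat((df, out), axis=1)


cuisine_list = ['north indian', 'chinese', 'south indian', 'continental', 'cafe',
                'fast food', 'beverages', 'italian', 'american', 'desserts',
                'rest_cuisines']

# Hypothetical sample with inconsistent spacing around the commas
df = pd.DataFrame({"cuisines": ["north indian,chinese", "cafe , thai", "italian"]})
result = encode_cuisines(df, cuisine_list)
print(result)
```

The whole computation is column-wise, so it avoids the per-row .loc writes that make the loop in the question slow.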

2 Comments

Your solution works for me; I just need one more thing. Every string that is not part of cusine_list should go into a single 'rest_cuisines' column, which should be 1 for any row that contains such a cuisine. Let me know if you can help with this. I appreciate your solution, though.
Hello! I have tested the solution and it worked pretty well. The one thing I changed is that before the concat I appended rest_cuisines to the list and then concatenated the dataframes. Thank you so much for the solutions :) Cheers :)
