My table looks like this:

label       title
misc        barbie song 
misc
hello       monster
misc        girls song
misc        barbie doll

string_list = ['barbie','girls']

My expected output

label    new_label
misc     barbie
misc
hello    monster
misc     girls
misc     barbie

I want to work only on the rows whose label is 'misc' and check whether they have a video title. If a title is present, I want to check whether any string from the list appears in it, and then either replace 'misc' with the matched string or put the matched string in a new column called new_label.

If a 'misc' row has no video title, the result should be blank. Any label other than 'misc' should keep its old value.

Is this achievable in pandas? This logic is quite tricky for me

1 Answer

If I understood correctly, you only want to do this for rows with the label 'misc', and if the title doesn't contain any of the strings in the list it should be blank. If that's the case, the code below should do the trick:

import pandas as pd

#your data
d = {'label':['misc', 'misc', 'hello', 'misc', 'misc'], 'title':['barbie song', '', 'monster', 'girls song', 'barbie doll']}
#create dataframe
df = pd.DataFrame(data = d)
#your strings list
string_list = ['barbie','girls']
#loop over the rows whose label is misc
for i, row in df.loc[df['label'] == 'misc'].iterrows():
    #read the title of the current row
    title = df.at[i, 'title']
    #loop over list items
    for item in string_list:
        #check if the list string appears in the title
        if item in title:
            #if yes then replace the title with item (df.at avoids chained assignment)
            df.at[i, 'title'] = item

#second loop to blank out misc titles that did not match any list string
for i, row in df.loc[df['label'] == 'misc'].iterrows():
    title = df.at[i, 'title']
    #if the value is not in the list then set it to blank
    if title not in string_list:
        df.at[i, 'title'] = ''
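
For larger tables, the same result can probably be reached without explicit loops. The following is only a sketch of one possible vectorized approach, not the method used above: it assumes the matched string should go into a new column called new_label (as in the expected output of the question), that rows with other labels keep their title there, and that a missing title may be stored as None/NaN. The names pattern, matched and is_misc are made up for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'label': ['misc', 'misc', 'hello', 'misc', 'misc'],
    'title': ['barbie song', None, 'monster', 'girls song', 'barbie doll'],
})
string_list = ['barbie', 'girls']

# build one regex with a single capture group: (barbie|girls)
pattern = '(' + '|'.join(re.escape(s) for s in string_list) + ')'

# first list string found in each title; missing titles and non-matches become ''
matched = df['title'].str.extract(pattern, expand=False).fillna('')

# keep the title for rows that are not misc, use the matched string for misc rows
is_misc = df['label'] == 'misc'
df['new_label'] = df['title'].where(~is_misc, matched)

print(df[['label', 'new_label']])
```

Unlike the loops, which keep the last list string that matches, str.extract returns the first match found in the title, so the two approaches can differ when a title contains more than one of the strings.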

7 Comments

How can I give 'd' as my column names instead of typing them out?
Not sure I know what you mean! Why do you want to give d your column names? I only did that to create the dataframe; I presume you already have one. Has it already been read into Python? If not, is it in a csv/txt file?
Well then, you don't need the first two lines of the code since you already have the dataframe; those were just to create it. All you need to do now is run the rest of the code on your existing dataframe.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-239-d1da8924b43d> in <module>
     37     for item in string_list:
     38         #check if string in list is in label
---> 39         if (item in title):
     40             #if yes then change it to item
     41             combined_report['Video Title'].loc[combined_report.index == i] = item

TypeError: argument of type 'float' is not iterable
You don't need to drop them; just use df = df.fillna(' '), which replaces the missing (NaN) values with a blank string, then run your code.
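
The error above most likely comes from a missing title: pandas stores it as NaN, which is a float, so the `in` check fails on it. Below is a small sketch with made-up two-row data that reproduces the problem and then applies fillna. Filling with an empty string '' instead of ' ' is a choice made for this example, and the variable names raised and has_barbie are only illustrative.

```python
import pandas as pd

# tiny example frame; the second title is missing (NaN, which is a float)
df = pd.DataFrame({
    'label': ['misc', 'misc'],
    'title': ['barbie song', float('nan')],
})

# a substring check against NaN raises TypeError, because a float is not iterable
try:
    'barbie' in df.at[1, 'title']
    raised = False
except TypeError:
    raised = True

# replace missing titles with an empty string so that every title is a str
df['title'] = df['title'].fillna('')

# the same check now works for every row
has_barbie = ['barbie' in t for t in df['title']]
print(raised, has_barbie)
```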