I want to highlight certain words in a data frame. My code is given below. The problem is that it highlights only the first word from "selected_text" ("economy" in this case) and does not highlight the other words, even when they are present in the text. How can I change it so that every selected word present in the text is highlighted? Second, I currently get only one word in the "existing" column. Can I get more than one word there when several are present in the "body_text" column, and highlight them as well?

import pandas as pd
from IPython.display import display, Markdown, Latex, HTML
df =pd.read_csv("/content/df_eup.csv")
df.head(1)

df['body_text'].isnull().sum()
df.dropna(subset=['body_text'], inplace=True)
list_exist = []
selected_words=["economy", "recession", "unemployment", "depression","inflation", "covid19","virus"," bank"]
for index, row in df.iterrows():
    word = selected_words[0]
    i = 0
    while (word not in row['body_text'] and i < 7 ):
        
        i +=1
        word = selected_words[i]
    if i<7:
        list_exist.append(selected_words[i])
    else:
        list_exist.append("not_exist")
df["existing"]=list_exist

def highlight_selected_text(row):
    text = row["body_text"]
    selected_text = ["economy", "recession", "unemployment", "depression","inflation", "covid19","virus","bank"]
    ext = row["existing"]

    color = {
        "economy": "red",
        "recession": "red",
        "unemployment": "red",
        "depression": "red",
        "inflation": "red",
        "covid19": "red",
        "virus" : "red",
        "bank": "red",
        "not_exist": "black"
        
    }[ext]

    highlighted = f'<span style="color: {color}; font-weight: bold">{ext}</span>'
    
    
    
    return text.replace(selected_text[0] or selected_text[1] or selected_text[2] or selected_text[3] or selected_text[4]or selected_text[5]or selected_text[6]or selected_text[7], highlighted)
df["highlighted"] = df.apply(highlight_selected_text, axis=1)


display(HTML(df.sample(30).to_html(escape=False)))

Desired output when more than one word is selected (second part of the question): [image]

1 Answer

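The original return line only ever replaces "economy": an expression like selected_text[0] or selected_text[1] evaluates to its first truthy operand, so the other words are never passed to replace. Instead, loop over the dictionary and replace each keyword in turn, reading its color inside the f-string: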

def highlight_selected_text(row):
    text = row["body_text"]
    # Map each keyword to the color it should be displayed in.
    color = {
        "economy": "red",
        "recession": "red",
        "unemployment": "red",
        "depression": "red",
        "inflation": "red",
        "covid19": "red",
        "virus" : "red",
        "bank": "red",
    }

    # Wrap every occurrence of every keyword in a colored, bold span.
    for k, v in color.items():
        text = text.replace(k, f'<span style="color: {v}; font-weight: bold">{k}</span>')

    return text
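
For the second part of the question (several words in the "existing" column), one possible approach is sketched below. It is not part of the original answer. It collects every keyword found in a text instead of stopping at the first match, and highlights all of them in a single regular-expression pass so that a later replacement cannot touch markup inserted by an earlier one. The names COLORS, PATTERN, find_existing and highlight_all are hypothetical, and matching is assumed to be case-sensitive substring matching, as in the question's code.

```python
import re

# Keywords from the question, each mapped to its display color.
COLORS = {
    "economy": "red",
    "recession": "red",
    "unemployment": "red",
    "depression": "red",
    "inflation": "red",
    "covid19": "red",
    "virus": "red",
    "bank": "red",
}

# One alternation pattern for all keywords (longest first), escaped so
# that each keyword is matched literally.
PATTERN = re.compile(
    "|".join(re.escape(w) for w in sorted(COLORS, key=len, reverse=True))
)


def find_existing(text):
    """Return all keywords present in text, comma-separated, or "not_exist"."""
    found = [word for word in COLORS if word in text]
    return ", ".join(found) if found else "not_exist"


def highlight_all(text):
    """Wrap every keyword occurrence in a colored, bold span in one pass."""
    def wrap(match):
        word = match.group(0)
        return f'<span style="color: {COLORS[word]}; font-weight: bold">{word}</span>'

    return PATTERN.sub(wrap, text)
```

With the data frame from the question, these could be applied as df["existing"] = df["body_text"].apply(find_existing) and df["highlighted"] = df["body_text"].apply(highlight_all), then displayed with to_html(escape=False) as before.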
8 Comments

Thanks, I updated the code but am now getting this error: TypeError: string indices must be integers
Thanks a lot, Alexandra! I updated the code, but it did not solve the highlighting problem.
Oh, also try replacing return text.replace(selected_text[0] or selected_text[1] or selected_text[2] or selected_text[3] or selected_text[4]or selected_text[5]or selected_text[6]or selected_text[7], highlighted) with return text.replace(ext, highlighted)
Thank you so much! It worked. The first issue, the highlighting, is solved. Can you please help me with the second one? I want to list more than one word in the "existing" column when several are present in the text. Currently, I get only one word.
Could you give an example of the result you want to achieve?