Given a data frame:

   Text        Name
0  aa bb cc    Paul
1  ee ff gg hh NA
2  xx yy       NA
3  zz zz zz    Anton

I want to replace only the cells in column "Name" whose value is "NA" with the first 3 words from the corresponding row of column "Text".

Desired output:

   Text        Name
0  aa bb cc    Paul
1  ee ff gg hh ee ff gg
2  xx yy       xx yy
3  zz zz zz    Anton

My attempt failed:

[' '.join(x.split()[:3]) for x in df['Text'] if df.loc[df['Name'] == 'NA']]

5 Answers

2

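You can split the Text column on spaces, then use .str[:3] to take the first three elements and join them back together. This assumes the NA cells are real missing values (NaN), so that isna() detects them: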

text = df['Text'].str.split(' ').str[:3].str.join(' ')

df['Name'] = df['Name'].mask(df['Name'].isna(), text)
# or
df.loc[df['Name'].isna(), 'Name'] = text
# or (requires: import numpy as np)
df['Name'] = np.where(df['Name'].isna(), text, df['Name'])
print(df)

          Text      Name
0     aa bb cc      Paul
1  ee ff gg hh  ee ff gg
2        xx yy     xx yy
3     zz zz zz     Anton
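
The question prints NA as plain text, so the column may hold the literal string "NA" rather than a real missing value, in which case isna() would find nothing to replace. A minimal sketch under that assumption; the sample frame is rebuilt from the question, and the conversion step is only needed if the values really are strings:

```python
import pandas as pd

df = pd.DataFrame({
    "Text": ["aa bb cc", "ee ff gg hh", "xx yy", "zz zz zz"],
    "Name": ["Paul", "NA", "NA", "Anton"],  # assumption: literal strings, not NaN
})

# Turn the literal "NA" strings into real missing values so isna()/fillna() see them.
df["Name"] = df["Name"].mask(df["Name"] == "NA")

# First three whitespace-separated words of every Text value.
text = df["Text"].str.split().str[:3].str.join(" ")

# Fill only the missing names; existing names are left untouched.
df["Name"] = df["Name"].fillna(text)
print(df)
```

If the data comes from read_csv, "NA" is usually parsed as NaN already and the mask step does nothing.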

0

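Let us fix your code. The condition has to be evaluated per row, so zip the two columns and test each name for NaN (y != y is true only for NaN):

df['new'] = [' '.join(x.split()[:3]) if y != y else y for x, y in zip(df['Text'], df['Name'])]
df['new'].tolist()
Out[599]: ['Paul', 'ee ff gg', 'xx yy', 'Anton']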
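
The y != y test relies on NaN being the only float that is not equal to itself. A small sketch that puts it next to pd.isna, which states the same check by name; the helper first_words and the plain lists are hypothetical stand-ins for the two columns:

```python
import numpy as np
import pandas as pd

def first_words(text, n=3):
    # Hypothetical helper: first n whitespace-separated words of a string.
    return " ".join(text.split()[:n])

texts = ["aa bb cc", "ee ff gg hh", "xx yy", "zz zz zz"]
names = ["Paul", np.nan, np.nan, "Anton"]

# y != y is True only for NaN; pd.isna(y) expresses the same test explicitly.
by_comparison = [first_words(x) if y != y else y for x, y in zip(texts, names)]
by_isna = [first_words(x) if pd.isna(y) else y for x, y in zip(texts, names)]

print(by_comparison)
print(by_isna)
```

pd.isna is the safer choice when the column might contain None or pd.NA: None != None is False, and the truth value of pd.NA != pd.NA raises a TypeError.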

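0

# Assign the result back; inplace=True on df.Name may act on a temporary copy and leave df unchanged
df['Name'] = df['Name'].mask(df['Name'].isna(), df['Text'].str.split(' ').str[:3].str.join(' '))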

Output:

          Text      Name
0     aa bb cc      Paul
1  ee ff gg hh  ee ff gg
2        xx yy     xx yy
3     zz zz zz     Anton

0

You should separate the text processing (a list comprehension) from the update of the dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({"Text": ["aa bb cc", "ee ff gg hh", "xx yy", "zz zz zz"],
                   "Name": ["Paul", np.nan, np.nan, "Anton"]})

# Boolean mask of the rows whose Name is missing
missing = df["Name"].isnull()

# First three words of Text, computed only for those rows
first_3_words = [" ".join(s.split(" ")[:3]) for s in df.loc[missing, "Text"]]

df.loc[missing, "Name"] = first_3_words

0

Splitting and joining is costly; you can use a regex for efficiency (written as a raw string so that \w is not treated as a string escape):

df['Name'] = df['Name'].fillna(df['Text'].str.extract(r'((?:\w+ ){,2}\w+)', expand=False))

Output:

          Text      Name
0     aa bb cc      Paul
1  ee ff gg hh  ee ff gg
2        xx yy     xx yy
3     zz zz zz     Anton

Regex:

(            # start capturing
(?:\w+ ){,2} # up to 2 words followed by space
\w+          # one word
)            # end capturing
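
To see what the pattern captures without pandas, it can be tried with the standard re module on the sample strings from the question. This is only an illustrative sketch; the variable names are made up for the example:

```python
import re

# Same pattern as in the answer, as a raw string.
pattern = re.compile(r"((?:\w+ ){,2}\w+)")

samples = ["aa bb cc", "ee ff gg hh", "xx yy", "zz zz zz"]

# search() returns the first match; group(1) is the captured run of up to three words.
first_three = [pattern.search(s).group(1) for s in samples]
print(first_three)
```

The pattern expects words made of \w characters separated by single spaces, so text with punctuation inside words or with repeated spaces will not give the same result as split() followed by join().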
