
How to apply a regex to a data frame column?

import pandas as pd

df = pd.DataFrame({'col1': ['negative', 'positive', 'neutral', 'neutral', 'positive']})
cdict = {'n.*': -1, 'p.*': 0}
df['col2'] = df['col1'].map(cdict)

print(df.head())

Current output is:

       col1  col2
0  negative   NaN
1  positive   NaN
2   neutral   NaN
3   neutral   NaN
4  positive   NaN

Expected output:

       col1  col2
0  negative    -1
1  positive     1
2   neutral    -1
3   neutral    -1
4  positive     1
  • Note that your dict should be cdict = {'n.*': -1, 'p.*': 1} to get your expected output; I assume it's a typo. (Commented Apr 8, 2021 at 17:49)

2 Answers


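Instead of Series.map, use Series.replace with regex=True. Series.map with a dict looks up each value as an exact key, so the pattern strings 'n.*' and 'p.*' never match anything and every row becomes NaN. Series.replace with regex=True treats the dict keys as regular expressions: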

df['col2'] = df['col1'].replace(cdict, regex=True)
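A minimal self-contained sketch of this approach follows. It assumes 'p' should map to 1 (as the comment on the question points out) and anchors the patterns with ^. The anchoring is my addition: replace with regex=True looks for the pattern anywhere in the string (re.search semantics), so an unanchored 'n.*' would also match a value like 'unknown'.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'col1': ['negative', 'positive', 'neutral', 'neutral', 'positive']})

# Assumption: 'p' maps to 1 (per the comment on the question).
# The ^ anchors are an addition so that only the first character decides the match.
cdict = {r'^n': -1, r'^p': 1}

# With regex=True each key is treated as a pattern. Because the replacement
# values are integers rather than strings, a matching cell is replaced whole
# instead of having the matched substring substituted.
df['col2'] = df['col1'].replace(cdict, regex=True)
print(df)
```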

To be honest, you don't need a dict (or regexes) for this at all, which keeps the code shorter.

Use numpy.select with Series.str.startswith:

import numpy as np

conds = [df.col1.str.startswith('n'), df.col1.str.startswith('p')]
choices = [-1, 0]
df['col2'] = np.select(conds, choices)
print(df)

Output:

       col1  col2
0  negative    -1
1  positive     0
2   neutral    -1
3   neutral    -1
4  positive     0
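A sketch that combines both answers: it keeps a regex dict shaped like the question's cdict but builds the np.select conditions from it with Series.str.match, which anchors each pattern at the start of the string. Assumptions: 'p' maps to 1 (per the comment on the question), and the extra 'unknown' row is hypothetical, added only to show what happens to a value that matches no pattern. np.select fills such rows with its default, which is 0 unless you pass something else; with choices [-1, 0] as in the answer above, an unmatched row would look exactly like a 'p' row, so this sketch passes default=np.nan.

```python
import numpy as np
import pandas as pd

# Question data plus a hypothetical 'unknown' row that matches neither pattern.
df = pd.DataFrame({'col1': ['negative', 'positive', 'neutral', 'neutral', 'positive', 'unknown']})

# Regex dict in the same shape as the question's cdict (assumption: 'p' -> 1).
cdict = {r'n.*': -1, r'p.*': 1}

# str.match anchors at the start of the string, so 'unknown' does not match 'n.*'.
conds = [df['col1'].str.match(pattern) for pattern in cdict]
choices = list(cdict.values())

# default=np.nan marks unmatched rows instead of the default 0;
# the NaN also makes the resulting column float64.
df['col2'] = np.select(conds, choices, default=np.nan)
print(df)
```

If a value matches more than one pattern, np.select uses the first condition that is true, so the dict's insertion order acts as a priority order.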
