When cluster_name contains "demo", I want to change it to "unknown".
This is the best I've managed:
df["cluster_name"] = "unknown" if "demo" is in df["cluster_name"] else df["cluster_name"]
But I'm getting:
SyntaxError: invalid syntax
You can use numpy.where together with str.contains, which tests each value for the substring:
import numpy as np
df["cluster_name"] = np.where(df["cluster_name"].str.contains("demo"), "unknown", df["cluster_name"])
For example:
In [814]: df1
Out[814]:
        State  Year  Incident  new     nn
0           a  1980       513    1    0.0
1  demo is in  1981       453    0    1.0
2           b  1982       424    1  100.0
3     my demo  1983       372  100    NaN
In [816]: df1.State = np.where(df1.State.str.contains('demo'), 'unknown', df1.State)
In [817]: df1
Out[817]:
     State  Year  Incident  new     nn
0        a  1980       513    1    0.0
1  unknown  1981       453    0    1.0
2        b  1982       424    1  100.0
3  unknown  1983       372  100    NaN
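Two details the session above does not exercise, shown in a minimal sketch with hypothetical data. First, the test in the question, "demo" in df["cluster_name"], checks the Series index labels rather than its values, so even with the syntax fixed it would not do what was intended. Second, depending on the pandas version and column dtype, str.contains can put NaN in the mask for missing values; passing na=False makes them count as "no match". The .loc assignment is a swapped-in equivalent to np.where, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data; None stands in for a missing cluster name.
df = pd.DataFrame({"cluster_name": ["demo", "prod-1", "my demo box", None]})

# `in` on a Series looks at the index labels (0..3), not the values,
# so this is False even though "demo" is one of the values.
print("demo" in df["cluster_name"])  # False

# Element-wise substring test: na=False treats missing values as no match,
# regex=False matches "demo" literally instead of as a regular expression.
mask = df["cluster_name"].str.contains("demo", na=False, regex=False)

# Option 1: np.where, as in the answer above (returns a NumPy array).
where_result = np.where(mask, "unknown", df["cluster_name"])

# Option 2 (alternative): assign through .loc on a copy; rows where mask
# is False keep their original value.
loc_df = df.copy()
loc_df.loc[mask, "cluster_name"] = "unknown"

print(list(where_result))
print(loc_df["cluster_name"].tolist())
```

Both options leave the missing value missing; only the rows whose mask entry is True become "unknown".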
If you only need to replace values that are exactly 'demo' (not every value containing the substring 'demo'), you can use Series.replace; "contains" in the question could mean either.
df['cluster_name'] = df['cluster_name'].replace('demo','unknown')
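Or modify the DataFrame in place by passing a nested dict ({column: {old: new}}) to DataFrame.replace. Calling df['cluster_name'].replace('demo', 'unknown', inplace=True) on the selected column is chained assignment: recent pandas versions warn about it, and with Copy-on-Write enabled it does not update df.
df.replace({'cluster_name': {'demo': 'unknown'}}, inplace=True)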
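A small sketch, with a hypothetical Series, of how Series.replace behaves under the two readings of "contains". With a plain string it only touches exact matches; with regex=True it substitutes the matched substring; a pattern that spans the whole value replaces the whole value.

```python
import pandas as pd

s = pd.Series(["demo", "my demo", "prod"])

# Plain string: only values exactly equal to "demo" are replaced.
exact = s.replace("demo", "unknown")

# regex=True: the matching substring is replaced; the rest of the value stays.
partial = s.replace("demo", "unknown", regex=True)

# A pattern covering the whole value replaces the whole value, which
# corresponds to the "contains" reading of the question.
whole = s.replace(r".*demo.*", "unknown", regex=True)

for name, result in [("exact", exact), ("partial", partial), ("whole", whole)]:
    print(name, result.tolist())
# exact ['unknown', 'my demo', 'prod']
# partial ['unknown', 'my unknown', 'prod']
# whole ['unknown', 'unknown', 'prod']
```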
One potential solution is to use map with a lambda function, which is syntactically similar to what you were trying to do:
Simple map solution:
# replace the value with 'unknown' if it equals 'demo'
df['cluster_name'] = df['cluster_name'].map(lambda x: 'unknown' if x == 'demo' else x)
More generalized map solution:
# replace the value with 'unknown' if it contains 'demo' (assumes every value is a string)
df['cluster_name'] = df['cluster_name'].map(lambda x: 'unknown' if 'demo' in x else x)
Examples:
>>> #simple map solution
>>> df
cluster_name
0 demo
1 demo
2 1
>>> df['cluster_name'] = df['cluster_name'].map(lambda x : 'unknown' if x=='demo' else x)
>>> df
cluster_name
0 unknown
1 unknown
2 1
>>> #More generalized map solution:
>>> df1
cluster_name
0 demo is a
1 demo
2 1
>>> df1['cluster_name'] = df1['cluster_name'].map(lambda x : 'unknown' if 'demo' in x else x)
>>> df1
cluster_name
0 unknown
1 unknown
2 1
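The generalized map version assumes every value is a string: 'demo' in x raises TypeError when x is an int such as 1 or a missing value (None or NaN). A minimal sketch, with hypothetical mixed data, that adds an isinstance check so non-string values pass through unchanged:

```python
import pandas as pd

# Hypothetical mixed column: an int and a missing value alongside strings.
df1 = pd.DataFrame({"cluster_name": ["demo is a", "demo", 1, None]})

# Only strings are searched; anything else is returned as-is,
# so `'demo' in x` is never evaluated on an int or None.
df1["cluster_name"] = df1["cluster_name"].map(
    lambda x: "unknown" if isinstance(x, str) and "demo" in x else x
)

print(df1["cluster_name"].tolist())
```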