2

I'm quite new to Python and pandas. I'm trying to add a new column (group) to a DataFrame, with values based on a partial string match in another column (user). Users are coded like this: AA1, AA2, BB1, BB2 and so on. I want the group column to hold 'AA' for all the AA users. After looking for a way to do this, I came up with the following line:

df['group'] = ['AA' if x x.startswith('AA') else 'other' for x in df['user']]

Well, it doesn't work: 1) I get an invalid syntax error and a line-too-long error. 2) However, it does work if I replace x.startswith('AA') with x == 'AA1', so is the problem in the startswith part? 3) I don't know how to add the 'BB' if x x.startswith('BB') case in the same line; or should I write one line for each category of user? Thank you so much.

2 Answers

2
df['group'] = ['AA' if x.startswith('AA') else 'other' for x in df['user']]

You just have an extra x before x.startswith('AA'). Removing it fixes the syntax error.
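For the third part of the question (covering 'BB' and other prefixes in the same line), one option is to move the test into a small helper and keep the list comprehension. This is only a sketch: the helper name prefix_group, the PREFIXES tuple, and the extra 'CC1' row are made up for illustration and are not from the question.

```python
import pandas as pd

# Sample data modelled on the question; 'CC1' is an added row to show the fallback
df = pd.DataFrame({'user': ['AA1', 'AA2', 'BB1', 'BB2', 'CC1']})

# Hypothetical list of prefixes that should become group labels
PREFIXES = ('AA', 'BB')

def prefix_group(user, prefixes=PREFIXES, default='other'):
    """Return the first prefix that `user` starts with, or `default` if none match."""
    for prefix in prefixes:
        if user.startswith(prefix):
            return prefix
    return default

# Same shape as the original one-liner, but it handles every listed prefix
df['group'] = [prefix_group(x) for x in df['user']]
print(df)
```

Adding another category then only means adding its prefix to PREFIXES, rather than writing one line per category.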


1

I think you can use numpy.where with str.startswith or str.contains:

import pandas as pd
import numpy as np

df = pd.DataFrame({'user':['AA1','AA2','BB1','BB2']})
print(df)
  user
0  AA1
1  AA2
2  BB1
3  BB2

df['group'] = np.where(df.user.str.startswith('AA'), 'AA', 'other')
df['group1'] = np.where(df.user.str.contains('AA'), 'AA', 'other')
# if you need the first 2 characters of each user instead
df['g1'] = df.user.str[:2]
print (df)
  user  group group1  g1
0  AA1     AA     AA  AA
1  AA2     AA     AA  AA
2  BB1  other  other  BB
3  BB2  other  other  BB

To extract a substring, see the pandas documentation on indexing with .str.
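If you want several named groups plus an 'other' fallback, numpy.select may be a natural extension of the numpy.where approach. The following is a minimal sketch, assuming the prefixes are known in advance; the prefixes list and the extra 'CC1' row are illustrative and not part of the original data.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question; 'CC1' is an added row to show the fallback
df = pd.DataFrame({'user': ['AA1', 'AA2', 'BB1', 'BB2', 'CC1']})

# Hypothetical list of prefixes to turn into group labels
prefixes = ['AA', 'BB']

# One boolean mask per prefix, in the same order as the labels
conditions = [df['user'].str.startswith(p).to_numpy() for p in prefixes]

# np.select picks the label of the first matching condition, else the default
df['group'] = np.select(conditions, prefixes, default='other')
print(df)
```

Unlike df.user.str[:2], this keeps unknown prefixes such as 'CC' under 'other' instead of creating a new group for them.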
