
Input column (ColumnA) and desired output column (ColumnB):

ColumnA  ColumnB
A        0
A        1
B        0
B        1
C        0
C        1

The rule is: ColumnB should be 0 if the value in ColumnA appears for the first time; otherwise ColumnB should be 1. I'm using pandas in Python. Thanks!

  • So... what have you tried so far? Commented May 31, 2017 at 13:24
  • I tried df.apply(lambda x: int(x.ColumnA in df.iloc[:x.name,0].tolist()), axis=1), but it is slow when the data is large. Commented May 31, 2017 at 15:23
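Why the apply attempt is slow: for every row it rebuilds a list of all earlier values and scans it, so the total work grows roughly quadratically with the number of rows. A minimal sketch of the same rule done in one pass, remembering already-seen values in a set, is below. The helper name first_seen_flags is hypothetical and not part of pandas or either answer.

```python
import pandas as pd


def first_seen_flags(values):
    """Return 0 for the first occurrence of each value and 1 for every repeat.

    Hypothetical helper: a single pass with a set of already-seen values,
    so the cost is linear in the number of rows.
    """
    seen = set()
    flags = []
    for v in values:
        flags.append(1 if v in seen else 0)  # 1 if this value appeared earlier
        seen.add(v)
    return flags


df = pd.DataFrame({'ColumnA': ['A', 'A', 'B', 'B', 'C', 'C']})
df['ColumnB'] = first_seen_flags(df['ColumnA'])
print(df)
```

This is only meant to show the idea; in practice the vectorized pandas method in the accepted answer does the same job without a Python-level loop.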

2 Answers

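Use duplicated, then convert the boolean mask to int with astype. (The sample below has one more 'A' row than the question.)

import pandas as pd

df = pd.DataFrame({'ColumnA': ['A', 'A', 'A', 'B', 'B', 'C', 'C']})
print(df.duplicated())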
0    False
1     True
2     True
3    False
4     True
5    False
6     True
dtype: bool

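# Flag every repeat of a ColumnA value after its first occurrence
# (keep='first' is the default), then cast True/False to 1/0.
df['ColumnB'] = df['ColumnA'].duplicated().astype(int)
print(df)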
  ColumnA  ColumnB
0       A        0
1       A        1
2       A        1
3       B        0
4       B        1
5       C        0
6       C        1
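An equivalent formulation, offered here as an alternative rather than as part of this answer, numbers each row within its ColumnA group with groupby(...).cumcount(): the first occurrence gets 0, the second 1, and so on. Any count above 0 is a repeat. The sketch below uses the same seven-row sample and checks that it agrees with the duplicated version.

```python
import pandas as pd

df = pd.DataFrame({'ColumnA': ['A', 'A', 'A', 'B', 'B', 'C', 'C']})

# cumcount() numbers rows within each ColumnA group in original row order:
# 0 for the first occurrence, 1 for the second, etc.
occurrence = df.groupby('ColumnA').cumcount()

# First occurrence -> 0, every later occurrence -> 1.
df['ColumnB'] = (occurrence > 0).astype(int)

# Should match the duplicated-based result.
assert df['ColumnB'].tolist() == df['ColumnA'].duplicated().astype(int).tolist()
print(df)
```

The cumcount form is handy if you later need "which occurrence is this" rather than just first versus repeat.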

1 Comment

Your approach is very efficient on large data. Thanks!
import pandas as pd

df = pd.DataFrame({'ColumnA': {0: 'A', 1: 'A', 2: 'B', 3: 'B', 4: 'C', 5: 'C'}})

print(df)
  ColumnA
0       A
1       A
2       B
3       B
4       C
5       C

Use apply to check, row by row, whether the value has already appeared in an earlier row.

# x.name is the row label; with the default RangeIndex it equals the row position used by iloc.
df['ColumnB'] = df.apply(lambda x: int(x.ColumnA in df.iloc[:x.name, 0].tolist()), axis=1)

print(df)
  ColumnA  ColumnB
0       A        0
1       A        1
2       B        0
3       B        1
4       C        0
5       C        1
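To compare this row-by-row approach with the vectorized one, the sketch below computes both on the question's six-row data and checks that they agree. Note the assumption built into the apply version: df.iloc[:x.name, 0] only means "all earlier rows of ColumnA" when the frame has a default 0..n-1 index and ColumnA is the first column; after filtering or reindexing that no longer holds, whereas duplicated does not depend on the index.

```python
import pandas as pd

df = pd.DataFrame({'ColumnA': ['A', 'A', 'B', 'B', 'C', 'C']})

# Row-by-row: test membership in the list of all earlier ColumnA values.
# Relies on x.name (the row label) being the row position, i.e. a RangeIndex.
slow = df.apply(lambda x: int(x.ColumnA in df.iloc[:x.name, 0].tolist()), axis=1)

# Vectorized: duplicated() marks every repeat after the first occurrence.
fast = df['ColumnA'].duplicated().astype(int)

print(slow.tolist())  # [0, 1, 0, 1, 0, 1]
print(fast.tolist())  # [0, 1, 0, 1, 0, 1]
assert slow.tolist() == fast.tolist()
```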

2 Comments

df.iloc[:x.name,1] ??
Is there a faster way to get the results when the data is large?
