I'm quite experienced in R and am now learning Python by translating an existing series of scripts from R to Python (df is a pandas DataFrame). I'm stuck on this line:
df[df$id != df$id_old, c("col1", "col2")] <- NA
That is, I'm trying to set specific rows and columns to NA. I've tried several things; the most promising route seemed to be:
index = np.where(df.id != df.id_old)
df.col1[index] = np.repeat(np.nan, np.size(index))
But the second line throws the following error, which I don't fully understand:
Can only tuple-index with a MultiIndex
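A likely cause, offered as a hedged guess: when called with a single argument, np.where returns a tuple of arrays (one per dimension) rather than a plain array, so index above is a tuple and pandas treats it as a MultiIndex-style key. A small sketch with a made-up boolean array (cond is illustrative, not from my data) shows the shape of the return value:

```python
import numpy as np

# Illustrative boolean condition (hypothetical data, not the real df)
cond = np.array([False, True, False, True])

# With one argument, np.where returns a tuple with one array per dimension
index = np.where(cond)
print(type(index))   # <class 'tuple'>
print(len(index))    # 1

# The integer positions themselves are the first element of that tuple
positions = index[0]
print(positions)     # [1 3]
```

So index[0], not index, would be the array of row positions.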
What would be the cleanest way to achieve my objective?
Example:
df = pd.DataFrame({'id' : [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5],
                   'id_old' : [1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5, 5],
                   'col1' : np.random.normal(size = 12),
                   'col2' : np.random.randint(low = 20, high = 50, size = 12),
                   'col3' : np.repeat('other info', 12)})
print(df)
Output:
    id  id_old      col1  col2        col3
0    1       1  0.320982    31  other info
1    1       1  0.398855    42  other info
2    1       2 -0.664073    30  other info
3    2       2  1.428694    48  other info
4    2       3 -1.240363    49  other info
5    3       4  0.023167    42  other info
6    4       4 -0.645114    44  other info
7    4       4 -1.033602    47  other info
8    4       4  0.295143    27  other info
9    4       5  0.531660    32  other info
10   5       5 -0.787401    33  other info
11   5       5  2.033503    48  other info
Expected result:
    id  id_old      col1  col2        col3
0    1       1  0.320982    31  other info
1    1       1  0.398855    42  other info
2    1       2       NaN   NaN  other info
3    2       2  1.428694    48  other info
4    2       3       NaN   NaN  other info
5    3       4       NaN   NaN  other info
6    4       4 -0.645114    44  other info
7    4       4 -1.033602    47  other info
8    4       4  0.295143    27  other info
9    4       5       NaN   NaN  other info
10   5       5 -0.787401    33  other info
11   5       5  2.033503    48  other info
df.loc[index, 'col1'] = ....
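For reference, here is a minimal sketch of what a df.loc-based version might look like, using a boolean mask instead of np.where. It is one possible approach, not necessarily the cleanest. The names mask and cols are my own, and casting the target columns to float first is an assumption I'm making so that the integer column col2 can hold NaN without relying on implicit upcasting (as a result col2 prints as 31.0 rather than 31).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id':     [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5],
    'id_old': [1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5, 5],
    'col1': np.random.normal(size=12),
    'col2': np.random.randint(low=20, high=50, size=12),
    'col3': np.repeat('other info', 12),
})

# Boolean Series: True on the rows where id differs from id_old
mask = df['id'] != df['id_old']

# Columns to blank out (name chosen for this sketch)
cols = ['col1', 'col2']

# Assumption: convert to float up front, since NaN cannot live in an int column
df[cols] = df[cols].astype('float64')

# Rough equivalent of R's df[df$id != df$id_old, c("col1", "col2")] <- NA
df.loc[mask, cols] = np.nan

print(df)
```

With the example data, the mask selects rows 2, 4, 5 and 9, which matches the expected result above.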