I'm quite seasoned in R and am now learning Python by 'translating' an existing series of scripts from R to Python (df is a pandas DataFrame). I'm stuck at this line:

df[df$id != df$id_old, c("col1", "col2")] <- NA

That is, I'm trying to set specific rows and columns to NA. I've tried different things; the most promising route seemed to be:

index = np.where(df.id != df.id_old)
df.col1[index] = np.repeat(np.nan, np.size(index))

But this throws the following error at the second line, which I don't fully understand:

Can only tuple-index with a MultiIndex

What would be the cleanest way to achieve my objective?

Example:

df = pd.DataFrame({'id' : [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5], 
    'id_old' : [1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5, 5], 
    'col1' : np.random.normal(size = 12), 
    'col2' : np.random.randint(low = 20, high = 50, size = 12),
    'col3' : np.repeat('other info', 12)})
print(df)

Output:

   id  id_old      col1  col2        col3
0    1       1  0.320982    31  other info
1    1       1  0.398855    42  other info
2    1       2 -0.664073    30  other info
3    2       2  1.428694    48  other info
4    2       3 -1.240363    49  other info
5    3       4  0.023167    42  other info
6    4       4 -0.645114    44  other info
7    4       4 -1.033602    47  other info
8    4       4  0.295143    27  other info
9    4       5  0.531660    32  other info
10   5       5 -0.787401    33  other info
11   5       5  2.033503    48  other info

Expected result:

   id  id_old      col1  col2        col3
0    1       1  0.320982    31  other info
1    1       1  0.398855    42  other info
2    1       2       NaN   NaN  other info
3    2       2  1.428694    48  other info
4    2       3       NaN   NaN  other info
5    3       4       NaN   NaN  other info
6    4       4 -0.645114    44  other info
7    4       4 -1.033602    47  other info
8    4       4  0.295143    27  other info
9    4       5       NaN   NaN  other info
10   5       5 -0.787401    33  other info
11   5       5  2.033503    48  other info
  • You want something like df.loc[index, 'col1'] = .... Commented May 29, 2018 at 9:55
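
A minimal sketch of what this comment suggests, under a few assumptions: np.where called with only a condition returns a tuple of arrays (one per dimension), which is probably why indexing the Series with it raised the tuple-index error. Taking the first element of that tuple gives a plain array of row positions. The small DataFrame below is made up for illustration, and the sketch assumes the default RangeIndex, so that positions and index labels coincide.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame for illustration (default RangeIndex assumed).
df = pd.DataFrame({'id': [1, 1, 2, 3],
                   'id_old': [1, 2, 2, 4],
                   'col1': [0.1, 0.2, 0.3, 0.4]})

# np.where(condition) returns a tuple holding one array per dimension.
result = np.where(df.id != df.id_old)

# Unpack the row positions; with a default RangeIndex they equal the labels.
index = result[0]

# .loc assigns in place to the selected rows of the selected column.
df.loc[index, 'col1'] = np.nan

print(df)
```

If the DataFrame had a non-default index, the positions from np.where would no longer match the labels, so passing the boolean condition directly to .loc is the safer choice.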

1 Answer

Use .loc and pass a list where in R you would use c(...).

.loc supports in-place assignment.

Example:

df.loc[df.id != df.id_old, ['col1', 'col2']] = np.nan

Output (col2 is upcast to float, since NaN cannot be stored in an integer column):

        col1  col2        col3  id  id_old
0   2.411473  31.0  other info   1       1
1   0.874083  43.0  other info   1       1
2        NaN   NaN  other info   1       2
3   2.156903  20.0  other info   2       2
4        NaN   NaN  other info   2       3
5        NaN   NaN  other info   3       4
6   0.933760  22.0  other info   4       4
7  -1.239806  42.0  other info   4       4
8  -0.493344  41.0  other info   4       4
9        NaN   NaN  other info   4       5
10 -0.751290  30.0  other info   5       5
11  0.327527  31.0  other info   5       5
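
For a reproducible end-to-end check, here is a minimal sketch that reuses the id and id_old values from the question but replaces the random col1 and col2 with fixed, made-up values. It also casts col2 to float before assigning; that step is a precaution added here, not part of the answer above, because newer pandas versions may warn about (or refuse) the implicit integer-to-float upcast.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id':     [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5],
                   'id_old': [1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5, 5],
                   'col1': np.arange(12, dtype=float),  # made-up values
                   'col2': np.arange(20, 32),           # made-up integer values
                   'col3': np.repeat('other info', 12)})

# Precaution (an assumption of this sketch): make col2 float up front,
# so NaN can be stored without relying on an implicit upcast.
df['col2'] = df['col2'].astype(float)

# Boolean mask of the rows where the two id columns differ.
mismatch = df['id'] != df['id_old']

# Row selector first, then a list of columns (the counterpart of c(...) in R).
df.loc[mismatch, ['col1', 'col2']] = np.nan

# Index labels of the rows that were blanked out: 2, 4, 5 and 9.
print(df.index[df['col1'].isna()].tolist())
```

Passing the boolean Series straight to .loc avoids np.where entirely and works regardless of what the index looks like.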

1 Comment

Great, that's what I needed - thanks for the quick response!