I have a data frame like this, containing 2041 columns:

number   error1   error2    ...   error2040
   1        0       0       ...       1
   2        1       1       ...       1
   3        0       1       ...       0
  ...      ...     ...      ...      ...
result     0.5      0.6               0.001

The 'result' row is the probability that the particular error will cause the final error. It was calculated, maybe not very nicely, using pieces, but it works.

Now I want to divide all the numbers into four categories (faulty, probably faulty, not faulty, not enough information), based on the probability in 'result'.

So I guess the easiest way is to replace every 1 with the corresponding value from 'result' and then add a new column called 'prediction' based on the numbers in each row, so it would look like this:

number   error1   error2    ...   error2040    PREDICTION 
  1        0       0        ...     0.001      not faulty   
  2       0.5     0.6       ...     0.001      FAULTY   
  3        0      0.6       ...       0        probably faulty  
 ...      ...     ...       ...      ... 
result    0.5     0.6                0.001

But I am stuck and cannot figure out how to do the first part: replacing every 1 in every column with the value from the 'result' row.

Thank you.

  • Instead of replace, multiply? Commented Mar 18, 2021 at 6:06
  • Now I feel totally stupid that something like this did not come to my mind. Thank you. Commented Mar 18, 2021 at 6:10
  • Like @SeaBean said: df.iloc[:-1] *= df.iloc[-1] Commented Mar 18, 2021 at 6:34
  • Good code, @piRSquared. I have fine-tuned it for the specific layout so that it skips the first column, and I have included test-run data below. :-) Commented Mar 18, 2021 at 7:10

1 Answer

Building on my initial idea of multiplying instead of replacing, and on @piRSquared's syntax, modified to exclude the first column from the operation, you can use:

df.iloc[:-1, 1:] *= df.iloc[-1, 1:]
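Why this works: `df.iloc[-1, 1:]` is a Series indexed by the column names, and pandas aligns that index with the column labels of the frame on the left, so each column is multiplied by its own probability. Here is a minimal sketch on a toy frame that spells the same operation out with `DataFrame.mul`. The column names `a` and `b` and the variable names are made up for illustration:

```python
import pandas as pd

# Toy frame (hypothetical data): the last row holds the per-column probabilities.
toy = pd.DataFrame({"a": [0.0, 1.0, 0.5], "b": [1.0, 1.0, 0.2]})

flags = toy.iloc[:-1]   # all rows except the last
probs = toy.iloc[-1]    # Series indexed by column name: a -> 0.5, b -> 0.2

# axis=1 matches the Series index against the column labels,
# so column "a" is scaled by 0.5 and column "b" by 0.2.
scaled = flags.mul(probs, axis=1)
print(scaled)
```

Because every flag is 0 or 1, multiplying leaves the zeros alone and turns each 1 into the probability for that column, which is exactly the replacement asked for.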

Test run:

import pandas as pd

# Sample frame: flag rows followed by the 'result' row of probabilities
data = {'number': {0: '1', 1: '2', 2: '3', 3: 'result'},
 'error1': {0: 0.0, 1: 1.0, 2: 0.0, 3: 0.5},
 'error2': {0: 0.0, 1: 1.0, 2: 1.0, 3: 0.6},
 'error2040': {0: 1.0, 1: 1.0, 2: 0.0, 3: 0.001}}

df = pd.DataFrame(data)
print(df)

   number  error1  error2  error2040
0       1     0.0     0.0      1.000
1       2     1.0     1.0      1.000
2       3     0.0     1.0      0.000
3  result     0.5     0.6      0.001


df.iloc[:-1, 1:] *= df.iloc[-1, 1:]

print(df)

   number error1 error2 error2040
0       1    0.0    0.0     0.001
1       2    0.5    0.6     0.001
2       3    0.0    0.6       0.0
3  result    0.5    0.6     0.001
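
Note that the left-aligned headers in the output above suggest the error columns ended up with object dtype, probably because the 'result' row is pulled out of a frame that also contains a string column. If you want the columns to stay numeric, and you also want the 'prediction' column from the question, the following is one possible sketch. Everything about the classification is an assumption, not something stated in the question: it combines the per-error probabilities as if the errors were independent, and the thresholds 0.75 and 0.3 are hypothetical values chosen only so that the sample rows get the labels shown in the question.

```python
import numpy as np
import pandas as pd

data = {"number": ["1", "2", "3", "result"],
        "error1": [0.0, 1.0, 0.0, 0.5],
        "error2": [0.0, 1.0, 1.0, 0.6],
        "error2040": [1.0, 1.0, 0.0, 0.001]}
df = pd.DataFrame(data)

error_cols = df.columns[1:]            # every column except 'number'
is_result = df["number"] == "result"   # marks the probability row

# Take the probabilities from the numeric columns only, as floats,
# so the product keeps a float dtype.
probs = df.loc[is_result, error_cols].iloc[0].astype(float)
df.loc[~is_result, error_cols] = df.loc[~is_result, error_cols].mul(probs, axis=1)

# ASSUMPTION: errors are independent, so the chance that at least one
# of them causes the final error is 1 - prod(1 - p).
rows = df.loc[~is_result, error_cols]
combined = 1 - (1 - rows).prod(axis=1)

# ASSUMPTION: hypothetical thresholds; replace them with your own rule.
conditions = [(combined >= 0.75).to_numpy(),
              (combined >= 0.3).to_numpy(),
              (combined > 0).to_numpy()]
labels = ["faulty", "probably faulty", "not faulty"]
pred = pd.Series(np.select(conditions, labels, default="not enough information"),
                 index=rows.index)

# Index alignment leaves the 'result' row without a prediction (NaN).
df["prediction"] = pred
print(df)
```

In this sketch a row with no error flags at all falls through to "not enough information"; whether that is the right meaning for that category depends on your data.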