0

I have a problem with regards as to how to appropriately code this condition. I'm currently creating a new pandas column in my dataframe, new_column, which performs a subtraction on the values in column test, based on what index of the data we are at. I'm currently using this code to get it to subtract a different value every 4 times:

subtraction_value = 3
subtraction_value = 6

data = pd.DataFrame({"test":[12, 4, 5, 4, 1, 3, 2, 5, 10, 9]} 


data['new_column'] = np.where(data.index%4,
                              data['test']-subtraction_value,
                              data['test']-subtraction_value_2)
print (data['new_column']


[6,1,2,1,-5,0,-1,3,4,6]

However, I now wish to get it performing the higher subtraction on the first two positions in the column, and then 3 subtractions with the original value, another two with the higher subtraction value, 3 small subtractions, and so forth. I thought I could do it this way, with an | condition in my np.where statement:

data['new_column'] = np.where((data.index%4) | (data.index%5),
                              data['test']-subtraction_value,
                              data['test']-subtraction_value_2)

However, this didn't work, and I feel my maths may be slightly off. My desired output would look like this:

print(data['new_column'])

[6,-2,2,1,-2,-3,-4,3,7,6])

As you can see, this slightly shifts the pattern. Can I still use numpy.where() here, or do I have to take a new approach? Any help would be greatly appreciated!

4
  • 2
    I don't get why you have a 3 in your expected output (between the -4 and 7) Commented Jun 21, 2022 at 14:59
  • That's the first positional value where the smaller amount would be subtracted (from 5 in the column) Commented Jun 21, 2022 at 15:01
  • But according to your logic I expect 3 to be 2? You substract 6 from the first two values, then 3 for the next three. When you repeat this logic, you substract 3 from 5 and are supposed to get 2. So, your logic does not match your desired output. Commented Jun 21, 2022 at 18:46
  • 1
    Sorry, I've just realised I've been a bit of an idiot, you're right. I apologise! Commented Jun 22, 2022 at 7:15

1 Answer 1

1

As mentioned in the comment section, the output should equal [6,-2,2,1,-2,-3,-4,2,7,6] instead of [6,-2,2,1,-2,-3,-4,3,7,6] according to your logic. Given that, you can do the following:

import pandas as pd
import numpy as np

from itertools import chain

subtraction_value = 3
subtraction_value_2 = 6

data = pd.DataFrame({"test":[12, 4, 5, 4, 1, 3, 2, 5, 10, 9]})

index_pos_large_subtraction = list(chain.from_iterable((data.index[i], data.index[i+1]) for i in range(0, len(data)-1, 5)))
data['new_column'] = np.where(~data.index.isin(index_pos_large_subtraction), data['test']-subtraction_value, data['test']-subtraction_value_2)

# The next line is equivalent to the previous one
# data['new_column'] = np.where(data.index.isin(index_pos_large_subtraction), data['test']-subtraction_value_2, data['test']-subtraction_value)
---------------------------------------------
    test  new_column
0   12    6
1   4     -2
2   5     2
3   4     1
4   1     -2
5   3     -3
6   2     -4
7   5     2
8   10    7
9   9     6
---------------------------------------------

As you can see, np.where works fine. Your masking condition is the problem and needs to be adjusted, you are not selecting rows according to your logic.

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for your help, I can see where I was going wrong now. This works fantastic, thanks for your help. I have a better understanding of masking now!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.