
I have a dataframe which can be constructed as:

import pandas as pd

df = pd.DataFrame({'A': [1, 4, 6, 3, 2, 3, 6, 8],
                   'B': [4, 7, 1, 5, 6, 8, 3, 9],
                   'C': [1, 5, 3, 1, 6, 8, 9, 0],
                   'D': [6, 3, 7, 8, 9, 4, 2, 1]})

The df looks like:

    A   B   C   D
0   1   4   1   6
1   4   7   5   3
2   6   1   3   7
3   3   5   1   8
4   2   6   6   9
5   3   8   8   4
6   6   3   9   2
7   8   9   0   1

There are also two Series, indexed by column name, that are used to substitute values in the df:

mx = pd.Series([7, 8, 8, 7], index=["A", "B", "C", "D"])
dm = pd.Series([5, 8, 6, 4], index=["A", "B", "C", "D"])

PROBLEM: In each column, I want to replace every value that lies between the corresponding value in dm and the corresponding value in mx (both bounds inclusive, as in the expected output below) with the value from dm. For example, in "D" every value from 4 to 7 should become 4.

So the expected output would look something like:

    A   B   C   D
0   1   4   1   4
1   4   7   5   3
2   5   1   3   4
3   3   5   1   8
4   2   6   6   9
5   3   8   6   4
6   5   3   9   2
7   8   9   0   1

I have tried using df.apply and df.update, but I can't work out how to express the condition. Every attempt raises ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Is there an efficient way to achieve this? Any help would be appreciated.

2 Answers


Use DataFrame.mask. Compare the DataFrame with each Series using DataFrame.ge and DataFrame.le, combine the two boolean masks with & (element-wise AND), and pass dm as the replacement with axis=1 so that it is aligned on the column labels:

df = df.mask(df.ge(dm) & df.le(mx), dm, axis=1)
print(df)
   A  B  C  D
0  1  4  1  4
1  4  7  5  3
2  5  1  3  4
3  3  5  1  8
4  2  6  6  9
5  3  8  6  4
6  5  3  9  2
7  8  9  0  1
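
To make the one-liner above easier to follow, here is a minimal self-contained sketch that builds the boolean mask in a separate step. The names in_range and result are only illustrative, and pd.Series is used as the public spelling of pd.core.series.Series.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 6, 3, 2, 3, 6, 8],
                   'B': [4, 7, 1, 5, 6, 8, 3, 9],
                   'C': [1, 5, 3, 1, 6, 8, 9, 0],
                   'D': [6, 3, 7, 8, 9, 4, 2, 1]})
mx = pd.Series([7, 8, 8, 7], index=["A", "B", "C", "D"])
dm = pd.Series([5, 8, 6, 4], index=["A", "B", "C", "D"])

# Element-wise comparisons; each Series is aligned on the column labels,
# so every column is compared against its own bound.
in_range = df.ge(dm) & df.le(mx)

# Where in_range is True, take the replacement from dm for that column;
# everywhere else keep the original value.
result = df.mask(in_range, dm, axis=1)

print(in_range["D"].tolist())
# [True, False, True, False, False, True, False, False]
print(result["D"].tolist())
# [4, 3, 4, 8, 9, 4, 2, 1]
```

Keeping the mask in its own variable also makes it easy to inspect which cells will be overwritten before assigning the result back to df.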

1 Comment

This is definitely nicer than my way. Just note that, according to the expected output, you should use ge and le instead of gt and lt.
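
To illustrate the point about inclusive bounds, the following sketch runs both variants on the question's data. The names strict and inclusive are only illustrative; the difference shows up in columns C and D, where some values sit exactly on a bound.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 6, 3, 2, 3, 6, 8],
                   'B': [4, 7, 1, 5, 6, 8, 3, 9],
                   'C': [1, 5, 3, 1, 6, 8, 9, 0],
                   'D': [6, 3, 7, 8, 9, 4, 2, 1]})
mx = pd.Series([7, 8, 8, 7], index=["A", "B", "C", "D"])
dm = pd.Series([5, 8, 6, 4], index=["A", "B", "C", "D"])

# Strict bounds: values equal to dm or mx are left untouched.
strict = df.mask(df.gt(dm) & df.lt(mx), dm, axis=1)

# Inclusive bounds: values equal to dm or mx are replaced as well.
inclusive = df.mask(df.ge(dm) & df.le(mx), dm, axis=1)

print(strict["D"].tolist())
# [4, 3, 7, 8, 9, 4, 2, 1]   (the 7 survives, because 7 < 7 is False)
print(inclusive["D"].tolist())
# [4, 3, 4, 8, 9, 4, 2, 1]   (matches the expected output)
```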

I can't tell you if it's the best way, and it probably isn't, but this works:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'A': [1, 4, 6, 3, 2, 3, 6, 8],
   ...:                    'B': [4, 7, 1, 5, 6, 8, 3, 9],
   ...:                    'C': [1, 5, 3, 1, 6, 8, 9, 0],
   ...:                    'D': [6, 3, 7, 8, 9, 4, 2, 1]})

In [3]: mx = pd.Series([7, 8, 8, 7], index=["A", "B", "C", "D"])
   ...: dm = pd.Series([5, 8, 6, 4], index=["A", "B", "C", "D"])

In [4]: df
Out[4]:
   A  B  C  D
0  1  4  1  6
1  4  7  5  3
2  6  1  3  7
3  3  5  1  8
4  2  6  6  9
5  3  8  8  4
6  6  3  9  2
7  8  9  0  1

In [5]: for col in df.columns:
   ...:     df[col] = df[col].apply(lambda x: x if not dm[col]<=x<=mx[col] else dm[col])
   ...:

In [6]: df
Out[6]:
   A  B  C  D
0  1  4  1  4
1  4  7  5  3
2  5  1  3  4
3  3  5  1  8
4  2  6  6  9
5  3  8  6  4
6  5  3  9  2
7  8  9  0  1
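
The loop above calls a Python lambda once per cell. As a possible variation, the same per-column loop can stay vectorized by using Series.between, which includes both endpoints by default, together with Series.mask. The sketch below is one way to write it, not the only one; out, hit and message are illustrative names. It also reproduces the kind of ValueError mentioned in the question, which is raised whenever a whole Series is used where a single True/False is expected (for example in an if statement or a chained comparison such as a <= s <= b).

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 6, 3, 2, 3, 6, 8],
                   'B': [4, 7, 1, 5, 6, 8, 3, 9],
                   'C': [1, 5, 3, 1, 6, 8, 9, 0],
                   'D': [6, 3, 7, 8, 9, 4, 2, 1]})
mx = pd.Series([7, 8, 8, 7], index=["A", "B", "C", "D"])
dm = pd.Series([5, 8, 6, 4], index=["A", "B", "C", "D"])

out = df.copy()  # leave the original frame untouched
for col in out.columns:
    # Boolean Series: True where dm[col] <= value <= mx[col].
    hit = out[col].between(dm[col], mx[col])
    # Replace the matching cells with the scalar dm[col].
    out[col] = out[col].mask(hit, dm[col])

print(out["C"].tolist())
# [1, 5, 3, 1, 6, 6, 9, 0]

# Forcing a multi-element boolean Series into a single bool raises the
# "truth value of a Series is ambiguous" error.
try:
    bool(df["D"] >= dm["D"])
except ValueError as exc:
    message = str(exc)
    print(message)
```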
