
I have a large dataframe that looks like this:

df1['A'].iloc[1:3]
2017-01-01 02:00:00    [33, 34, 39]
2017-01-01 03:00:00    [3, 43, 9]

I want to replace each element greater than 9 with 11.

So the desired output for the above example is:

df1['A'].iloc[1:3]
2017-01-01 02:00:00    [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]
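To make the question reproducible, here is a minimal sketch that rebuilds only the two rows shown above as a column of Python lists. The index timestamps and list contents are copied from the printed output; the real dataframe is much larger.

```python
import pandas as pd

# Two example rows from the question: each cell of column "A" holds a list.
df1 = pd.DataFrame(
    {"A": [[33, 34, 39], [3, 43, 9]]},
    index=pd.to_datetime(["2017-01-01 02:00:00", "2017-01-01 03:00:00"]),
)

print(df1["A"])
```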

Edit:

My actual dataframe has about 20,000 rows, and each row holds a list of size 2000.

Is there a way to apply the numpy.minimum function to each row? I assume it would be faster than the list comprehension method.

  • So the values are not in lists? I think the df[df > 9] = 11 solution is wrong. Or am I missing something? Commented Jan 8, 2019 at 14:58

5 Answers


Very simply: df[df > 9] = 11 (this works when each cell holds a single number, not a list as in the question)
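For reference, the one-liner does what it says when every cell is a scalar. A minimal sketch with a hypothetical all-scalar frame (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical frame with one number per cell (not lists).
df = pd.DataFrame({"A": [33, 3, 9], "B": [34, 43, 10]})

# df > 9 is a boolean frame of the same shape; every True cell becomes 11.
df[df > 9] = 11
print(df)
```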


1 Comment

In 2024, with Python 3, this doesn't work.

You can use apply with a list comprehension:

df1['A'] = df1['A'].apply(lambda x: [y if y <= 9 else 11 for y in x])
print(df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]

A faster solution is to first convert the column to a NumPy array and then use numpy.where:

import numpy as np

a = np.array(df1['A'].values.tolist())
print(a)
[[33 34 39]
 [ 3 43  9]]

df1['A'] = np.where(a > 9, 11, a).tolist()
print(df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]
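A self-contained sketch of the numpy.where approach on the two example rows. It assumes every list has the same length (2000 in the question); with ragged lists, np.array cannot build a 2-D numeric array and this approach does not apply.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {"A": [[33, 34, 39], [3, 43, 9]]},
    index=pd.to_datetime(["2017-01-01 02:00:00", "2017-01-01 03:00:00"]),
)

# Stack the equal-length lists into one 2-D array (rows x list length).
a = np.array(df1["A"].tolist())

# One vectorised call over the whole array: 11 where a > 9, else keep a.
replaced = np.where(a > 9, 11, a)

# Write the rows back as Python lists, one list per row.
df1["A"] = replaced.tolist()
print(df1)
```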

2 Comments

This method replaces NaN values with the number after else, which is not what I want.
The first one gives me: TypeError: 'int' object is not iterable
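The two comments point at cells that are not clean lists of numbers. One possible workaround, sketched under the assumption that non-list cells (NaN or a bare int) should be left unchanged: skip those cells, and write the test as `y > 9` so that NaN items inside a list (for which `NaN > 9` is False) are kept rather than replaced. The helper name `cap_row` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing lists with a NaN cell and a bare int cell.
s = pd.Series([[33, 34, 39], np.nan, 7, [3, 43, 9]])


def cap_row(x, limit=9, new=11):
    """Replace items > limit with `new` in list cells; return other cells as-is."""
    if not isinstance(x, list):
        return x  # NaN or scalar cell: no iteration, so no TypeError
    # NaN > limit is False, so NaN items inside a list pass through unchanged.
    return [new if y > limit else y for y in x]


out = s.apply(cap_row)
print(out)
```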

I know this is an old post, but pandas now supports DataFrame.where directly. In your example:

df.where(df <= 9, 11, inplace=True)

Note that pandas' where takes its arguments differently from numpy.where: in pandas, where the condition is True the current value in the dataframe is kept, and where it is False the other value is used.
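To illustrate the argument difference, a minimal sketch on a hypothetical all-scalar frame: `df.where(cond, other)` gives the same values as `np.where(cond, df, other)`, where NumPy needs the "keep" values passed explicitly as the second argument. NumPy returns a plain ndarray rather than a DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one number per cell.
df = pd.DataFrame({"A": [33, 3, 9], "B": [34, 43, 10]})
cond = df <= 9

# pandas: keep df where cond is True, use 11 where it is False.
via_pandas = df.where(cond, 11)

# NumPy: the value for True positions (df) must be given explicitly.
via_numpy = np.where(cond, df, 11)

print(via_pandas)
print(via_numpy)
```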

EDIT:

You can achieve the same for a single column with Series.where:

df['A'] = df['A'].where(df['A'] <= 9, 11)
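A minimal sketch of the single-column version on a hypothetical scalar frame. It assigns the result back rather than calling `df['A'].where(..., inplace=True)`, because that chained in-place form may not update `df` when pandas' Copy-on-Write mode is enabled.

```python
import pandas as pd

# Hypothetical frame with one number per cell.
df = pd.DataFrame({"A": [33, 3, 9], "B": [34, 43, 10]})

# Replace values > 9 in column A only; column B is untouched.
df["A"] = df["A"].where(df["A"] <= 9, 11)
print(df)
```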



You can use NumPy boolean indexing, accessed through the .values attribute:

df['col'].values[df['col'].values > x] = y

This replaces any value greater than x with y.

So for the example in the question:

df1['A'].values[df1['A'] > 9] = 11
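An alternative that does not write into `.values` (which can be a read-only array when pandas' Copy-on-Write mode is enabled) is boolean assignment through `.loc`. A minimal sketch, assuming a hypothetical column of scalars:

```python
import pandas as pd

# Hypothetical single-column frame with one number per cell.
df1 = pd.DataFrame({"A": [33, 3, 9, 43]})

# Select rows where A > 9 and set column A to 11 for those rows.
df1.loc[df1["A"] > 9, "A"] = 11
print(df1)
```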

1 Comment

This was the best solution I could find that worked as expected.

I came here looking for a way to replace each element larger than h with 1 and every other element with 0, which has a simple solution:

df = (df > h) * 1

(This does not solve the OP's question, because every element <= h is replaced by 0.)
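A minimal sketch of the 0/1 encoding, with a hypothetical threshold `h = 9` and made-up scalar values. The comparison yields booleans, and multiplying by 1 turns True/False into 1/0; `.astype(int)` gives the same values.

```python
import pandas as pd

h = 9  # hypothetical threshold
df = pd.DataFrame({"A": [33, 3, 9], "B": [34, 43, 10]})

# True/False from the comparison become 1/0 after multiplying by 1.
binary = (df > h) * 1
print(binary)
```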

2 Comments

Why post this as an answer if it doesn't answer the OP's question?
Because the title (which led me and potentially others to come here) is imprecise and could imply this answer.
