I select rows in pandas with the loc indexer as below:

pdf.loc[pdf.a>2]

Is this vectorised? Is it better than plain (numpy-style) boolean indexing, as in:

pdf[pdf.a>2]
  • Following up from stackoverflow.com/questions/34426247/… Commented Apr 26, 2019 at 19:54
  • I think loc[] is better than a for loop when you do a conditional update based on columns. Commented Apr 26, 2019 at 20:03
  • numpy will likely be faster, but then you lose the indices, which are super useful and inherent to pandas. pdf.to_numpy()[np.where(pdf.a > 2)[0]] should be faster than .loc. Commented Apr 27, 2019 at 2:14
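
To illustrate the trade-off raised in the last comment, here is a minimal sketch. The frame, its column b and its string index are made up for illustration and are not from the question. It only shows what each approach returns and makes no claim about speed.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with a non-default index, for illustration only.
pdf = pd.DataFrame({"a": [1, 3, 5], "b": [10, 30, 50]}, index=["x", "y", "z"])

mask = pdf["a"] > 2

# Label-aware selection: the result is still a DataFrame with its index.
via_loc = pdf.loc[mask]

# Positional selection on the raw array: the result is a plain ndarray,
# so the row labels and column names are gone.
via_numpy = pdf.to_numpy()[np.where(mask)[0]]

print(via_loc.index.tolist())
print(type(via_numpy).__name__, via_numpy.shape)
```

Whether the array route is worth it probably depends on whether you need the labels afterwards.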

1 Answer

This timing suggests there is no slowdown with loc:

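import numpy as np
import pandas as pd
testa = pd.DataFrame(np.arange(10000000), columns=['q'])
# %timeit is an IPython/Jupyter magic, so run these lines in such a session
%timeit testb = testa.loc[testa.q > 6]
%timeit testc = testa[testa.q > 7]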

1 loop, best of 3: 207 ms per loop
1 loop, best of 3: 208 ms per loop
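
For anyone outside IPython, a rough equivalent using the standard timeit module might look like the sketch below. It assumes a smaller frame (1,000,000 rows rather than the 10,000,000 above) so that it runs quickly, uses the same threshold for both forms, and also checks that the two selections return the same rows. Timings will vary by machine and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the frame timed above so the sketch finishes quickly.
testa = pd.DataFrame(np.arange(1_000_000), columns=["q"])
mask = testa["q"] > 6

with_loc = testa.loc[mask]
with_brackets = testa[mask]

# Both forms should select exactly the same rows.
same = with_loc.equals(with_brackets)

# Total time for 5 runs of each form, including building the mask.
t_loc = timeit.timeit(lambda: testa.loc[testa["q"] > 6], number=5)
t_brackets = timeit.timeit(lambda: testa[testa["q"] > 6], number=5)

print(f"loc: {t_loc:.3f}s  []: {t_brackets:.3f}s  same result: {same}")
```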

1 Comment

Reading a bit more into this, vectorisation just means that the for loop is done at the C level. Presumably loc can do this too. One other thing I'm confused about is why loc uses [ ] rather than ( ). Presumably this implies something about what loc does that I have never understood.
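
On the square brackets: in pandas, loc is not a method but an attribute that returns an indexer object, and square brackets on any Python object call that object's __getitem__ method. Brackets also accept slice syntax such as a:b, which call parentheses do not. The toy classes below (ToyFrame and ToyIndexer are hypothetical names, not pandas internals) sketch the same pattern in plain Python.

```python
class ToyIndexer:
    """Hypothetical stand-in for the object that an attribute like .loc returns."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        # obj[key] is translated by Python into obj.__getitem__(key).
        if isinstance(key, list):
            # Treat a list as a boolean mask, keeping items where it is True.
            return [value for value, keep in zip(self._data, key) if keep]
        # Otherwise fall back to ordinary integer or slice indexing.
        return self._data[key]


class ToyFrame:
    def __init__(self, data):
        self._data = list(data)

    @property
    def loc(self):
        # Accessing frame.loc runs this and returns an indexer; no call needed.
        return ToyIndexer(self._data)


frame = ToyFrame([1, 3, 5])
picked = frame.loc[[False, True, True]]  # boolean mask
single = frame.loc[0]                    # single position
sliced = frame.loc[0:2]                  # slice syntax only works inside [ ]
print(picked, single, sliced)
```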
