I have a df like this:

     t1      t2     t3
0    a       b      c
1            b      
2 
3    
4    a       b      c
5            b      
6
7

I want to drop all rows after index 5 because they contain no values, but keep the empty rows at index 2 and 3. I will not know in advance which columns contain data.

All values are strings.

1 Answer

In [74]: df.iloc[:np.where(df.any(axis=1))[0][-1]+1]
Out[74]: 
   t1 t2 t3
10  a  b  c
11     b   
12         
13         
14  a  b  c
15     b   

Explanation: First find which rows contain something other than empty strings (an empty string is falsy, so any() treats it as False):

In [37]: df.any(axis=1)
Out[37]: 
0     True
1     True
2    False
3    False
4     True
5     True
6    False
7    False
dtype: bool

Find the location of the rows which are True:

In [71]: np.where(df.any(axis=1))
Out[71]: (array([0, 1, 4, 5]),)

Find the largest index (which will also be the last):

In [72]: np.where(df.any(axis=1))[0][-1]
Out[72]: 5

Then you can use df.iloc to select all rows up to and including position 5. The +1 is needed because the end of an iloc slice is exclusive.
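
The steps above can be combined into a small helper. This is only a sketch: the function name drop_trailing_empty_rows is made up for illustration, the sample frame is rebuilt from the question, and it compares against "" explicitly (instead of relying on the truthiness of strings) and uses np.flatnonzero in place of np.where(...)[0].

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; empty cells are empty strings.
df = pd.DataFrame({
    "t1": ["a", "", "", "", "a", "", "", ""],
    "t2": ["b", "b", "", "", "b", "b", "", ""],
    "t3": ["c", "", "", "", "c", "", "", ""],
})


def drop_trailing_empty_rows(frame):
    """Return frame without the all-empty rows at its end (hypothetical helper)."""
    # True for every row that has at least one non-empty string.
    has_data = (frame != "").any(axis=1).to_numpy()
    # Integer positions of those rows, in ascending order.
    positions = np.flatnonzero(has_data)
    if positions.size == 0:
        # No row has data at all, so nothing is kept.
        return frame.iloc[0:0]
    # iloc slices exclude the end point, hence the + 1.
    return frame.iloc[: positions[-1] + 1]


trimmed = drop_trailing_empty_rows(df)
print(trimmed)
```

The guard for a frame with no data at all avoids the IndexError that [-1] would raise on an empty array of positions.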

Note that the first method I suggested, df.loc[:df.any(axis=1).cumsum().argmax()], is not as robust: if your dataframe has an index with repeated values, selecting the rows by label with df.loc is problematic.
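
To make the repeated-index problem concrete, here is a hedged sketch. The frame dup and its index are invented for illustration, and the label-based slice below is a stand-in for any df.loc approach rather than the exact expression from the earlier version of this answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels repeat; the last row with data
# sits at position 3 and carries the label 2.
dup = pd.DataFrame(
    {"t1": ["a", "", "", "a", "", ""], "t2": ["b", "b", "", "b", "", ""]},
    index=[0, 1, 1, 2, 2, 2],
)

has_data = (dup != "").any(axis=1).to_numpy()
last_pos = np.flatnonzero(has_data)[-1]

# Position-based slice: stops right after the last row with data.
by_position = dup.iloc[: last_pos + 1]

# Label-based slice: ends at label 2, which also matches the two
# trailing empty rows that share that label.
by_label = dup.loc[: dup.index[last_pos]]

print(len(by_position), len(by_label))
```

With this sorted index the label slice keeps every row labelled 2, so the trailing empty rows survive. With an unsorted repeated index, a label slice can fail outright instead.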

The new method is also a bit faster:

In [75]: %timeit df.iloc[:np.where(df.any(axis=1))[0][-1]+1]
1000 loops, best of 3: 203 µs per loop

In [76]: %timeit df.loc[:df.any(axis=1).cumsum().argmax()]
1000 loops, best of 3: 296 µs per loop

9 Comments

I could use that, but then I would need to get rid of all of the NaNs
the data in there is actually datetime.datetime objects.
and the empty ones are...?
it is a <type 'str'>. Sorry it was inherited.
actually, all the data is strings. I misspoke.