3

I'm considering some Python data that are lists of arrays in the form:

LA=
[array([  99.08322813,  253.42371683,  300.792029  ])
array([  51.55274095,  106.29707418,  0])
array([0, 0 ,0 , 0, 0])
array([ 149.07283952,  191.45513754,  251.19610503,  393.50806493, 453.56783459])
array([ 105.61643877,  442.76668729,  450.37335607])
array([ 348.84179544])
array([], dtype=float64)
array([0, 0 , 0])
array([ 295.05603151,  0,  451.77083268,  500.81771919])
array([ 295.05603151,  307.37232315,  451.77083268,  500.81771919])
array([  91.86758237,  148.70156948,  488.70648486,  507.31389766])
array([ 353.68691095])
array([ 208.21919198,  246.57665959,  0,  251.33820305, 394.34266882])
array([], dtype=float64)]

In my data I get some empty arrays:

array([], dtype=float64)

and arrays filled with zeros:

array([0, 0, 0])

How can I get rid of both kinds of arrays in a simple, automated way, to end up with

LA=
[array([  99.08322813,  253.42371683,  300.792029  ])
array([  51.55274095,  106.29707418,  0])
array([ 149.07283952,  191.45513754,  251.19610503,  393.50806493, 453.56783459])
array([ 105.61643877,  442.76668729,  450.37335607])
array([ 348.84179544])
array([ 295.05603151,  0,  451.77083268,  500.81771919])
array([ 295.05603151,  307.37232315,  451.77083268,  500.81771919])
array([  91.86758237,  148.70156948,  488.70648486,  507.31389766])
array([ 353.68691095])
array([ 208.21919198,  246.57665959,  0,  251.33820305, 394.34266882])]

Finally, I would like to remove the zeros as well, keeping the list-of-arrays format, to get

LA=
[array([  99.08322813,  253.42371683,  300.792029  ])
array([  51.55274095,  106.29707418])
array([ 149.07283952,  191.45513754,  251.19610503,  393.50806493, 453.56783459])
array([ 105.61643877,  442.76668729,  450.37335607])
array([ 348.84179544])
array([ 295.05603151,  451.77083268,  500.81771919])
array([ 295.05603151,  307.37232315,  451.77083268,  500.81771919])
array([  91.86758237,  148.70156948,  488.70648486,  507.31389766])
array([ 353.68691095])
array([ 208.21919198,  246.57665959,  251.33820305, 394.34266882])]

Thanks in advance

    Have you tried anything? Show us your code. Commented Dec 5, 2013 at 13:15

2 Answers

5

A list comprehension should do the first part:

[x for x in LA if x.any()]

You can do the second part with compress:

[x.compress(x) for x in LA if x.any()]
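As a minimal, self-contained sketch of the two steps above: the short `LA` below is an assumed stand-in for the question's data (values rounded), and the second step uses a boolean mask `x != 0` in place of `compress`, which should give the same result for 1-D arrays.

```python
import numpy as np

# Assumed stand-in for the question's LA, shortened for readability.
LA = [
    np.array([99.08, 253.42, 300.79]),
    np.array([51.55, 106.30, 0.0]),
    np.array([0.0, 0.0, 0.0]),      # all zeros: should be dropped
    np.array([], dtype=float),      # empty: should be dropped
    np.array([348.84]),
]

# Step 1: keep only arrays with at least one nonzero element.
# x.any() is False for both empty and all-zero arrays.
kept = [x for x in LA if x.any()]

# Step 2: same filter, and also strip the zeros inside each surviving array.
cleaned = [x[x != 0] for x in LA if x.any()]

print([a.tolist() for a in kept])
print([a.tolist() for a in cleaned])
```

Both results are still plain Python lists of NumPy arrays, so the "list of arrays" format asked for in the question is preserved.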

A faster version, based on Ashwini's idea (count_nonzero here is numpy.count_nonzero):

[x.compress(x) for x in LA if count_nonzero(x)]
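The one-liner above relies on `count_nonzero` already being imported from NumPy. A sketch with the import spelled out might look like the following; the helper name `drop_zeros` and the sample data are hypothetical, and the condition is passed to `compress` as an explicit boolean mask rather than as the array itself.

```python
import numpy as np

def drop_zeros(arrays):
    """Hypothetical helper: strip zeros and discard empty or all-zero arrays."""
    # np.count_nonzero(x) is 0 for empty and all-zero arrays, so both are skipped.
    return [x.compress(x != 0) for x in arrays if np.count_nonzero(x)]

# Assumed sample input, not taken from the question.
sample = [
    np.array([0.0, 0.0]),
    np.array([], dtype=float),
    np.array([1.0, 0.0, 2.0]),
]
result = drop_zeros(sample)
print([a.tolist() for a in result])
```

As the comments further down point out, `count_nonzero` already covers the empty case, so no separate `len(x)` check is needed.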

Timing:

In [89]: %timeit [x.compress(x) for x in LA if count_nonzero(x)]  # clear winner
10000 loops, best of 3: 20.2 µs per loop

5

Using NumPy and a list comprehension:

>>> from numpy import *

Solution 1:

>>> [x[x!=0] for x in LA if len(x) and len(x[x!=0])]                          
[array([  99.08322813,  253.42371683,  300.792029  ]),                                           
 array([  51.55274095,  106.29707418]),                                                          
 array([ 149.07283952,  191.45513754,  251.19610503,  393.50806493,                              
        453.56783459]),                                                                          
 array([ 105.61643877,  442.76668729,  450.37335607]),                                           
 array([ 348.84179544]),                                                                         
 array([ 295.05603151,  451.77083268,  500.81771919]),                                           
 array([ 295.05603151,  307.37232315,  451.77083268,  500.81771919]),                            
 array([  91.86758237,  148.70156948,  488.70648486,  507.31389766]),                            
 array([ 353.68691095]),                                                                         
 array([ 208.21919198,  246.57665959,  251.33820305,  394.34266882])]    

Solution 2:

>>> [x[x!=0] for x in LA if count_nonzero(x)]                          
[array([  99.08322813,  253.42371683,  300.792029  ]),                                           
 array([  51.55274095,  106.29707418]),                                                          
 array([ 149.07283952,  191.45513754,  251.19610503,  393.50806493,                              
        453.56783459]),                                                                          
 array([ 105.61643877,  442.76668729,  450.37335607]),                                           
 array([ 348.84179544]),                                                                         
 array([ 295.05603151,  451.77083268,  500.81771919]),                                           
 array([ 295.05603151,  307.37232315,  451.77083268,  500.81771919]),                            
 array([  91.86758237,  148.70156948,  488.70648486,  507.31389766]),                            
 array([ 353.68691095]),                                                                         
 array([ 208.21919198,  246.57665959,  251.33820305,  394.34266882])]    

Timing comparison:

In [56]: %timeit  [x[x!=0] for x in LA if len(x) and len(x[x!=0])]                     
10000 loops, best of 3: 176 µs per loop                                                          

In [88]: %timeit [x[x!=0] for x in LA if count_nonzero(x)]                                   
10000 loops, best of 3: 89.7 µs per loop   

# @gnibbler's solution:

In [82]: %timeit [x.compress(x) for x in LA if x.any()]                                          
10000 loops, best of 3: 138 µs per loop  

Timing results for larger arrays:

In [140]: LA = [resize(x, 10**5) for x in LA]                                                    

In [142]: %timeit [x[x!=0] for x in LA if len(x) and len(x[x!=0])]                               
10 loops, best of 3: 26.7 ms per loop                                                            

In [143]: %timeit [x[x!=0] for x in LA if count_nonzero(x) > 0]                                  
10 loops, best of 3: 26 ms per loop                                                              

In [144]: %timeit [x.compress(x) for x in LA if x.any()]                                         
10 loops, best of 3: 42.7 ms per loop                                                            

In [145]: %timeit [x.compress(x) for x in LA if count_nonzero(x)]                                
10 loops, best of 3: 45.8 ms per loop                                                            

In [146]: %timeit [x[x!=0] for x in LA if x.any()]                                               
10 loops, best of 3: 22.9 ms per loop                                                            

In [147]: %timeit [x[x!=0] for x in LA if count_nonzero(x)]                                      
10 loops, best of 3: 26.2 ms per loop  
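The numbers above depend on the machine, the NumPy version, and the array sizes, so it may be worth re-measuring. Outside IPython, a rough harness along these lines could be used; it is only a sketch, the sample arrays and the size `10**4` are assumptions chosen to keep the run short, and no particular ranking is implied.

```python
import timeit
import numpy as np

# Assumed sample data, resized the same way as in the timings above
# (np.resize repeats the values; an empty array becomes all zeros).
small = [np.array([1.5, 0.0, 2.5]), np.array([0.0, 0.0]),
         np.array([], dtype=float), np.array([3.0])]
big = [np.resize(x, 10**4) for x in small]

# Each variant filters out empty/all-zero arrays and strips zeros.
variants = {
    "mask + any": lambda: [x[x != 0] for x in big if x.any()],
    "mask + count_nonzero": lambda: [x[x != 0] for x in big if np.count_nonzero(x)],
    "compress + any": lambda: [x.compress(x != 0) for x in big if x.any()],
}

# Average wall-clock time per call over 50 runs of each variant.
timings = {name: timeit.timeit(fn, number=50) / 50 for name, fn in variants.items()}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} µs per call")
```

All three variants should return the same arrays, so the choice between them is purely about speed on your own data.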

6 Comments

Can you time my answer too?
@gnibbler compress one took around 138us.
When you're using count_nonzero, you don't need the len(x) check.
aha...any is the slow part. count_nonzero is much faster
@gnibbler I actually expected it to be faster, i.e., it should short-circuit like Python's any does. BTW, I resized all items to 100000 and timed again, and this time [x[x!=0] for x in LA if x.any()] came out fastest and, shockingly, [x.compress(x) for x in LA if count_nonzero(x)] was slowest.