I'm trying to select only the unique rows of a numpy.ndarray (a variable named cluster). When I define this variable explicitly, like here:
cluster=np.array([[0.157,-0.4778],[0.157,-0.4778],[0.157,-0.4778],[-0.06156924,-0.21786049],[-0.06156924,-0.21786049],[0.02,-0.35]])
it works as it should:
[[ 0.157 -0.4778 ]
[-0.06156924 -0.21786049]
[ 0.02 -0.35 ]]
Unfortunately, cluster is actually part of a bigger array (xtrans), so it can only be obtained by indexing into that array:
splitted_clusters=[0,1,4,5,10]
cluster=xtrans[splitted_clusters]
The functions are the same and the data types are the same.
But in the latter case the behaviour is quite weird: identical rows are sometimes kept and sometimes dropped. As a result I get something like this:
[[ 0.157 -0.4778 ]
[ 0.157 -0.4778 ]
[-0.06156924 -0.21786049]
[ 0.02 -0.35 ]]
In my real example with a 44*2 array it adds 22 identical rows and misses 23 of them (the pattern is quite strange too: it adds rows with indices 0, 1, 2, 4, 9, 11, 12, 18, etc.). The number of identical rows added varies, although it is supposed to add only ONE (the first) of these 44 rows.
As for the method of choosing unique rows, I first used the one from this thread: Find unique rows in numpy.array
b =np.ascontiguousarray(cluster).view(np.dtype((np.void, cluster.dtype.itemsize * cluster.shape[1])))
_, idx = np.unique(b, return_index=True)
unique_cl = cluster[idx]
Then I tried my own code to check:
unique_cl=np.array([0,0])
for i in range(cluster.shape[0]):
    if i==0:
        unique_cl=np.vstack([cluster[i,:]])
    elif cluster[i,:].tolist() not in unique_cl.tolist():
        unique_cl=np.vstack([unique_cl,cluster[i,:]])
The results are the same and I really have no idea why. I would be very grateful for any help/advice/suggestion/idea.
The problem was floating-point precision. Once I rounded the array values to 7 decimal places, everything worked as it should. Thanks to Eelco Hoogendoorn for this idea.
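For anyone hitting the same thing, here is a minimal sketch of the rounding fix. The data below is made up for illustration (one row is perturbed by a tiny amount to mimic floating-point noise), and it uses np.unique with axis=0 (available in NumPy 1.13 and later) instead of the void-view trick above. Note that rounding is only a heuristic: two nearly equal values that fall on opposite sides of a rounding boundary would still be treated as different.

```python
import numpy as np

# Hypothetical data: rows 0 and 1 print identically but differ by tiny noise
noise = 1e-12
cluster = np.array([
    [0.157, -0.4778],
    [0.157 + noise, -0.4778],
    [-0.06156924, -0.21786049],
    [0.02, -0.35],
])

# Exact comparison treats the noisy row as a distinct row
exact = np.unique(cluster, axis=0)

# Round to 7 decimals first, then find the first occurrence of each row
rounded = np.round(cluster, 7)
_, idx = np.unique(rounded, axis=0, return_index=True)

# Sort the indices so the original row order is preserved
unique_cl = cluster[np.sort(idx)]

print(exact.shape[0], unique_cl.shape[0])
```

Indexing the original cluster with idx (rather than using the rounded array) keeps the unrounded values in the result.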
b the same? It looks like b is the same data, but each row is viewed as a 16-byte 'void' element. That allows unique to do its flattened sort and selection.
b in this code. It's of type 'numpy.ndarray' as well, but when I try to print it I see strange symbols and I don't know how to encode/decode them: [unprintable raw bytes]
b generated from xtrans[splitted_clusters]? We can't debug your problem without a sample of xtrans or an idea of how it gets transformed to produce the new b.
xtrans[i,:]==xtrans[j,:] for any two rows that you think are identical. Or look at xtrans[i,:]-xtrans[j,:]. The rows might not be as unique as you think.
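The check suggested in the last comment could look roughly like this. The xtrans array here is a hypothetical stand-in (the real one was never posted); its second row comes from an arithmetic expression that prints the same as the first row but is not bit-for-bit equal to it.

```python
import numpy as np

# Hypothetical stand-in for xtrans: 0.1 + 0.2 is not exactly 0.3 in float64
xtrans = np.array([
    [0.3, -0.4778],
    [0.1 + 0.2, -0.4778],
])
i, j = 0, 1

# Printing hides the difference because NumPy shows 8 digits by default
print(xtrans)

# Element-wise exact comparison of the two rows
same = xtrans[i, :] == xtrans[j, :]
print(same)

# Non-zero entries in the difference reveal the hidden mismatch
diff = xtrans[i, :] - xtrans[j, :]
print(diff)

# Comparison with a tolerance treats the rows as equal
close = np.allclose(xtrans[i, :], xtrans[j, :])
print(close)
```

If the exact comparison reports a mismatch while the tolerance-based one reports equality, the rows differ only by floating-point noise, which is consistent with the rounding fix described above.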