How to sort rows of a binary-valued array as if they were long binary numbers?

Question

There is a 2D numpy array of about 500000 rows by 512 values each row:

[
  [1,0,1,...,0,0,1], # 512 1's or 0's
  [0,1,0,...,0,1,1],
  ...
  [0,0,1,...,1,0,1], # row number 500000
]

How to sort the rows ascending as if each row is a long 512-bit integer?

[
  [0,0,1,...,1,0,1],
  [0,1,0,...,0,1,1],
  [1,0,1,...,0,0,1],
  ...
]

I'm not sure but the function "sorted" works to sort strings which are arrays of characters, maybe it works with integer arrays ? — Orionss
– Orionss, Commented Oct 26, 2017 at 11:17
@Orionss Bad idea. Don't bring python functions where numpy arrays are involved. — cs95
– cs95, Commented Oct 26, 2017 at 11:18

Daniel F · Accepted Answer · 2017-10-26 12:17:23Z

4

Instead of converting to strings you can also use a void view (as from @Jaime here) of the data and argsort by that.

def sort_bin(b):
    b_view = np.ascontiguousarray(b).view(np.dtype((np.void, b.dtype.itemsize * b.shape[1])))
    return b[np.argsort(b_view.ravel())] #as per Divakar's suggestion

Testing

np.random.seed(0)

b = np.random.randint(0, 2, (10,5))
print(b)
print(sort_bin(b))

[[0 1 1 0 1]
 [1 1 1 1 1]
 [1 0 0 1 0]
 ..., 
 [1 0 1 1 0]
 [0 1 0 1 1]
 [1 1 1 0 1]]
[[0 0 0 0 1]
 [0 1 0 1 1]
 [0 1 1 0 0]
 ..., 
 [1 1 1 0 1]
 [1 1 1 1 0]
 [1 1 1 1 1]]

Should be much faster and less memory-intensive since b_view is just a view into b

t = np.random.randint(0,2,(2000,512))

%timeit sort_bin(t)
100 loops, best of 3: 3.09 ms per loop

%timeit np.array([[int(i) for i in r] for r in np.sort(np.apply_along_axis(lambda r: ''.join([str(c) for c in r]), 0, t))])
1 loop, best of 3: 3.29 s per loop

About 1000x faster actually

edited Oct 26, 2017 at 12:17

answered Oct 26, 2017 at 11:50

Daniel F

14.5k2 gold badges34 silver badges59 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Divakar Over a year ago

Smart one! Some noticeable improvement with b[np.argsort(b_view.ravel())].

Daniel F Over a year ago

Thanks Divakar, I forgot .flatten() makes a copy.

gonutz · Accepted Answer · 2017-10-26 11:14:12Z

0

You could sort them in a stable way 512 times, starting with the right-most bit first.

Sort by last bit
Sort by second-last bit, stable (to not mess up results of previous sort)
... ...
Sort by first bit, stable

A smaller example: assume you want to sort these three 2-bit numbers by bits:

11
01
00

In the first step, you sort by the right bit, resulting in:

00
11
01

Now you sort by the first bit, in this case we have two 0s in that column. If your sorting algorithm is not stable it would be allowed to put these equal items in any order in the result, that could cause 01 to appear before 00 which we do not want, so we use a stable sort, keeping the relative order of equal items, for the first column, resulting in the desired:

00
01
11

answered Oct 26, 2017 at 11:14

gonutz

5,7123 gold badges28 silver badges43 bronze badges

1 Comment

Daniel F Over a year ago

You could probably use np.lexsort for this, but I'm not sure it would be faster than a string/void conversion.

Joe Iddon · Accepted Answer · 2017-10-26 11:38:27Z

0

Creating a `string` of each `row` and then applying `np.sort()`

So if we have an array to test on:

a = np.array([[1,0,0,0],[0,0,0,0],[1,1,1,1],[0,0,1,1]])

We can create strings of each row by using np.apply_along_axis:

a = np.apply_along_axis(lambda r: ''.join([str(c) for c in r]), 0, a)

which would make a now:

array(['1010', '0010', '0011', '0011'], dtype='<U4')

and so now we can sort the strings with np.sort():

a = np.sort(a)

making a:

array(['0010', '0011', '0011', '1010'], dtype='<U4')

we can then convert back to the original format with:

a = np.array([[int(i) for i in r] for r in a])

which makes a:

array([[0, 0, 1, 0],
       [0, 0, 1, 1],
       [0, 0, 1, 1],
       [1, 0, 1, 0]])

And if you wanted to cram this all into one line:

a = np.array([[int(i) for i in r] for r in np.sort(np.apply_along_axis(lambda r: ''.join([str(c) for c in r]), 0, a))])

edited Oct 26, 2017 at 11:38

answered Oct 26, 2017 at 11:24

Joe Iddon

20.5k7 gold badges38 silver badges62 bronze badges

1 Comment

Serge Over a year ago

Thanks for the suggestion. It just seems "wrong" to convert single bits to character bytes. Maybe numpy.packbits() each row to 64 uint8's and then use your method?

Iosif Serafeimidis · Accepted Answer · 2017-10-26 12:37:56Z

This is slow but does the job.

def sort_col(arr, col_num=0):
# if we have sorted over all columns return array
if col_num >= arr.shape[1]:
    return arr

# sort array over given column
arr_sorted = arr[arr[:, col_num].argsort()]

# if the number of 1s in the given column is not equal to the total number
# of rows neither equal to 0, split on 1 and 0, sort and then merge
if len(arr) > np.sum(arr_sorted[:, col_num]) > 0:
    arr_sorted0s = sort_col(arr_sorted[arr_sorted[:, col_num]==0], col_num+1)
    arr_sorted1s = sort_col(arr_sorted[arr_sorted[:, col_num]==1], col_num+1)
    # change order of stacking if you want ascenting order
    return np.vstack((arr_sorted0s, arr_sorted1s))

# if the number of 1s in the given column is equal to the total number
# of rows or equal to 0, just go to the next iteration
return sort_col(arr_sorted, col_num + 1)



np.random.seed(0)
a = np.random.randint(0, 2, (5, 4))
print(a)
print(sort_col(a))

# prints
[[0 1 1 0]
 [1 1 1 1]
 [1 1 1 0]
 [0 1 0 0]
 [0 0 0 1]]
[[0 0 0 1]
 [0 1 0 0]
 [0 1 1 0]
 [1 1 1 0]
 [1 1 1 1]]

Edit. Or better yet use Daniels solution. I didn't check for new answers before I posted my code.

Collectives™ on Stack Overflow

How to sort rows of a binary-valued array as if they were long binary numbers?

4 Answers 4

2 Comments

1 Comment

Creating a `string` of each `row` and then applying `np.sort()`

1 Comment

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

2 Comments

1 Comment

Creating a string of each row and then applying np.sort()

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related

Creating a `string` of each `row` and then applying `np.sort()`