I have an array of integers:

import numpy as np

demo = np.array([[1, 2, 3],
                 [1, 5, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [4, 2, 3],
                 [4, 2, 12],
                 [10, 11, 13]])

I want an array holding the unique values of each column, one row per column, padded with a fill value (e.g. nan) where a column has fewer unique values than the others:

[[1, 4, 7, 10, nan],
 [2, 5, 8, 11, nan],
 [3, 6, 9, 12,  13]]

This works when I iterate over the transposed array and pad the results with a boolean_indexing helper from a previous question, but I was hoping there would be a built-in method:

solution = []
for row in np.unique(demo.T, axis=1):
    solution.append(np.unique(row))

def boolean_indexing(v, fillval=np.nan):
    lens = np.array([len(item) for item in v])
    mask = lens[:,None] > np.arange(lens.max())
    out = np.full(mask.shape,fillval)
    out[mask] = np.concatenate(v)
    return out

print(boolean_indexing(solution))
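
To make the mask trick in boolean_indexing easier to follow, here is a minimal sketch with the per-column results for the demo array written out by hand (the name `cols` is only for illustration). Comparing each length against `np.arange(lens.max())` marks the first `len(item)` slots of every row as valid, and the boolean assignment fills those slots in row-major order, which is the same order `np.concatenate` produces.

```python
import numpy as np

# Ragged per-column unique values for the demo array, written out by hand.
cols = [np.array([1, 4, 7, 10]),
        np.array([2, 5, 8, 11]),
        np.array([3, 6, 9, 12, 13])]

lens = np.array([len(c) for c in cols])        # one length per row
mask = lens[:, None] > np.arange(lens.max())   # True for slots that hold data

out = np.full(mask.shape, np.nan)              # start with padding everywhere
out[mask] = np.concatenate(cols)               # fill the valid slots row by row
print(mask)
print(out)
```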

2 Answers

AFAIK, there is no built-in solution for that. That said, your solution seems more complex than it needs to be. You could create an array pre-filled with the fill value and populate it with a simple loop (since you already use loops anyway).

solution = [np.unique(row) for row in np.unique(demo.T, axis=1)]

result = np.full((len(solution), max(map(len, solution))), np.nan)
for i, arr in enumerate(solution):
    result[i, :len(arr)] = arr
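
If the aim is mainly to hide the explicit fill loop, the standard library's `itertools.zip_longest` can do the padding. This is a sketch of an alternative, not a NumPy built-in, and it still iterates in Python under the hood; `padded` is just an illustrative name.

```python
import numpy as np
from itertools import zip_longest

demo = np.array([[1, 2, 3], [1, 5, 3], [4, 5, 6], [7, 8, 9],
                 [4, 2, 3], [4, 2, 12], [10, 11, 13]])

# One sorted array of unique values per column of demo.
solution = [np.unique(col) for col in demo.T]

# zip_longest pads the shorter sequences with nan; transposing afterwards
# gives one row per original column.
padded = np.array(list(zip_longest(*solution, fillvalue=np.nan))).T
print(padded)
```

Because nan is a float, the resulting array is a float array even though the input holds integers.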

1 Comment

Thank you, this is definitely cleaner than what I had. I hoped to avoid the loop, but numpy does tend to avoid cases where the output sizes have less-than-obvious layouts.

If you want to avoid the loop you could do:

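demo = demo.astype(np.float32)  # nan only exists for floats

sort = np.sort(demo, axis=0)           # sort every column
diff = np.diff(sort, axis=0)
np.place(sort[1:], diff == 0, np.nan)  # overwrite repeated values with nan
sort.sort(axis=0)                      # nan sorts last, so unique values move to the top
# Count the non-nan entries per column. np.argmax would return the index of
# the last row for a column without duplicates and cut off its largest value.
edge = (~np.isnan(sort)).sum(axis=0).max()
result = sort[:edge]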

print(result.T)

Output (shown as the array repr of result.T):

array([[ 1.,  4.,  7., 10., nan],
       [ 2.,  5.,  8., 11., nan],
       [ 3.,  6.,  9., 12., 13.]], dtype=float32)

Not sure if this is any faster than the solution given by Jérôme.
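
One way to find out is to time both approaches on your own data. The following is a rough sketch using `timeit`; the function names `loop_version` and `sorted_version`, the random test array, and its size are assumptions made for illustration, and the outcome will depend on the array shape and on how many duplicates it contains.

```python
import timeit
import numpy as np

def loop_version(a):
    # Per-column np.unique followed by a fill loop.
    cols = [np.unique(c) for c in a.T]
    out = np.full((len(cols), max(map(len, cols))), np.nan)
    for i, c in enumerate(cols):
        out[i, :len(c)] = c
    return out

def sorted_version(a):
    # Sort, replace repeats with nan, then sort again so nan moves last.
    s = np.sort(a.astype(np.float64), axis=0)
    dup = np.zeros(s.shape, dtype=bool)
    dup[1:] = s[1:] == s[:-1]
    s[dup] = np.nan
    edge = (~dup).sum(axis=0).max()
    return np.sort(s, axis=0)[:edge].T

rng = np.random.default_rng(0)
a = rng.integers(0, 50, size=(10_000, 20))  # hypothetical test data

# Both versions should agree before their timings are compared.
assert np.allclose(loop_version(a), sorted_version(a), equal_nan=True)
for f in (loop_version, sorted_version):
    print(f.__name__, timeit.timeit(lambda: f(a), number=20))
```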

EDIT

A slightly better solution that builds the duplicate mask directly and derives the cut-off from it:

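demo = demo.astype(np.float32)  # nan requires a float dtype

sort = np.sort(demo, axis=0)
mask = np.full(sort.shape, False, dtype=bool)
np.equal(sort[1:], sort[:-1], out=mask[1:])  # True where a value repeats the one above it
np.place(sort, mask, np.nan)                 # replace duplicates with nan
edge = (~mask).sum(0).max()                  # largest number of unique values in any column
result = np.sort(sort, axis=0)[:edge]        # nan sorts last, so the padding ends up at the bottom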

print(result.T)

Output (shown as the array repr of result.T):

array([[ 1.,  4.,  7., 10., nan],
       [ 2.,  5.,  8., 11., nan],
       [ 3.,  6.,  9., 12., 13.]], dtype=float32)
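
Since nan forces a conversion to float, a variation that keeps the integer dtype and accepts an arbitrary fill value may be useful. The sketch below is one possible way to do that, not part of the answer above: the helper name `unique_columns_padded` and the default fill value of -1 are assumptions, and instead of relying on nan sorting last it uses a stable `np.argsort` on the duplicate mask to move the unique values to the top of each column.

```python
import numpy as np

def unique_columns_padded(a, fillval=-1):
    # Hypothetical helper: each row of the result holds the sorted unique
    # values of one column of `a`, padded on the right with `fillval`.
    s = np.sort(a, axis=0)
    dup = np.zeros(s.shape, dtype=bool)
    dup[1:] = s[1:] == s[:-1]                  # True where a value repeats
    # A stable argsort on the mask puts non-duplicates first while keeping
    # them in ascending order.
    order = np.argsort(dup, axis=0, kind="stable")
    s = np.take_along_axis(s, order, axis=0)
    counts = (~dup).sum(axis=0)                # unique values per column
    s[np.arange(len(s))[:, None] >= counts] = fillval
    return s[:counts.max()].T

demo = np.array([[1, 2, 3], [1, 5, 3], [4, 5, 6], [7, 8, 9],
                 [4, 2, 3], [4, 2, 12], [10, 11, 13]])
print(unique_columns_padded(demo))
```

The fill value has to be representable in the array's dtype and should not collide with a real value in the data.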
