3

I have a numpy array:

a = np.array(["dcba", "abc", "bca", "bcda", "tda", "a"])

I also have a vectorized Levenshtein edit distance function that measures the distance between a given string and each element of a given array. For example, for the string "ab":

l_distv("ab", a)

returns:

array([3, 1, 3, 4, 3, 1])

I'd like to reorder the array so that every element with an edit distance smaller than 2 moves to the front, while the remaining elements follow without changing their relative order. So the result would be:

array(["abc", "a", "dcba", "bca", "bcda", "tda"])

I've done this, but my solution is pretty ugly, and I assume there is a more efficient way.
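
The question does not show l_distv itself. To reproduce the numbers above, a stand-in could look roughly like the sketch below. Both levenshtein and this particular l_distv body are assumptions made for illustration (a plain Python loop, not anything truly vectorized), not the asker's actual code.

```python
import numpy as np


def levenshtein(s, t):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def l_distv(query, words):
    """Hypothetical stand-in: distance from query to every element of words."""
    return np.array([levenshtein(query, str(w)) for w in words])


a = np.array(["dcba", "abc", "bca", "bcda", "tda", "a"])
dists = l_distv("ab", a)
```

With the array from the question, dists holds the same values as the output listed above.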

  • Also, could you specify whether you want to sort it or just reorder it (put everything below 2 at the front)? Your explanation seems to suggest that you want sorting, but your example result is not exactly sorted. Commented Oct 5, 2015 at 15:35
  • @MSeifert, yep, "bcda" should be the last one. Commented Oct 5, 2015 at 15:44
  • I can't provide the code until I get to work tomorrow. @MSeifert I apologize for not being perfectly clear. Only elements that satisfy the edit distance condition should be moved in front of all other elements. The other elements should not be permuted, which is why "bcda" stays in front of "tda": the Levenshtein edit distances of "bcda" and "tda" are both larger than 2. Commented Oct 5, 2015 at 17:14
  • @endene: OK, then my answer is not bad at all and should meet your needs. But why did you accept an answer that permuted 'a' and 'abc'? Commented Oct 5, 2015 at 17:29
  • So, what must be the output if l_distv("ab", a) returned array([3, 1, 3, 4, 3, 0]) instead? Commented Oct 5, 2015 at 20:40

4 Answers

3

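Put the elements and their edit distances in a dictionary (here dists is the array returned by l_distv("ab", a)):

import operator

dictionary = dict(zip(a, dists))

then sort the dictionary's items by edit distance:

sorted_dictionary = sorted(dictionary.items(), key=operator.itemgetter(1))

The result is a list of (element, distance) pairs ordered by distance. Note that duplicate strings in a collapse into a single dictionary key.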

2

Assuming that those distance values are stored in an array dists, here's one approach -

sort_idx = dists.argsort()
mask = dists < 2
out = np.concatenate((a[sort_idx[mask[sort_idx]]],a[~mask]))

Sample run -

In [144]: a
Out[144]: 
array(['dcba', 'abc', 'bca', 'bcda', 'tda', 'a'], 
      dtype='|S4')

In [145]: dists
Out[145]: array([3, 1, 3, 4, 3, 0]) # Different from listed sample to 
                                    # show how it handles sorting

In [146]: sort_idx = dists.argsort()

In [147]: mask = dists < 2

In [148]: np.concatenate((a[sort_idx[mask[sort_idx]]],a[~mask]))
Out[148]: 
array(['a', 'abc', 'dcba', 'bca', 'bcda', 'tda'], 
      dtype='|S4')

The above approach concatenates two indexed parts of a, which might not be very efficient in terms of runtime. With performance in mind, you can instead build a single concatenated index array and index into a with it in one go. Only the last line of the previous implementation changes, like so -

out = a[np.concatenate((sort_idx[mask[sort_idx]],np.where(~mask)[0]))]
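
Putting the single-index variant together as a self-contained sketch, using the modified dists from the sample run. Two details here are my own assumptions and not part of the approach as written above: kind="stable" is passed to argsort so that ties among the close elements keep their original order, and np.flatnonzero is used as a shorthand for np.where(~mask)[0].

```python
import numpy as np

a = np.array(["dcba", "abc", "bca", "bcda", "tda", "a"])
dists = np.array([3, 1, 3, 4, 3, 0])  # same modified distances as the sample run

mask = dists < 2                          # elements that should move to the front
sort_idx = dists.argsort(kind="stable")   # stable: equal distances keep their order

front = sort_idx[mask[sort_idx]]  # indices of close elements, sorted by distance
back = np.flatnonzero(~mask)      # indices of the rest, in original order

out = a[np.concatenate((front, back))]    # a single indexing operation into a
```

Here out matches Out[148] from the sample run.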


1

If you want to preserve the original ordering and only move the elements with an l_dist smaller than 2 to the front, I can suggest an answer.

Start by creating a boolean index array:

indices = l_distv("ab", a) < 2  # True for every element that should move to the front

This can be used directly as a boolean mask, for example:

a[indices]   # returns all elements whose l_dist is smaller than 2
a[~indices]  # returns everything with l_dist >= 2

So you can rebuild the reordered array by concatenating these two:

res = np.concatenate((a[indices], a[~indices]))

But it could be that I misunderstood the question and you do not want to keep the initial ordering (your example result seems to suggest this) but really want to sort it.

I don't know whether this is really efficient, but it works.
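
The mask-and-concatenate idea can be wrapped into a small helper, sketched below. The name move_close_to_front and its threshold parameter are hypothetical, and the distances are hard-coded to the values listed in the question so the example runs without l_distv.

```python
import numpy as np


def move_close_to_front(values, dists, threshold=2):
    """Move elements with dists < threshold to the front.

    Both groups keep their original relative order, because boolean
    indexing returns the selected elements in array order.
    """
    mask = np.asarray(dists) < threshold
    return np.concatenate((values[mask], values[~mask]))


a = np.array(["dcba", "abc", "bca", "bcda", "tda", "a"])
dists = np.array([3, 1, 3, 4, 3, 1])  # output of l_distv("ab", a) in the question

res = move_close_to_front(a, dists)
```

For the question's data, res comes out as "abc", "a", "dcba", "bca", "bcda", "tda", which is the ordering the question asks for.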


0

You can use zip and sorted to get your result.

import numpy

inputs = numpy.array(["dcba", "abc", "bca", "bcda", "tda", "a"])
distances = l_distv("ab", inputs)  # numpy.array([3, 1, 3, 4, 3, 1])
results = zip(inputs, distances)   # pairs: ("dcba", 3), ("abc", 1), ...

# Sort tuples by second value
sorted_results = sorted(results, key=lambda x: x[1])

output = [x[0] for x in sorted_results]  # get just the sorted inputs
output = numpy.array(output)  # use if you need a Numpy array and not a list
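
Sorting by the distance itself orders every element, so "bcda" (distance 4) ends up after "tda" (distance 3). If only the close elements should move, one possible variation is to sort on a boolean key instead. This sketch relies on Python's sorted being stable, and hard-codes the distances from the question so it runs without l_distv.

```python
import numpy as np

inputs = np.array(["dcba", "abc", "bca", "bcda", "tda", "a"])
distances = np.array([3, 1, 3, 4, 3, 1])  # values listed in the question

pairs = zip(inputs.tolist(), distances.tolist())

# Key is False for close elements (distance < 2) and True for the rest.
# False sorts before True, and the stable sort keeps each group's original order.
reordered = sorted(pairs, key=lambda pair: pair[1] >= 2)

output = np.array([word for word, _ in reordered])
```

The same idea stays inside NumPy with inputs[np.argsort(distances >= 2, kind="stable")].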
