0

I am trying to build a mental model around how the code below works and it is confusing me. I am trying to sort a string so every duplicate letter is together, but the capital letter comes first. It is solved, the method below will do it, but I want to know why do you have to sort it first? Does it keep the same position from the first sort? So when you call sort_by it then sorts by lowercase but the capital letters stay where they originally were? Can anyone break down step by step what is happening so I can understand this better?

def alpha(str)
 str.chars.sort.sort_by { |ele| ele.downcase }.join
end

alpha("AaaaaZazzz") == "AaaaaaZzzz"
3
  • I think there's a missing .join somewhere (either in alpha, or in the call to alpha), because "alpha" returns an array of characters, however the comparison is between two strings. This is just a minor bug in your question, but isn't the answer to your question. Commented May 4, 2021 at 1:31
  • Also, could you please add what output you are getting when you remove .sort? Commented May 4, 2021 at 1:34
  • Yes! I fixed it. Sorry about that! Commented May 5, 2021 at 4:39

3 Answers 3

3

Let's rewrite your method as follows.

def alpha(str)
 sorted_chars_by_case = str.chars.sort
 puts "sorted_chars = #{sorted_chars}"
 sorted_chars_by_downcased = sorted_chars_by_case.sort_by(&:downcase)
 puts "sorted_chars_by_downcased = #{sorted_chars_by_downcased}"
 sorted_chars_by_downcased.join
end

Then:

alpha("AaaaaZazzz")
  sorted_chars_by_case = ["A", "Z", "a", "a", "a", "a", "a", "z", "z", "z"]
  sorted_chars_indifferent = ["A", "a", "a", "a", "a", "a", "Z", "z", "z", "z"]
  #=> "AaaaaaZzzz"

As you see, the first step, after converting the string to an array of characters, is to form an array whose first elements are upper-case letters, in order, followed by lower-case letters, also ordered.1 The second step is sort sorted_chars_by_case without reference to case. That array is then joined to return the desired string, "AaaaaaZzzz".

While this gives the desired result, it is only happenstance that it does. A different sorting method could well have returned, say, "aaaaAazZzz", because "A" is treated the same as "a" in the second sort.

What you want is a two-level sort; sort by characters, case-indifferent, then when there are ties ("A" and "a", for example), sort the upper-case letter first. You can do that by sorting two-element arrays.

def alpha(str)
  str.each_char.sort_by { |ele| [ele.downcase, ele] }.join
end

Then

alpha("AaaaaZazzz")
  #=> "AaaaaaZzzz"

When sorting arrays the method Array#<=> is used to order two arrays. Note in particular the third paragraph of that doc.

If "A" and "z" are being ordered, for example, Ruby compares the arrays

a1 = ["a", "A"]
a2 = ["z", "z"]

As a1.first < a2.first #=> true, we see that a1 <=> a2 #=> -1, so "A" precedes "z" in the sort. Here a1.last and a2.last are not examined.

Now suppose "z" and "Z" are being ordered. Ruby compares the arrays

a1 = ["z", "z"]
a2 = ["z", "Z"]

As a1.first equals a2.first, a1.last and a2.last are compared to break the tie. Since "z" > "Z" #=> true, a1 <=> a2 #=> 1, so "Z" precedes "z" in the sort.


Note that I replace str.chars with str.each_char. It's generally a small thing, but String#chars returns an array of characters, whereas String#each_char returns an enumerator, and therefore is more space-efficient.

Sometimes you need to return an array, and therefore you must use chars. An example is str.chars.cycle, where you are chaining to the Array method cycle. On the other hand, if you are chaining to an enumerator (an instance of the class Enumerator), you must use each_char, an example being str.each_char.with_object([]) ....

Often, however, you have a choice: str.chars.sort, using Array#sort, or str.each_char.sort, using Enumerable#sort. In those situations each_char is preferred because of the reduced memory requirement. The rule, therefore, is to use chars when you are chaining to an Array method, otherwise use each_char.

1. sort_by(&:downcase) can be thought of as shorthand for sort_by { |ele| ele.downcase }.

Sign up to request clarification or add additional context in comments.

Comments

1

You can't depend on the stability of sort in Ruby

This is an interesting question. Whether or not a sort preserves the order of equal elements is its "stability." A sort is stable if it is guaranteed to preserve the order of equal elements, and unstable if it has no such guarantee. An unstable sort may by chance return equal elements in their original order, or not.

In MRI 2.7.1, sort happens to be stable, but it is actually implementation defined whether or not it is. See https://stackoverflow.com/a/44486562/238886 for all the juicy details, including code you can run in your Ruby to find out if your sort happens to be stable. But whether or not your sort is stable, you should not depend on it.

A stable sort does indeed return the result you are expecting, and it does so whether or not you include the .sort:

2.7.1 :035 > "AaaaaZazzz".chars.sort_by { |ele| ele.downcase }.join
 => "AaaaaaZzzz" 
2.7.1 :036 > "AaaaaZazzz".chars.sort.sort_by { |ele| ele.downcase }.join
 => "AaaaaaZzzz" 

But you can make sort act stable when you need

In order to not depend up on the stability of the sort, which could change when you move your code to another Ruby version or implementation, you can enforce stability like this:

 "AaaaaZazzz".chars.sort_by.with_index { |ele, i| [ele.downcase, i] }.join
 => "AaaaaaZzzz" 

How does unstable sort behave

We can force Ruby 2.7.1's sort to be unstable by adding a random number as a secondary sort order:

2.7.1 :040 > "AaaaaZazzz".chars.sort.sort_by { |ele, i| [ele.downcase, rand] }.join
 => "AaaaaaZzzz" 
2.7.1 :041 > "AaaaaZazzz".chars.sort.sort_by { |ele, i| [ele.downcase, rand] }.join
 => "aaaaAazzZz" 

Note how we got the same answer as stable sort the first time, but then a different answer? That's a demonstration of how an unstable sort can, by chance, give you the same results as a stable sort. But you can't count on it.

Comments

1

First you sort in this code all characters alphabetically based on the collating sequence of the underlying encoding, and then you sort the characters in a way that upper and lower case characters are treated equivalent. This cancels the effect of the first sort. Hence the output is equivalent to str.chars.sort_by(&:downcase), which would IMO a more sensible way to write the expression.

The first sort has no effect and is therefore just a cycle stealer. BTW: Since the stability of Ruby sort is unspecified, and in particular MRI Ruby is known to be unstable, you have no control about the relative order of individual characters which are considered equivalent in sort order. Note also that the result depends on the locale, because this decides whether - for instance - the letters Б and б are considered the same in sort order or different.

5 Comments

MRI's sort stability is undefined and implementation specific. It is sometimes stable and sometimes not.
This is what I consider "unstable" ;-) Of course by chance the respective order could sometimes happen to be as in the unsorted data. Unstable does not mean that the algorithm changes the order on purpose...
In my testing, I found that any particular implementation of a particular version of Ruby always has the same stability. It's either always stable, or always unstable. But whether or not it is stable depends upon the specific implementation/version. This is a bit of a trap for a programmer who accidentally depends on their implementation having a stable sort, then moves their code to an implementation with an unstable sort.
I wonder how you can verify the stability. Did you actually analyze the source code of the sort routine? If you just were trying it out, the best you can do is to prove unstability by finding by chance one example which exhibits unstability, but if you always get stable results, it can be just by chance. BTW, I don't think this is a trap, because the docs explicitly say that it is unstable. From the docs: The result is not guaranteed to be stable. When two keys are equal, the order of the corresponding elements is unpredictable.
I examined the source, and I wrote code that can test the stability of the sort. The code to test stability is here: stackoverflow.com/a/44486562/238886 . The technique is to use Ruby's sort normally, and then in a way that forces stability, and see if the two sorted arrays are equal. In the version of Ruby I examined the source for (MRI 2.4.1 if I recall), either a stable or an unstable sort could be compiled, depending on #ifdefs. MRI 2.4.1 is stable on Linux, but unstable on Windows. Same source, different stability depending on the platform.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.