How to find the lexicographically smallest string by reversing a substring?

Question

I have a string S which consists of a's and b's. Perform the below operation once. Objective is to obtain the lexicographically smallest string.

Operation: Reverse exactly one substring of S

e.g.

if S = abab then Output = aabb (reverse ba of string S)
if S = abba then Output = aabb (reverse bba of string S)

My approach

Case 1: If all characters of the input string are same then output will be the string itself.

Case 2: if S is of the form aaaaaaa....bbbbbb.... then answer will be S itself.

otherwise: Find the first occurence of b in S say the position is i. String S will look like

aa...bbb...aaaa...bbbb....aaaa....bbbb....aaaaa...
     |
     i

In order to obtain the lexicographically smallest string the substring that will be reversed starts from index i. See below for possible ending j.

aa...bbb...aaaa...bbbb....aaaa....bbbb....aaaaa...
     |           |               |               |
     i           j               j               j

Reverse substring S[i:j] for every j and find the smallest string. The complexity of the algorithm will be O(|S|*|S|) where |S| is the length of the string.

Is there a better way to solve this problem? Probably O(|S|) solution.

What I am thinking if we can pick the correct j in linear time then we are done. We will pick that j where number of a's is maximum. If there is one maximum then we solved the problem but what if it's not the case? I have tried a lot. Please help.

Isn't the answer is always to write all 'a' first then all 'b'? — Abdennacer Lachiheb
– Abdennacer Lachiheb, Commented Sep 15, 2017 at 16:34
@AbdenaceurLichiheb you can perform the operation exactly once. It's not necessary that you can bring all a's in the beginning and all b's at the end always. e.g. S = ababba output=aabbab — cryptomanic
– cryptomanic, Commented Sep 15, 2017 at 16:34
yes I know, imagine string abababa answer is aaaabbb, you only need to count the number of 'a' and 'b' and then write all 'a' then all 'b'. — Abdennacer Lachiheb
– Abdennacer Lachiheb, Commented Sep 15, 2017 at 16:36
I don't think there is a linear algorithm, there should be a linear-logarithmic algorithm though. — Jared Goguen
– Jared Goguen, Commented Sep 15, 2017 at 16:58
@AbdenaceurLichiheb: Which one substring did you reverse to go from abababa to aaaabbb? — Mark Peters
– Mark Peters, Commented Sep 15, 2017 at 18:28

Jared Goguen · Accepted Answer · 2017-09-15 18:31:21Z

So, I came up with an algorithm, that seems to be more efficient that O(|S|^2), but I'm not quite sure of it's complexity. Here's a rough outline:

Strip of the leading a's, storing in variable start.
Group the rest of the string into letter chunks.
Find the indices of the groups with the longest sequences of a's.
If only one index remains, proceed to 10.
Filter these indices so that the length of the [first] group of b's after reversal is at a minimum.
If only one index remains, proceed to 10.
Filter these indices so that the length of the [first] group of a's (not including the leading a's) after reversal is at a minimum.
If only one index remains, proceed to 10.
Go back to 5, except inspect the [second/third/...] groups of a's and b's this time.
Return start, plus the reversed groups up to index, plus the remaining groups.

Since any substring that is being reversed begins with a b and ends in an a, no two hypothesized reversals are palindromes and thus two reversals will not result in the same output, guaranteeing that there is a unique optimal solution and that the algorithm will terminate.

My intuition says this approach of probably O(log(|S|)*|S|), but I'm not too sure. An example implementation (not a very good one albeit) in Python is provided below.

from itertools import groupby

def get_next_bs(i, groups, off):
    d = 1 + 2*off
    before_bs = len(groups[i-d]) if i >= d else 0
    after_bs = len(groups[i+d]) if i <= d and len(groups) > i + d else 0
    return before_bs + after_bs

def get_next_as(i, groups, off):
    d = 2*(off + 1)
    return len(groups[d+1]) if i < d else len(groups[i-d])

def maximal_reversal(s):
    # example input: 'aabaababbaababbaabbbaa'

    first_b = s.find('b')
    start, rest = s[:first_b], s[first_b:] 
    # 'aa', 'baababbaababbaabbbaa'

    groups = [''.join(g) for _, g in groupby(rest)]
    # ['b', 'aa', 'b', 'a', 'bb', 'aa', 'b', 'a', 'bb', 'aa', 'bbb', 'aa']

    try:
        max_length = max(len(g) for g in groups if g[0] == 'a')
    except ValueError:
        return s # no a's after the start, no reversal needed

    indices = [i for i, g in enumerate(groups) if g[0] == 'a' and len(g) == max_length]
    # [1, 5, 9, 11]

    off = 0
    while len(indices) > 1:
        min_bs = min(get_next_bs(i, groups, off) for i in indices)
        indices = [i for i in indices if get_next_bs(i, groups, off) == min_bs]
        # off 0: [1, 5, 9], off 1: [5, 9], off 2: [9]

        if len(indices) == 1:
            break

        max_as = max(get_next_as(i, groups, off) for i in indices)
        indices = [i for i in indices if get_next_as(i, groups, off) == max_as]
        # off 0: [1, 5, 9], off 1: [5, 9]

        off += 1

    i = indices[0]
    groups[:i+1] = groups[:i+1][::-1]

    return start + ''.join(groups)
    # 'aaaabbabaabbabaabbbbaa'

What if there are multiple occurences of such substring? Which one to choose?
@cryptomanic Missed that point, I've written a new algorithm that addresses this issue.

m69 ''snarky and unwelcoming'' · Accepted Answer · 2017-09-24 19:58:09Z

1

TL;DR: Here's an algorithm that only iterates over the string once (with O(|S|)-ish complexity for limited string lengths). The example with which I explain it below is a bit long-winded, but the algorithm is really quite simple:

Iterate over the string, and update its value interpreted as a reverse (lsb-to-msb) binary number.

If you find the last zero of a sequence of zeros that is longer than the current maximum, store the current position, and the current reverse value. From then on, also update this value, interpreting the rest of the string as a forward (msb-to-lsb) binary number.

If you find the last zero of a sequence of zeros that is as long as the current maximum, compare the current reverse value with the current value of the stored end-point; if it is smaller, replace the end-point with the current position.

So you're basically comparing the value of the string if it were reversed up to the current point, with the value of the string if it were only reversed up to a (so-far) optimal point, and updating this optimal point on-the-fly.

Here's a quick code example; it could undoubtedly be coded more elegantly:

function reverseSubsequence(str) {
    var reverse = 0, max = 0, first, last, value, len = 0, unit = 1;
    
    for (var pos = 0; pos < str.length; pos++) {
        var digit = str.charCodeAt(pos) - 97;                   // read next digit
        if (digit == 0) {
            if (first == undefined) continue;                   // skip leading zeros
            if (++len > max || len == max && reverse < value) { // better endpoint found
                max = len;
                last = pos;
                value = reverse;
            }
        } else {
            if (first == undefined) first = pos;                // end of leading zeros
            len = 0;
        }
        reverse += unit * digit;                                // update reverse value
        unit <<= 1;
        value = value * 2 + digit;                              // update endpoint value
    }
    return {from: first || 0, to: last || 0};
}
var result = reverseSubsequence("aaabbaabaaabbabaaabaaab");
document.write(result.from + "&rarr;" + result.to);

(The code could be simplified by comparing reverse and value whenever a zero is found, and not just when the end of a maximally long sequence of zeros is encountered.)

You can create an algorithm that only iterates over the input once, and can process an incoming stream of unknown length, by keeping track of two values: the value of the whole string interpreted as a reverse (lsb-to-msb) binary number, and the value of the string with one part reversed. Whenever the reverse value goes below the value of the stored best end-point, a better end-point has been found.

Consider this string as an example:

aaabbaabaaabbabaaabaaab

or, written with zeros and ones for simplicity:

00011001000110100010001

We iterate over the leading zeros until we find the first one:

0001
   ^

This is the start of the sequence we'll want to reverse. We will start interpreting the stream of zeros and ones as a reversed (lsb-to-msb) binary number and update this number after every step:

reverse = 1, unit = 1

Then at every step, we double the unit and update the reverse number:

0001        reverse = 1
00011       unit = 2; reverse = 1 + 1 * 2 = 3
000110      unit = 4; reverse = 3 + 0 * 4 = 3
0001100     unit = 8; reverse = 3 + 0 * 8 = 3

At this point we find a one, and the sequence of zeros comes to an end. It contains 2 zeros, which is currently the maximum, so we store the current position as a possible end-point, and also store the current reverse value:

endpoint = {position = 6, value = 3}

Then we go on iterating over the string, but at every step, we update the value of the possible endpoint, but now as a normal (msb-to-lsb) binary number:

00011001      unit = 16; reverse = 3 + 1 * 16 = 19
              endpoint.value *= 2 + 1 = 7
000110010     unit = 32; reverse = 19 + 0 * 32 = 19
              endpoint.value *= 2 + 0 = 14
0001100100    unit = 64; reverse = 19 + 0 * 64 = 19
              endpoint.value *= 2 + 0 = 28
00011001000   unit = 128; reverse = 19 + 0 * 128 = 19
              endpoint.value *= 2 + 0 = 56

At this point we find that we have a sequence of 3 zeros, which is longer that the current maximum of 2, so we throw away the end-point we had so far and replace it with the current position and reverse value:

endpoint = {position = 10, value = 19}

And then we go on iterating over the string:

000110010001         unit = 256; reverse = 19 + 1 * 256 = 275
                     endpoint.value *= 2 + 1 = 39
0001100100011        unit = 512; reverse = 275 + 1 * 512 = 778
                     endpoint.value *= 2 + 1 = 79
00011001000110       unit = 1024; reverse = 778 + 0 * 1024 = 778
                     endpoint.value *= 2 + 0 = 158
000110010001101      unit = 2048; reverse = 778 + 1 * 2048 = 2826
                     endpoint.value *= 2 + 1 = 317
0001100100011010     unit = 4096; reverse = 2826 + 0 * 4096 = 2826
                     endpoint.value *= 2 + 0 = 634
00011001000110100    unit = 8192; reverse = 2826 + 0 * 8192 = 2826
                     endpoint.value *= 2 + 0 = 1268
000110010001101000   unit = 16384; reverse = 2826 + 0 * 16384 = 2826
                     endpoint.value *= 2 + 0 = 2536

Here we find that we have another sequence with 3 zeros, so we compare the current reverse value with the end-point's value, and find that the stored endpoint has a lower value:

endpoint.value = 2536  < reverse = 2826

so we keep the end-point set to position 10 and we go on iterating over the string:

0001100100011010001      unit = 32768; reverse = 2826 + 1 * 32768 = 35594
                         endpoint.value *= 2 + 1 = 5073  
00011001000110100010     unit = 65536; reverse = 35594 + 0 * 65536 = 35594
                         endpoint.value *= 2 + 0 = 10146
000110010001101000100    unit = 131072; reverse = 35594 + 0 * 131072 = 35594
                         endpoint.value *= 2 + 0 = 20292
0001100100011010001000   unit = 262144; reverse = 35594 + 0 * 262144 = 35594
                         endpoint.value *= 2 + 0 = 40584

And we find another sequence of 3 zeros, so we compare this position to the stored end-point:

endpoint.value = 40584 > reverse = 35594

and we find it has a smaller value, so we replace the possible end-point with the current position:

endpoint = {position = 21, value = 35594}

And then we iterate over the final digit:

00011001000110100010001   unit = 524288; reverse = 35594 + 1 * 524288 = 559882  
                          endpoint.value *= 2 + 1 = 71189

So at the end we find that position 21 gives us the lowest value, so it is the optimal solution:

00011001000110100010001  ->  00000010001011000100111
   ^                 ^
start = 3         end = 21

Here's a C++ version that uses a vector of bool instead of integers. It can parse strings longer than 64 characters, but the complexity is probably quadratic.

#include <vector>

struct range {unsigned int first; unsigned int last;};

range lexiLeastRev(std::string const &str) {
    unsigned int len = str.length(), first = 0, last = 0, run = 0, max_run = 0;
    std::vector<bool> forward(0), reverse(0);
    bool leading_zeros = true;

    for (unsigned int pos = 0; pos < len; pos++) {
        bool digit = str[pos] - 'a';
        if (!digit) {
            if (leading_zeros) continue;
            if (++run > max_run || run == max_run && reverse < forward) {
                max_run = run;
                last = pos;
                forward = reverse;
            }
        }
        else {
            if (leading_zeros) {
                leading_zeros = false;
                first = pos;
            }
            run = 0;
        }
        forward.push_back(digit);
        reverse.insert(reverse.begin(), digit);
    }
    return range {first, last};
}

edited Sep 24, 2017 at 19:58

answered Sep 15, 2017 at 19:22

m69 ''snarky and unwelcoming''

12.4k3 gold badges36 silver badges61 bronze badges

13 Comments

algrid Over a year ago

How can this be O(n) when the length of numbers you're manipulating in general depends on n?

cryptomanic Over a year ago

I don't understand the logic for the point 3 i.e. comparison of endpoint.value and reverse. Why it will work? Apart from that everything is crystal clear.

m69 ''snarky and unwelcoming'' Over a year ago

@algrid You're probably right that it isn't O(n) in the strictest mathematical sense; let's say it's linear within the integer range of a particular implementation.

m69 ''snarky and unwelcoming'' Over a year ago

@cryptomanic At any point, reverse holds the value of the string as if it were reversed up to the current position, whereas the value of the provisionally optimal endpoint holds the value of the string if it were reversed only up to that endpoint. If reversed is smaller, it means the string reversed up to the current point is lexicographically smaller than the string with only the part up to endpoint reversed, and so the current position becomes the new optimal endpoint.

גלעד ברקן Over a year ago

If the number of bits in each comparison is dependent on n, it would be linear if the number of comparisons was a constant. However, in this case the number of comparisons depends on n. Still it's a great answer.

|

Collectives™ on Stack Overflow

How to find the lexicographically smallest string by reversing a substring?

2 Answers 2

2 Comments

13 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

13 Comments

Your Answer

Sign up or log in

Post as a guest