Since you don't specify the exact algorithm beyond mentioning that it uses a DP table, I'm going to start from the beginning. Consider a procedure that checks whether S is a subsequence of T:
    check(S, T):
        state = 0
        if length(S) == 0: return true
        for char c in T:
            if S[state] == c: state++
            if length(S) == state: return true
        return false
The internal state here is the length of the longest prefix of S that occurs as a subsequence of the prefix of T processed so far. This greedy matching is correct because matching a character of S at the earliest opportunity never hurts: any later match could be replaced by the earlier one without invalidating the rest.
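Here is a minimal Python sketch of that procedure (the function name `is_subsequence` is my own; it follows the pseudocode above directly):

```python
def is_subsequence(s, t):
    # state = length of the longest prefix of s found as a
    # subsequence of the part of t scanned so far
    state = 0
    for c in t:
        if state < len(s) and s[state] == c:
            state += 1
    return state == len(s)
```

For example, `is_subsequence("abc", "aXbYc")` is true, while `is_subsequence("abc", "acb")` is false because the `c` appears before the `b`.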
With 2 strings (S1, S2) we track a pair of states (state1, state2), and the dynamic programming question becomes: what is the shortest prefix of T such that the procedure above ends up with these state values when run independently on both strings? Compared to enumerating all strings of a given length and computing this pair of states for each of them, the DP uses only two optimisation ideas:
- to find dp[state1][state2] from any previous state (prev_state1, prev_state2) by appending a character, we only need some shortest prefix of T that reaches (prev_state1, prev_state2), which is dp[prev_state1][prev_state2] — because (look at check(S, T)) the next state doesn't depend on the earlier part of the prefix of T, only on the previous state and the character being appended
- the only next characters c worth trying are S1[state1] and S2[state2], since anything else reaches the same pair of states with a longer prefix
At the end, dp[length(S1)][length(S2)] gives the length of the shortest such prefix, i.e. the shortest common supersequence T for which both checks (Sx is a subsequence of T) return true. All the DP transitions do is simulate these state-update loops for both strings efficiently.
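The whole DP can be sketched in Python like this (a forward/push-style formulation; the function name and `INF` sentinel are my choices, not from your code):

```python
def shortest_common_supersequence_length(s1, s2):
    n, m = len(s1), len(s2)
    INF = float("inf")
    # dp[i][j] = length of the shortest prefix of T that drives
    # check(s1, .) to state i and check(s2, .) to state j
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] == INF:
                continue
            # only s1[i] and s2[j] are worth appending next
            candidates = set()
            if i < n:
                candidates.add(s1[i])
            if j < m:
                candidates.add(s2[j])
            for c in candidates:
                ni = i + 1 if i < n and s1[i] == c else i
                nj = j + 1 if j < m and s2[j] == c else j
                dp[ni][nj] = min(dp[ni][nj], dp[i][j] + 1)
    return dp[n][m]
```

Every transition advances at least one of the two states, so a plain row-major sweep visits each cell after all cells that can push into it. To recover the string itself rather than just its length, store the appended character and predecessor cell alongside each dp value and walk back from dp[n][m].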