This ended up being not such a simple parse. The reason: not only do you have to handle the transition from one sequence of duplicates to another (e.g. "AAABB") and the transition from a sequence of duplicates to a series of characters (e.g. "AAABCDE"), you also have to handle the transition back (e.g. "AAAHIJKLBBCDE") any number of times, as well as duplicates that run through the end of the string (e.g. "AAABCDEGG").
There are a fair number of caveats there. One approach is to parse the string in a single continuous loop, advancing the index by the number of duplicates in a sequence or by the number of characters in a series. A basic outline would be:
```
loop continually over indexes in string {
    while (sequence of duplicates) {
        extract duplicates substring
        advance index by no. of duplicates
    }
    while (characters in series in sort-order) {
        increment counter
        advance index
    }
    if (counter > 2) {
        extract series substring
    }
}
```
Within each of those blocks you also need to handle end-of-string. With that in mind, and presuming you will extract each substring found in std::string s and store it in std::vector<std::string> vs, you could do something similar to the following, using std::basic_string::find_first_not_of to check for the sequence of duplicates for you:
```cpp
for (size_t i = 0; ;) {                 /* loop until string exhausted */
    bool dupsadded = false;             /* flag for whether duplicates found */
    size_t spos = 0, nchr = 0;          /* string position and number of chars */
    /* loop extracting duplicate characters */
    while ((spos = s.find_first_not_of (s[i], i)) &&
            /* duplicates do not extend to end of string */
            ((spos != std::string::npos && spos - i > 1) ||
            /* duplicates do extend to end of string */
            (s.substr(i).length() > 1 && spos == std::string::npos))) {
        if (spos != std::string::npos) {        /* handle not through end */
            nchr = spos - i;                    /* no. of duplicate chars */
            vs.push_back (s.substr (i, nchr));  /* add to vector of substrings */
            i += nchr;                          /* increment index */
            dupsadded = true;                   /* set dupsadded flag */
            nchr = 0;                           /* zero nchr */
        }
        else {                                  /* duplicates to end of string */
            vs.push_back (s.substr (i));        /* add remaining substring */
            goto done;
        }
    }
    size_t checked = i;                 /* s[checked] was just tested for dups */
    if (!i || dupsadded)                /* 1st char or dups found */
        i += 1;                         /* so s[i-1] is the 1st char after dups */
    while (i < s.length() && s[i-1] + 1 == s[i]) {  /* chars in series */
        nchr += 1;                      /* increment char count */
        i += 1;                         /* increment index */
    }
    if (nchr > 1)                       /* nchr > 1 means 3+ chars in series */
        vs.push_back (s.substr (i - nchr - 1, nchr + 1));
    else if (i >= s.length())           /* if at end */
        break;
    else if (nchr == 1 && s[i-1] == s[i])   /* 2-char series ends on a dup */
        i -= 1;                         /* back up so the dup run is found */
    else if (i == checked)              /* s[i] already tested for dups */
        i += 1;                         /* move to the next character */
    /* otherwise s[i] was never tested for dups: loop back to test it */
}
done:;
```
Other than the use of .find_first_not_of(), the remainder of the function just relies on good old arithmetic. There are many different ways to write this, but here, if the characters are not a sequence of duplicates, the ASCII values of adjacent characters are compared to determine whether a series of characters in sort order is present. See ASCII Table & Description.
The transition from a sequence of duplicates to a series in sort order was particularly problematic. The sort-order comparison, s[i-1] + 1 == s[i], would have compared against the last duplicate character unless the index was advanced by 1 more, so that s[i-1] is actually the first character after the sequence of duplicates. (It's not anything magic; it just depends on how you compare adjacent characters while protecting against end-of-string at the same time.) Put simply, the arithmetic needed special attention to handle that transition.
Putting a short example together, you could do:
```cpp
#include <iostream>
#include <string>
#include <vector>

int main (void) {

    std::string s{};                    /* string for user input */
    std::vector<std::string> vs{};      /* vector of string to hold substrings */

    std::cout << "enter string: ";
    if (!(std::cin >> s)) {
        std::cout << "(user canceled input)\n";
        return 0;
    }
    if (s.length() < 2) {               /* validate at least 2 characters */
        std::cerr << "error: must have more than 1 character.\n";
        return 1;
    }

    for (size_t i = 0; ;) {             /* loop until string exhausted */
        bool dupsadded = false;         /* flag for whether duplicates found */
        size_t spos = 0, nchr = 0;      /* string position and number of chars */
        /* loop extracting duplicate characters */
        while ((spos = s.find_first_not_of (s[i], i)) &&
                /* duplicates do not extend to end of string */
                ((spos != std::string::npos && spos - i > 1) ||
                /* duplicates do extend to end of string */
                (s.substr(i).length() > 1 && spos == std::string::npos))) {
            if (spos != std::string::npos) {        /* handle not through end */
                nchr = spos - i;                    /* no. of duplicate chars */
                vs.push_back (s.substr (i, nchr));  /* add to vector of substrings */
                i += nchr;                          /* increment index */
                dupsadded = true;                   /* set dupsadded flag */
                nchr = 0;                           /* zero nchr */
            }
            else {                                  /* duplicates to end of string */
                vs.push_back (s.substr (i));        /* add remaining substring */
                goto done;
            }
        }
        size_t checked = i;             /* s[checked] was just tested for dups */
        if (!i || dupsadded)            /* 1st char or dups found */
            i += 1;                     /* so s[i-1] is the 1st char after dups */
        while (i < s.length() && s[i-1] + 1 == s[i]) {  /* chars in series */
            nchr += 1;                  /* increment char count */
            i += 1;                     /* increment index */
        }
        if (nchr > 1)                   /* nchr > 1 means 3+ chars in series */
            vs.push_back (s.substr (i - nchr - 1, nchr + 1));
        else if (i >= s.length())       /* if at end */
            break;
        else if (nchr == 1 && s[i-1] == s[i])   /* 2-char series ends on a dup */
            i -= 1;                     /* back up so the dup run is found */
        else if (i == checked)          /* s[i] already tested for dups */
            i += 1;                     /* move to the next character */
        /* otherwise s[i] was never tested for dups: loop back to test it */
    }
    done:;

    for (const auto &ss : vs)           /* output results */
        std::cout << ss << '\n';
}
```
(note: there is a lot there; this isn't something you can skim and understand. Take a pencil and a piece of paper, write out the input string, and then work through each iteration. Track the value of i (the index) by placing a mark below the current character, and note the value of spos returned from .find_first_not_of(), the value of nchr, and the values used to extract the characters with .substr(). That's probably the best way to understand what is happening at each point, similar to a conversation with the duck from How to debug small programs.)
Example Use/Output
Your first example:
```
$ ./bin/str_substr_dup_or_seq
enter string: AAABBCDE
AAA
BB
CDE
```
Your second example:
```
$ ./bin/str_substr_dup_or_seq
enter string: HHHHZAB
HHHH
```
An extension testing additional transitions:
```
$ ./bin/str_substr_dup_or_seq
enter string: AAAHIJKLBBCDE
AAA
HIJKL
BB
CDE
```
With a sequence of duplicates at the end:
```
$ ./bin/str_substr_dup_or_seq
enter string: AAABCDEGG
AAA
BCDE
GG
```
With a sequence of duplicates of the same character that ends the series:
```
$ ./bin/str_substr_dup_or_seq
enter string: AAABCDEEE
AAA
BCDE
EE
```
(There is a slight ambiguity here: you may want "BCD" and "EEE" instead, and both would satisfy your constraints. Further implementation to change the behavior is left to you.)
Or a bit more of a challenge, with "ISHKABIBBLE" (Yiddish for nonsense) inserted within "AAAHIJKLBBCDE" and a trailing "IE" added to the end:
```
$ ./bin/str_substr_dup_or_seq
enter string: AAAHIJKLISHKABIBBLEBBCDEIE
AAA
HIJKL
BB
BB
CDE
```
Look things over and let me know if you have further questions.