I am using a recursive function to generate text from a regex match. It looks for groups of synonyms inside square brackets (pattern = '\[.*?\]'), where the synonyms in each group are separated by a separator string (I have defined SEPARATOR = #lkmkmksdmf###).
The initial sentence argument to the function is something like:
[decreasing#lkmkmksdmf###shrinking#lkmkmksdmf###falling#lkmkmksdmf###contracting#lkmkmksdmf###faltering#lkmkmksdmf###the contraction in] exports of services will drive national economy to a 0.3% real GDP [decline#lkmkmksdmf###decrease#lkmkmksdmf###contraction] in 2023 from an estimated 5.0% [decline#lkmkmksdmf###decrease#lkmkmksdmf###contraction] in 2022
The function looks like this:

    import re

    def combinations(self, sentence, master_sentence_list: list):
        pattern = r'\[.*?\]'
        if not re.findall(pattern, sentence, flags=re.IGNORECASE):
            # No bracket groups left: the sentence is fully expanded.
            if sentence not in master_sentence_list:
                master_sentence_list.append(sentence)
        else:
            for regex_match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
                # Strip the surrounding brackets to get the separator-joined synonyms.
                repl = regex_match.group(0)[1:-1]
                start_span, end_span = regex_match.span()
                for word in repl.split(self.SEPARATOR):
                    # Substitute one synonym for the whole bracket group.
                    new_sentence = (
                        sentence[:start_span] +
                        word +
                        sentence[end_span:]
                    )
                    self.combinations(new_sentence, master_sentence_list)

Thus, master_sentence_list keeps accumulating sentences as the recursion walks the combinations like a DFS tree.
I want to avoid using the same word twice in one sentence. For example, if I used the word "decline" for one bracket group, it should not be chosen again when the inner for loop picks a word for a later group after the recursive call. Is there a way of "storing" the word chosen from the first square brackets while a word from the second square brackets is being chosen, and so on?
*It is like a DFS tree where each node has to store the state of each of its parent nodes.* How can I modify the function so that no word is used more than once within a single sentence of master_sentence_list?
I tried adding an argument called "avoid_words: list" that would store the words chosen by the parent nodes. But how do I erase it when I move on to the next word from the first square brackets (that is, when the search starts from a different "root")?
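To make the "state per node" idea concrete, here is a minimal sketch of one possible approach, not a definitive fix. It assumes a standalone function instead of a method, a hypothetical `used` parameter holding an immutable `frozenset`, and that only the first remaining bracket group is expanded at each level. Because `used | {word}` builds a new set for every recursive call, there is nothing to erase when the loop moves on to the next word.

```python
import re

SEPARATOR = "#lkmkmksdmf###"          # separator string from the question
PATTERN = re.compile(r"\[(.*?)\]")    # same bracket pattern, with a capture group

def combinations(sentence, results, used=frozenset()):
    """Expand the first bracket group, skipping words already used on this path."""
    match = PATTERN.search(sentence)
    if match is None:
        # No groups left: this sentence is complete.
        results.append(sentence)
        return
    for word in match.group(1).split(SEPARATOR):
        if word in used:
            continue  # already chosen by a parent node
        expanded = sentence[:match.start()] + word + sentence[match.end():]
        # `used | {word}` creates a new frozenset, so sibling branches
        # never see each other's choices and nothing needs to be undone.
        combinations(expanded, results, used | {word})

results = []
combinations(
    "[decline" + SEPARATOR + "decrease] in 2023 from a "
    "[decline" + SEPARATOR + "decrease] in 2022",
    results,
)
print(results)
```

Note that under this sketch a branch produces no sentence at all if every word in some later group has already been used higher up the path.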
.split(), and go through the list one by one. This is just an abuse of regexes.
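One possible reading of the split-based suggestion, sketched under assumptions: split the sentence once into literal text and choice groups, then walk the choices with `itertools.product` and discard any combination that repeats a word. The function name `expand` and the module-level `SEPARATOR` are hypothetical names for illustration.

```python
import re
from itertools import product

SEPARATOR = "#lkmkmksdmf###"  # separator string from the question

def expand(sentence):
    # re.split with a capturing group keeps the bracket contents:
    # even indices are literal text, odd indices are choice groups.
    parts = re.split(r"\[(.*?)\]", sentence)
    literals = parts[0::2]
    groups = [part.split(SEPARATOR) for part in parts[1::2]]

    sentences = []
    for choice in product(*groups):
        if len(set(choice)) != len(choice):
            continue  # some word was picked for more than one group
        pieces = [literals[0]]
        for word, literal in zip(choice, literals[1:]):
            pieces += [word, literal]
        sentences.append("".join(pieces))
    return sentences
```

This version needs no recursion and no duplicate check on the output list, at the cost of generating and then filtering every combination.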