I am using a recursive function to generate text from a RegEx match. It finds groups of synonyms inside square brackets (pattern = '\[.*?\]'), where the synonyms within each group are separated by a separator string (I have defined SEPARATOR = #lkmkmksdmf###).

The initial sentence argument to the function is something like:

[decreasing#lkmkmksdmf###shrinking#lkmkmksdmf###falling#lkmkmksdmf###contracting#lkmkmksdmf###faltering#lkmkmksdmf###the contraction in] exports of services will drive national economy to a 0.3% real GDP [decline#lkmkmksdmf###decrease#lkmkmksdmf###contraction] in 2023 from an estimated 5.0% [decline#lkmkmksdmf###decrease#lkmkmksdmf###contraction] in 2022

The function looks like this:

def combinations(self,sentence,master_sentence_list:list):
        pattern = r'\[.*?\]'

        if not re.findall(pattern, sentence, flags = re.IGNORECASE):
            if sentence not in master_sentence_list:
                master_sentence_list.append(sentence)
        else:
            for regex_match in re.finditer(pattern, sentence, flags = re.IGNORECASE):
                repl=regex_match.group(0)[1:-1]
                start_span = regex_match.span()[0]
                end_span = regex_match.span()[1]
                for word in repl.split(self.SEPARATOR):
                    tmpsentence = (
                        sentence[0: start_span] +
                        word +
                        sentence[end_span:]
                    )
                    new_sentence = deepcopy(tmpsentence)
                    self.combinations(new_sentence,master_sentence_list)

Thus, sentences keep being appended to master_sentence_list, as in a DFS traversal of a tree.

I want to avoid using the same word twice. For example, if I used the word "decline", it should not be used again when choosing from the next set of words in the inner for loop after the recursive call. Is there a way of "storing" the word chosen from the first square brackets while a word from the second square brackets is being chosen, and so on?

*It is like a DFS tree where each node has to store the state of each of its parent nodes.* How can I modify the function so that the same word is not used twice within a single sentence of master_sentence_list?

I tried using an argument called "avoid_words: list", which would store the words of the parent nodes. But how do I clear it when I have to move on to the next word from the first square brackets (that is, when starting from a different "root")?

  • This is just not a regular expression problem. You're not looking for patterns, you're looking for exact matches on full words. Just split the text into words using .split(), and go through the list one by one. This is just an abuse of regexes. Commented Jul 27, 2023 at 19:30

1 Answer

As Tim pointed out, if there really is no other way to input the string and its arguments (which I doubt), you should use the split() function to separate the initial sentence into the words (synonyms) and the plain sentence.

Below is the commented code I would use if I had to solve a situation like this.

import itertools
import re

SEPARATOR = '#lkmkmksdmf###'

def all_combinations(sentence) -> list:
    pattern = r'\[(.*?)\]'
    resulting_sentences = []

    # Collect the contents of every bracketed synonym group
    list_of_synonyms = re.findall(pattern, sentence)
    # Replace each group in the original sentence with an empty placeholder
    sentence = re.sub(pattern, '[]', sentence)

    # Split each group into a tuple of synonyms
    synonyms = [tuple(x.split(SEPARATOR)) for x in list_of_synonyms]

    # Build every combination (one synonym per group) and keep only those
    # without a repeated word. A set holds only unique elements, so a
    # combination containing a duplicate yields a set shorter than the tuple.
    # The tuples themselves are kept, because a set would lose the word order.
    synonym_combinations = [
        combination for combination in itertools.product(*synonyms)
        if len(set(combination)) == len(combination)
    ]

    # Iterate over the combinations
    for combination in synonym_combinations:
        formatted_sentence = sentence
        # Fill the placeholders from left to right, one synonym each
        for synonym in combination:
            formatted_sentence = formatted_sentence.replace('[]', synonym, 1)
        # Append the formatted sentence to the resulting sentences
        resulting_sentences.append(formatted_sentence)
    return resulting_sentences
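
If you would rather keep the recursive (DFS) shape from the question, one possible sketch is to pass the words chosen so far down the call stack as an immutable set. Each call builds a new set for its children, so nothing has to be "erased" when the loop moves on to the next word or to a different root. The names combinations_without_repeats, used and results are my own and only illustrative, and the separator is assumed to be the one from the question.

```python
import re

SEPARATOR = "#lkmkmksdmf###"          # separator taken from the question
GROUP = re.compile(r"\[(.*?)\]")      # one bracketed synonym group


def combinations_without_repeats(sentence, used=frozenset(), results=None):
    """Expand the first bracket group and recurse on the remaining ones."""
    if results is None:
        results = []
    match = GROUP.search(sentence)
    if match is None:
        # No groups left: this is a leaf of the DFS tree
        results.append(sentence)
        return results
    for word in match.group(1).split(SEPARATOR):
        if word in used:
            continue  # already chosen by an ancestor node, skip it
        expanded = sentence[:match.start()] + word + sentence[match.end():]
        # used | {word} creates a new frozenset, so sibling branches
        # never see this choice and no cleanup is needed afterwards
        combinations_without_repeats(expanded, used | {word}, results)
    return results


demo = f"[a{SEPARATOR}b] and [a{SEPARATOR}b]"
print(combinations_without_repeats(demo))  # ['a and b', 'b and a']
```

Because only the first remaining group is expanded at each level, every sentence is produced once, so the "if sentence not in ..." check from the question should no longer be needed.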