If the approach is only for two lists, the following approach should work. I am not sure if this is the most efficient solution though.
A nice description of find n-grams is given in this blog post.
This approach provides the min length and determines the max length that a repeating sequence of a list might have (at most half the length of the list).
We then find all the sequences for each of the lists by combining the sequences for individual lists. Then we have a counter of every sequence and its count.
Finally we return a dictionary of all the sequences that occur more than once.
def find_repeating(list_a, list_b):
min_len = 2
def find_ngrams(input_list, n):
return zip(*[input_list[i:] for i in range(n)])
seq_list_a = []
for seq_len in range(min_len, len(list_a) + 1):
seq_list_a += [val for val in find_ngrams(list_a, seq_len)]
seq_list_b = []
for seq_len in range(min_len, len(list_b) + 1):
seq_list_b += [val for val in find_ngrams(list_b, seq_len)]
all_sequences = seq_list_a + seq_list_b
counter = {}
for seq in all_sequences:
counter[seq] = counter.get(seq, 0) + 1
filtered_counter = {k: v for k, v in counter.items() if v > 1}
return filtered_counter
Do let me know if you are unsure about anything.
>>> list_a = ['a', 'b', 'c', 'd', 'e', 'f', 'e', 'f']
>>> list_b = ['a', 'b', 'd', 'f', 'e', 'f']
>>> print find_repeating(list_a, list_b)
{('f', 'e'): 2, ('e', 'f'): 3, ('f', 'e', 'f'): 2, ('a', 'b'): 2}