This question is complex, so please ask if you need more detail on any part of it. (P.S. I'm not a native English speaker, that's why...)
The input is a sample sequence of length 34, and the output is its result part.
I have a sample sequence of length 34, which may be constructed in one of two ways ("result" is what I need):
The sample sequence = result part + known sequence (I don't know the length of the result part)
- result (length 34)
- result (length N, N < 34) + known sequence (length 34 - N)
All the numbers in the sequences are random.
What I need is to extract the result part, without the known-sequence part.
Some background info:
I have 10 million of these sample sequences, each of length 34 (that is, 10 million known 34-digit random number sequences from a generator).
After I find the result, I need to compare it against a sequence B of length 5 million, and check whether the result matches somewhere in that long sequence at exactly one position.
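To make the uniqueness requirement concrete, here is a minimal sketch of that check. It assumes sequence B is held in memory as a Python string of digits; the function name `occurs_exactly_once` and its parameters are made up for illustration.

```python
def occurs_exactly_once(result: str, long_seq: str) -> bool:
    """Return True if `result` appears at exactly one position in `long_seq`."""
    first = long_seq.find(result)
    if first == -1:
        # No match at all.
        return False
    # Search again starting one position after the first hit, so that
    # overlapping occurrences are also detected.
    return long_seq.find(result, first + 1) == -1
```

Starting the second search at `first + 1` rather than after the end of the first match means overlapping matches still count as a second occurrence.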
My current algorithm uses a "detector", which is the first 10 digits of the known sequence: if I find the detector somewhere in the sample sequence, I remove everything from that point on. However, there is still a chance that the result itself contains part of the known sequence. Does anyone have a better algorithm?
Thanks so much! In addition, I'm programming this in Python.
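For reference, the detector approach described above could look roughly like this. It is only a sketch of my reading of that description, assuming both sequences are strings of digits; `strip_known_by_detector` and `detector_len` are illustrative names.

```python
def strip_known_by_detector(sample: str, known: str, detector_len: int = 10) -> str:
    """Cut `sample` at the first occurrence of the detector.

    The detector is the first `detector_len` digits of `known`.
    If it is not found, the whole sample is treated as the result.
    """
    detector = known[:detector_len]
    pos = sample.find(detector)
    if pos == -1:
        return sample
    # Everything before the detector is taken to be the result part.
    return sample[:pos]
```

As written, this cannot detect the known part when fewer than `detector_len` of its digits fit at the end of the sample, and it cuts too early if the result happens to contain the detector by chance.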
Example:
1st condition:
199010104761700150004736290473629657 == sample seq
The whole sample is the result; the known part is still the same.
input:
199010104761700150004736290473629657
output:
199010104761700150004736290473629657
2nd condition:
199010104728392817111123995561547659 == sample seq
1990101047 == result part
28392817111123995561547659... == known part
input will be: 199010104728392817111123995561547659, 28392817111123995561547659...
output I want is: 1990101047
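One possible way to express the two conditions above in code is a suffix/prefix overlap check: drop the longest suffix of the sample that is also a prefix of the known sequence. This is a different technique from the detector approach and not necessarily the best algorithm. It assumes the known sequence is passed as a plain string of digits (without the trailing "..."); `split_result` and `min_overlap` are hypothetical names.

```python
def split_result(sample: str, known: str, min_overlap: int = 1) -> str:
    """Return `sample` with its longest suffix that is a prefix of `known` removed.

    Overlaps shorter than `min_overlap` digits are ignored. If no overlap
    is found, the whole sample is returned as the result.
    """
    max_overlap = min(len(sample), len(known))
    # Try the longest candidate overlap first, then shrink.
    for k in range(max_overlap, min_overlap - 1, -1):
        if k > 0 and sample.endswith(known[:k]):
            return sample[:-k]
    return sample
```

With the two example inputs, this returns the full sample in the 1st condition and `1990101047` in the 2nd. The ambiguity remains, though: if the digits are uniformly random, the last digit of a pure result equals the first digit of the known sequence about one time in ten, so a short overlap can be a coincidence. Raising `min_overlap` trades those false cuts for missed short overlaps.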