
I have a list of words (tokens) that I iterate over. I want to apply a transformation to moving windows of that list. The window size can vary.

for i in range(0, len(tokens) - window_size + 1, step):
    doc2vec.model.infer_vector(tokens[i:i + window_size])

The for loop steps through the tokens at the stride defined in the step variable, taking as many tokens as window_size says each time. The problem I see is in the last iteration: the range ends at the length of the tokens minus the window size (+1 so that that start index is included). Say the window size is 10, the step is 5, and the length of tokens is 98. In that case my code does its last calculation on 85:95 and leaves out the last three elements. I want a solution that works for any window_size, step, and token length. To illustrate: as it stands, it works fine if the length of tokens is 95, but if it is 98, three elements are left over. I would want them processed together as 88:98.
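To make the gap concrete, here is a quick check (a sketch using the example numbers) of which start indices a range-based loop over full windows actually visits:

```python
tokens = list(range(98))
window_size, step = 10, 5

# start indices visited by a range-based loop over full windows
starts = list(range(0, len(tokens) - window_size + 1, step))
print(starts[-1], starts[-1] + window_size)       # last window is 85:95
print(len(tokens) - (starts[-1] + window_size))   # 3 elements are never covered
```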

  • But should the last window overlap differently from the step? In your example the last batch is 85:95; do you want an additional 88:98 batch, overriding the current step? Commented Oct 6, 2020 at 11:13
  • Yes, I want the window 85:95 processed and then the window 88:98. Commented Oct 6, 2020 at 18:58

1 Answer


I think the way to go is to create your own custom iterator:

class MovingWindow:
    def __init__(self, tokens, window_size, step):
        self.current = -step  # advanced by step before each window is returned
        self.last = len(tokens) - window_size + 1
        # leftover elements after the last regular window
        self.remaining = (len(tokens) - window_size) % step
        self.tokens = tokens
        self.window_size = window_size
        self.step = step

    def __iter__(self):
        return self

    def __next__(self):
        self.current += self.step
        if self.current < self.last:
            return self.tokens[self.current : self.current + self.window_size]
        elif self.remaining:
            # one extra window, aligned to the end of the list
            self.remaining = 0
            return self.tokens[-self.window_size:]
        else:
            raise StopIteration

which you can use like this:

for t in MovingWindow(tokens, 10, 5):
    doc2vec.model.infer_vector(t)

You could also modify the iterator so it returns the indices instead of the tokens. Another option is to create a simple generator; more information here
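For instance, a generator version could look like the following sketch (the name `moving_windows` is mine; it assumes `len(tokens) >= window_size`):

```python
def moving_windows(tokens, window_size, step):
    # Yield full windows at the given step; if elements remain at the end,
    # yield one extra window aligned to the end of the list.
    # Assumes len(tokens) >= window_size.
    last = len(tokens) - window_size
    for i in range(0, last + 1, step):
        yield tokens[i:i + window_size]
    if last % step:
        yield tokens[-window_size:]
```

With 98 tokens, window 10 and step 5, the final window is 88:98; with 95 tokens no extra window is emitted, so there is no duplicate at the end.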

To illustrate with the example case you provided:

indexes = list(range(98))
for i in MovingWindow(indexes, 10, 5):
    print(f'{i[0]}:{i[-1]}')

output:

0:9
5:14
10:19
15:24
20:29
25:34
30:39
35:44
40:49
45:54
50:59
55:64
60:69
65:74
70:79
75:84
80:89
85:94
88:97

6 Comments

Thank you. self.remaining = (len(tokens) + step) % window_size I think is a confusing way to calculate the leftover words; len(tokens) - window_size gives the actual number. However, your way does not lead to a faulty result.
Hi Borut. I guess you meant "len(tokens) % window_size", right? I tried this at first, but it leads to an error when len(tokens) = 95. As you see, it will get the remainder of /10, which is 5, but you will get a duplicate list at the end since the step matches perfectly. I've rerun my tests and actually I made a mistake; the correct way is "(len(tokens) + window_size) % step". Please let me know if you find a better way to simplify, and thanks for the response.
Yes, (len(tokens)+window_size) % step is the correct one or (len(tokens)+window_size) % step it is the same thing. What type of tests did you use?
I mean (len(tokens)-window_size) % step is the same.
Actually, (len(tokens)-window_size) % step appears correct and (len(tokens)+window_size) % step does not. I found a counterexample: len 82, window 7, step 5. If you use the first formula you get 0 remaining, and if you use the second you get 4.
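The counterexample can be checked quickly (a sketch reproducing the numbers from the comment):

```python
length, window_size, step = 82, 7, 5

# starts 0, 5, ..., 75; the last window 75:82 covers the end exactly,
# so no extra end-aligned window is needed
print((length - window_size) % step)  # 0 -> correctly signals no leftover
print((length + window_size) % step)  # 4 -> wrongly signals a leftover
```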
