I have a numpy array that contains 813698 rows:

len(df_numpy)
Out[55]: 813698

I want to loop through this array using mini batches of 5000.

mini_batch = 5000
i = 0
while i < len(df_numpy):
    mysubset = df_numpy[i:i + mini_batch]
    # …
    i = i + mini_batch

The problem is that len(df_numpy) is not a multiple of mini_batch, so the last mini batch contains fewer than 5000 rows.

How can I loop through df_numpy so that all records of df_numpy are included?

  • Well what do you want to do with the missing values? Fill with zeros, fill with random values from the original array, or just drop the ones that don't fit in a last batch? Commented Feb 27, 2020 at 11:46
  • @NilsWerner: I want to get all mini batches + smaller last mini batch into mysubset in the for loop. Then I perform some operations on each mysubset. Commented Feb 27, 2020 at 11:47
  • Slicing beyond the end of the array is legal in Python and will create the last "smaller mini batch". Commented Feb 27, 2020 at 11:54
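
To illustrate the last comment, here is a minimal sketch of how a slice whose stop index runs past the end of the array behaves. The array `a` and the name `tail` are made up for the illustration and are not from the question.

```python
import numpy as np

a = np.arange(13)      # 13 elements: 0 .. 12

# The stop index 15 is past the end (len(a) == 13); the slice is
# silently clipped to the available elements instead of raising.
tail = a[10:15]
print(tail)            # [10 11 12]

# A slice that starts past the end is simply empty.
print(len(a[20:25]))   # 0
```

This clipping applies to slices only; indexing a single element past the end (for example `a[15]`) raises an IndexError.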

1 Answer

This code should get the job done:

mini_batch = 5000
for first in range(0, len(df_numpy), mini_batch):
    mysubset = df_numpy[first:first+mini_batch]
    # ...

Demo

In [2]: import numpy as np

In [3]: df_numpy = np.arange(13)

In [4]: mini_batch = 5

In [5]: for first in range(0, len(df_numpy), mini_batch):
   ...:    mysubset = df_numpy[first:first+mini_batch]
   ...:    print(mysubset)
[0 1 2 3 4]
[5 6 7 8 9]
[10 11 12]
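
For the sizes in the question, the same loop can be checked with a little arithmetic. The sketch below wraps the loop in a generator; the helper name `iter_batches` is hypothetical, and `np.arange(813698)` only stands in for the real data, which is not shown in the question.

```python
import numpy as np

def iter_batches(arr, batch_size):
    """Yield consecutive slices of arr; the last one may be shorter."""
    for first in range(0, len(arr), batch_size):
        yield arr[first:first + batch_size]

# Stand-in for the real array: same length as in the question.
df_numpy = np.arange(813698)
mini_batch = 5000

# 813698 = 162 * 5000 + 3698, so 162 full batches plus one partial batch.
full, remainder = divmod(len(df_numpy), mini_batch)
sizes = [len(batch) for batch in iter_batches(df_numpy, mini_batch)]

print(full, remainder)        # 162 3698
print(len(sizes), sizes[-1])  # 163 3698
print(sum(sizes))             # 813698, every row is covered exactly once
```

Since the slices are views into the original array, no data is copied while iterating.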

2 Comments

Thanks. In this case what happens with the last batch that is smaller than mini_batch?
Added a small demo to make it clear what happens with the last mini batch
