
I am currently working on the Cats vs. Dogs classification task on Kaggle by implementing a deep ConvNet. The following lines of code are used for data preprocessing:

def label_img(img):
   word_label = img.split('.')[-3]
   if word_label == 'cat': return [1,0]
   elif word_label == 'dog': return [0,1]

def create_train_data():
   training_data = []
   for img in tqdm(os.listdir(TRAIN_DIR)):
      label = label_img(img)
      path = os.path.join(TRAIN_DIR,img)
      img = cv2.resize(cv2.imread(path, cv2.IMREAD_GRAYSCALE), (IMG_SIZE, IMG_SIZE))
      training_data.append([np.array(img),np.array(label)])

   shuffle(training_data)
   return training_data

train_data = create_train_data()

X_train = np.array([i[0] for i in train_data]).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
Y_train = np.asarray([i[1] for i in train_data])

I want to implement a function that replicates the following line from the TensorFlow deep MNIST tutorial:

batch = mnist.train.next_batch(100)

2 Answers


Apart from generating a batch, you may also want to randomly re-shuffle the data at the start of each epoch.

EPOCH = 100
BATCH_SIZE = 128
TRAIN_DATASIZE, _, _, _ = X_train.shape
PERIOD = TRAIN_DATASIZE // BATCH_SIZE  # number of full batches per epoch (integer division; any leftover samples are dropped)

for e in range(EPOCH):
    idxs = np.random.permutation(TRAIN_DATASIZE)  # shuffled ordering of the whole training set
    X_random = X_train[idxs]
    Y_random = Y_train[idxs]
    for i in range(PERIOD):
        batch_X = X_random[i * BATCH_SIZE:(i + 1) * BATCH_SIZE]
        batch_Y = Y_random[i * BATCH_SIZE:(i + 1) * BATCH_SIZE]
        sess.run(train, feed_dict={X: batch_X, Y: batch_Y})

3 Comments

Thank you so much. Finally I can train my network properly.
Can you enlighten me on what TensorFlow's next_batch() returns? Is it a random selection of the specified batch size from the training set? If so, does it ensure non-repetition? @Joshua Lim
next_batch() is a helper written specifically for the MNIST tutorial provided by TensorFlow. It shuffles the image–label pairs at the beginning, then returns the next 100 pairs each time it is called. Once it reaches the end, the pairs are reshuffled and the process repeats, so the dataset is only reshuffled after every available pair has been used.
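
For reference, a minimal sketch of that behaviour applied to the X_train / Y_train arrays from the question might look like the following (the DataSet class and its next_batch method are hypothetical stand-ins, not the tutorial's actual implementation):

class DataSet:
    def __init__(self, images, labels):
        # keep the full arrays and an internal cursor into them
        self._images = images
        self._labels = labels
        self._num_examples = images.shape[0]
        self._index = 0
        self._shuffle()

    def _shuffle(self):
        perm = np.random.permutation(self._num_examples)
        self._images = self._images[perm]
        self._labels = self._labels[perm]
        self._index = 0

    def next_batch(self, batch_size):
        # reshuffle only once every available pair has been used
        if self._index + batch_size > self._num_examples:
            self._shuffle()
        start = self._index
        self._index += batch_size
        return self._images[start:self._index], self._labels[start:self._index]

train_set = DataSet(X_train, Y_train)
batch_X, batch_Y = train_set.next_batch(100)  # mirrors mnist.train.next_batch(100)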

This code is a good example of how to write a function that generates a batch.

To explain briefly, you just need to prepare two arrays, one for the batch inputs and one for the batch labels, like:

  batch_inputs = np.ndarray(shape=(batch_size), dtype=np.int32)
  batch_labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)

And fill them with training data like:

  batch_inputs[i] = ...
  batch_labels[i, 0] = ...

Finally, pass the batch to the session:

_, loss_val = session.run([optimizer, loss], feed_dict={train_inputs: batch_inputs, train_labels: batch_labels})
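
Put together, a minimal sketch of that pattern might look like the following (generate_batch, data, labels and data_index are hypothetical placeholders that follow the integer-typed shapes above; for the image data in the question you would allocate the arrays with the image and one-hot label shapes instead):

def generate_batch(data, labels, batch_size, data_index):
    # pre-allocate the two arrays described above
    batch_inputs = np.ndarray(shape=(batch_size), dtype=np.int32)
    batch_labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
    # fill them element by element, wrapping around at the end of the data
    for i in range(batch_size):
        batch_inputs[i] = data[(data_index + i) % len(data)]
        batch_labels[i, 0] = labels[(data_index + i) % len(labels)]
    data_index = (data_index + batch_size) % len(data)
    return batch_inputs, batch_labels, data_index

# usage inside the training loop; then feed the result to session.run as shown above
data_index = 0
batch_inputs, batch_labels, data_index = generate_batch(data, labels, 128, data_index)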

1 Comment

Will try this out. Thanks for your time.
