
I am currently working on the Cats vs. Dogs classification task on Kaggle by implementing a deep ConvNet. The following lines of code are used for data preprocessing:

def label_img(img):
   word_label = img.split('.')[-3]
   if word_label == 'cat': return [1,0]
   elif word_label == 'dog': return [0,1]

def create_train_data():
   training_data = []
   for img in tqdm(os.listdir(TRAIN_DIR)):
      label = label_img(img)
      path = os.path.join(TRAIN_DIR,img)
      img = cv2.resize(cv2.imread(path, cv2.IMREAD_GRAYSCALE), (IMG_SIZE, IMG_SIZE))
      training_data.append([np.array(img),np.array(label)])

   shuffle(training_data)
   return training_data

train_data = create_train_data()

X_train = np.array([i[0] for i in train_data]).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
Y_train = np.asarray([i[1] for i in train_data])

I want to implement a function that replicates the following line from the TensorFlow deep MNIST tutorial:

batch = mnist.train.next_batch(100)

2 Answers


Apart from generating a batch, you may also want to randomly re-shuffle the data at the start of each epoch.

EPOCH = 100
BATCH_SIZE = 128
TRAIN_DATASIZE, _, _, _ = X_train.shape
PERIOD = TRAIN_DATASIZE // BATCH_SIZE  # number of full batches per epoch (integer division; any leftover samples are dropped)

for e in range(EPOCH):
    idxs = np.random.permutation(TRAIN_DATASIZE)  # shuffled ordering of the whole training set
    X_random = X_train[idxs]
    Y_random = Y_train[idxs]
    for i in range(PERIOD):
        batch_X = X_random[i * BATCH_SIZE:(i + 1) * BATCH_SIZE]
        batch_Y = Y_random[i * BATCH_SIZE:(i + 1) * BATCH_SIZE]
        sess.run(train, feed_dict={X: batch_X, Y: batch_Y})

3 Comments

Thank you so much. Finally I can train my network properly.
Can you enlighten me on what TensorFlow's next_batch() returns? Is it a random selection of the specified batch size from the training set? If so, does it ensure non-repetition? @Joshua Lim
next_batch() is a helper written specifically for the MNIST tutorial provided by TensorFlow. It shuffles the image–label pairs at the beginning, then returns the next 100 pairs each time it is called. Once it reaches the end, the pairs are reshuffled and the process repeats, so the dataset is only reshuffled after every available pair has been used.
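
For reference, a minimal sketch of that behaviour applied to the X_train / Y_train arrays from the question might look like the following (the DataSet class and its next_batch method are hypothetical stand-ins, not the tutorial's actual implementation):

class DataSet:
    def __init__(self, images, labels):
        # keep the full arrays and an internal cursor into them
        self._images = images
        self._labels = labels
        self._num_examples = images.shape[0]
        self._index = 0
        self._shuffle()

    def _shuffle(self):
        perm = np.random.permutation(self._num_examples)
        self._images = self._images[perm]
        self._labels = self._labels[perm]
        self._index = 0

    def next_batch(self, batch_size):
        # reshuffle only once every available pair has been used
        if self._index + batch_size > self._num_examples:
            self._shuffle()
        start = self._index
        self._index += batch_size
        return self._images[start:self._index], self._labels[start:self._index]

train_set = DataSet(X_train, Y_train)
batch_X, batch_Y = train_set.next_batch(100)  # mirrors mnist.train.next_batch(100)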

This code is a good example of how to write a function that generates a batch.

To explain briefly, you just need to prepare two arrays, one for the batch inputs and one for the batch labels, like:

  batch_inputs = np.ndarray(shape=(batch_size), dtype=np.int32)
  batch_labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)

And fill them with training data like:

  batch_inputs[i] = ...
  batch_labels[i, 0] = ...

Finally, pass the batch to the session:

_, loss_val = session.run([optimizer, loss], feed_dict={train_inputs: batch_inputs, train_labels: batch_labels})
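
Put together, a minimal sketch of that pattern might look like the following (generate_batch, data, labels and data_index are hypothetical placeholders that follow the integer-typed shapes above; for the image data in the question you would allocate the arrays with the image and one-hot label shapes instead):

def generate_batch(data, labels, batch_size, data_index):
    # pre-allocate the two arrays described above
    batch_inputs = np.ndarray(shape=(batch_size), dtype=np.int32)
    batch_labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
    # fill them element by element, wrapping around at the end of the data
    for i in range(batch_size):
        batch_inputs[i] = data[(data_index + i) % len(data)]
        batch_labels[i, 0] = labels[(data_index + i) % len(labels)]
    data_index = (data_index + batch_size) % len(data)
    return batch_inputs, batch_labels, data_index

# usage inside the training loop; then feed the result to session.run as shown above
data_index = 0
batch_inputs, batch_labels, data_index = generate_batch(data, labels, 128, data_index)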

1 Comment

Will try this out. Thanks for your time.
