Good way to feed input data of different sizes into neural network? (Tensorflow)

Question

My data looks like this. All values are floats, stored in one large NumPy array of shape [700000, 3]. There are no empty fields.

Label   | Values1   | Values2
1.      | 0.01      | 0.01
1.      | ...       | ...
1.      |
2.      |
2.      |
3.      |
...

The idea is to feed in the set of values1 and values2 and have it identify the label using classification.

However, I don't want to feed the data row by row. Instead, I want to input all Values1/Values2 rows that belong to one label as a set (e.g. inputting the first 3 rows should return [1,0,...], and inputting the next 2 rows as a set should return [0,1,...]).

Is there a simple way of feeding the data like this (i.e. feed a batch where the label column equals 1)?

I am currently sorting the data and considering keeping a pointer to the start of each set, then looping to check whether the next row has the same label as the current one, in order to find the end of the set and the number of rows in that batch. But this more or less prevents randomizing the input order.

ml4294 · Accepted Answer · 2017-08-05 15:31:46Z

Since you have your data in a numpy array (let's call it data), you can use

single_digit = data[(data[:,0] == 1.)][: , 1:]

which compares the zeroth element of each row with the digit (1. in this case) and selects only the rows with the label 1.0. From these rows, it takes the first and second elements, i.e. Values1 and Values2. A working example is below. You can use a for loop to iterate over all labels contained in the data set and construct a numpy array for each label with

single_digit = data[(data[:,0] == label_of_this_iteration)][: , 1:]

and then feed these arrays to the network. In TensorFlow you can feed batches of different lengths as long as you leave the first dimension of the corresponding placeholders unspecified (None).

import numpy as np
# Generate some data with three columns (label, Values1, Values2)
n = 20
ints = np.random.randint(1,6,(n, 1))
dous = np.random.uniform(size=(n,2))
data = np.hstack((ints, dous))
print(data)

# Extract the second and third columns of all rows having the label 1.0
ones = data[(data[:,0] == 1.)][: , 1:]
print(ones)
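
The loop over all labels described above could be sketched roughly as follows. This is only an illustration under some assumptions: iter_label_sets is a hypothetical helper (not part of NumPy or TensorFlow), labels are assumed to be the integers 1..num_classes stored as floats in column 0, and the one-hot target follows the [1,0,...] convention from the question. Shuffling the array of unique labels is one way to randomize the order in which the sets are fed.

```python
import numpy as np

def iter_label_sets(data, num_classes, rng):
    """Yield (values, one_hot) per label, visiting the labels in random order.

    Hypothetical helper: assumes column 0 holds labels 1..num_classes and
    the remaining columns hold the values.
    """
    labels = np.unique(data[:, 0])
    rng.shuffle(labels)  # randomize the order in which the sets are visited
    for label in labels:
        # Same selection as above: rows with this label, without the label column
        values = data[data[:, 0] == label][:, 1:]
        one_hot = np.zeros(num_classes)
        one_hot[int(label) - 1] = 1.0  # label 1 -> [1, 0, ...]
        yield values, one_hot

# Small synthetic data set in the same layout as the example above
rng = np.random.default_rng(0)
n = 20
ints = rng.integers(1, 6, size=(n, 1))
dous = rng.uniform(size=(n, 2))
data = np.hstack((ints, dous))

sets = list(iter_label_sets(data, num_classes=5, rng=rng))
for values, one_hot in sets:
    print(one_hot, values.shape)  # each set can have a different number of rows
```

Each yielded values array has shape (rows_for_this_label, 2), so it can be fed as one batch of variable length. Note that the boolean mask scans the whole array once per label, which may become slow with many distinct labels.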

Jan Krynauw · Accepted Answer · 2017-08-06 18:38:27Z

Ideally, use the TFRecords format.

This approach makes it easier to mix and match data sets and network architectures.

For details on what this JSON-like structure looks like, see example.proto.
