0

My data looks like this. They are floats and they are in a big numpy array [700000,3]. There are no empty fields.

Label   | Values1   | Values2
1.      | 0.01      | 0.01
1.      | ...       | ...
1.      |
2.      |
2.      |
3.      |
...

The idea is to feed in the set of values1 and values2 and have it identify the label using classification.

But I don't want to feed the data row by row, but input all values1/2 that belong to label 1 as a set (e.g. inputting the first 3 rows is supposed to return [1,0,...], inputting the next 2 rows as a set [0,1,...])

Is there a non-complex way of feeding the data in this way? (i.e. feed batch where column label equals 1)

I am currently sorting the data and thinking about using pointers to the start and having loops which check if the next row is equal to the current to find a pointer to the end of the set and get the number of rows of that batch. But this more or less prevents randomizing input order.

2 Answers 2

1

Since you have your data in a numpy array (let's call it data, you can use

single_digit = data[(data[:,0] == 1.)][: , 1:]

which will compare the zeroth element of each row with the digit (1. in this case) and select only the rows having the label 1.. From these rows, it takes the first and second element, i.e. Values1 and Values2. A working example is below. You can use a for loop to iterate over all labels contained in the data set and construct a numpy array for each label with

single_digit = data[(data[:,0] == label_of_this_iteration)][: , 1:]

and then feed these arrays to the network. Within TensorFlow you can easily feed batches of different length, if you do not specify the first dimension of the corresponding placeholders.

import numpy as np
# Generate some data with three columns (label, Values1, Values2)
n = 20
ints = np.random.randint(1,6,(n, 1))
dous = np.random.uniform(size=(n,2))
data = np.hstack((ints, dous))
print(data)

# Extract the second and third columns of all rows having the label 1.0
ones = data[(data[:,0] == 1.)][: , 1:]
print(ones)
Sign up to request clarification or add additional context in comments.

Comments

0

Ideally use TFRecords format.

This approach makes it easier to mix and match data sets and network architectures

Here is a link for detail on what this json like structure looks like example.proto

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.