I want to train a network using TensorFlow, based on features extracted from a time signal. The data is split into E three-second epochs with F features per epoch, so it has the form:
Epoch | Feature 1 | Feature 2 | ... | Feature F
------|-----------|-----------|-----|----------
  1   |    ..     |    ..     | ... |    ..
  ..  |    ..     |    ..     | ... |    ..
  E   |    ..     |    ..     | ... |    ..
To load the data into TensorFlow, I am following the CIFAR-10 example and using tf.FixedLengthRecordReader. I have therefore saved the data to a binary file of float32 values: the label for the first epoch, followed by that epoch's F features, then the label and features of the second epoch, and so on.
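For reference, this is roughly how I wrote the binary file. The NumPy approach below is a sketch: the sizes and the `labels`/`features` arrays are placeholders for my real data.

```python
import numpy as np

# Hypothetical sizes: E = 5 epochs, F = 10 features per epoch.
E, F = 5, 10
labels = np.arange(E, dtype=np.float32)              # one label per epoch
features = np.random.rand(E, F).astype(np.float32)   # F features per epoch

# Interleave label + features per epoch:
# [label_1, f_1..f_F, label_2, f_1..f_F, ...]
records = np.hstack([labels.reshape(-1, 1), features])
records.astype(np.float32).tofile('data.bin')
```

Each record is therefore (1 + F) * 4 bytes long, matching the record_bytes computed in the reader code below.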
Reading this back in TensorFlow is proving a challenge for me, however. Here is my code:
def read_data_file(file_queue):
    class DataRecord(object):
        pass
    result = DataRecord()

    # 1 float32 as label => 4 bytes
    label_bytes = 4
    # NUM_FEATURES as float32 => 4 * NUM_FEATURES bytes
    features_bytes = 4 * NUM_FEATURES

    # Create the reader with the summed number of bytes per record
    reader = tf.FixedLengthRecordReader(record_bytes=label_bytes + features_bytes)
    # Perform the read operation
    result.key, value = reader.read(file_queue)
    # Decode the result from bytes to float32
    value_bytes = tf.decode_raw(value, tf.float32, little_endian=True)

    # Cast label to int for later
    result.label = tf.cast(tf.slice(value_bytes, [0], [label_bytes]), tf.int32)
    # Cast features to float32
    result.features = tf.cast(tf.slice(value_bytes, [label_bytes],
                                       [features_bytes]), tf.float32)

    print('>>>>>>>>>>>>>>>>>>>>>>>>>>>')
    print('%s' % result.label)
    print('%s' % result.features)
    print('>>>>>>>>>>>>>>>>>>>>>>>>>>>')
Print output was:
Tensor("Cast:0", shape=TensorShape([Dimension(4)]), dtype=int32)
Tensor("Slice_1:0", shape=TensorShape([Dimension(40)]), dtype=float32)
This surprises me: since the values are decoded as float32, I expected the shapes to be 1 and 10 respectively (the actual element counts, with NUM_FEATURES = 10), but they are 4 and 40, which correspond to the byte lengths.
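As a sanity check on my expectation, I decoded one such record with plain NumPy outside TensorFlow (this is just my own check; `np.frombuffer` is the NumPy equivalent I assume corresponds to decode_raw):

```python
import numpy as np

NUM_FEATURES = 10
# One record: a float32 label followed by NUM_FEATURES float32 features.
record = np.arange(1 + NUM_FEATURES, dtype=np.float32).tobytes()
print(len(record))  # 44 bytes: (1 + 10) * 4

decoded = np.frombuffer(record, dtype=np.float32)
print(decoded.shape)  # (11,): 1 label element + 10 feature elements
```

So the raw record is 44 bytes, but decoded as float32 it should only contain 11 elements, which is why I expected shapes of 1 and 10 after slicing.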
How come?