
This is the code I used to convert the data to TFRecord:

import numpy as np
import tensorflow as tf

# Helpers that wrap raw values into tf.train.Feature protos.
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

# Write one Example per row of train_data.
with tf.python_io.TFRecordWriter("train.tfrecords") as writer:
    for row in train_data:
        prices, label, pip = row[0], row[1], row[2]
        prices = np.asarray(prices).astype(np.float32)
        example = tf.train.Example(features=tf.train.Features(feature={
            'prices': _floats_feature(prices),
            'label': _int64_feature(label[0]),
            'pip': _floats_feature(pip)
        }))
        writer.write(example.SerializeToString())

The prices feature is an array of shape (1, 288). It converted successfully, but decoding the data with a parse function and the Dataset API failed:

def parse_func(serialized_data):
    keys_to_features = {'prices': tf.FixedLenFeature([], tf.float32),
                        'label': tf.FixedLenFeature([], tf.int64)}

    parsed_features = tf.parse_single_example(serialized_data, keys_to_features)
    return parsed_features['prices'], tf.one_hot(parsed_features['label'], 2)

It gave me the following error:

2018-03-31 15:37:11.443073: W C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\framework\op_kernel.cc:1202] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: prices. Can't parse serialized Example.
2018-03-31 15:37:11.443313: W C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\framework\op_kernel.cc:1202] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: prices. Can't parse serialized Example.

  raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Key: prices. Can't parse serialized Example.
  [[Node: ParseSingleExample/ParseSingleExample = ParseSingleExample[Tdense=[DT_INT64, DT_FLOAT], dense_keys=["label", "prices"], dense_shapes=[[], []], num_sparse=0, sparse_keys=[], sparse_types=[]](arg0, ParseSingleExample/Const, ParseSingleExample/Const_1)]]
  [[Node: IteratorGetNext_1 = IteratorGetNext[output_shapes=[[?], [?,2]], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]

5 Answers


I found the problem. Instead of using tf.io.FixedLenFeature to parse an array, use tf.io.FixedLenSequenceFeature (for TensorFlow 1, use tf. instead of tf.io.).
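
A minimal sketch of the reworked parse function, reusing the feature names from the question (TF 1.x spelling; with TF 2 use the tf.io equivalents):

import tensorflow as tf

def parse_func(serialized_data):
    # FixedLenSequenceFeature parses the whole float list; allow_missing=True
    # is required when the feature lives in a plain (non-sequence) Example.
    keys_to_features = {'prices': tf.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
                        'label': tf.FixedLenFeature([], tf.int64)}

    parsed_features = tf.parse_single_example(serialized_data, keys_to_features)
    return parsed_features['prices'], tf.one_hot(parsed_features['label'], 2)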


4 Comments

Can you elaborate a little bit more on how this actually solved your problem and what your code looks like that works?
@JonDeaton As I mentioned above, I got the error when using FixedLenFeature, but it worked when I changed to FixedLenSequenceFeature. Prices is a 1-D array. For encoding to TFRecord I used def _floats_feature(value): return tf.train.Feature(float_list=tf.train.FloatList(value=value)), and for decoding: keys_to_features = {'prices': tf.FixedLenSequenceFeature([], dtype=tf.float32, allow_missing=True), 'label': tf.FixedLenFeature([], tf.int64)}
@SajadNorouzi has an answer below that seems more correct. I've managed to get both methods to work. However, I'm not sure the documentation is as clear as he states; possibly it has been edited in the meantime, but it only seems to imply that FixedLenSequenceFeature should be used for dimension 2 or higher. It may be worthwhile to edit this answer to mention the other answer, or to sort out which of the two methods is truly correct, or whether they both are. Cheers!
Had a similar issue because my features were stored as lists of tf.string, tf.float32 or tf.float64, so providing the respective feature description helped, e.g. "your key": tf.io.FixedLenSequenceFeature([], tf.string, allow_missing=True)

If your feature is a fixed 1-D array, then using tf.FixedLenSequenceFeature is not correct at all. As the documentation mentions, tf.FixedLenSequenceFeature is for input data with dimension 2 or higher. In this example you need to flatten the price array to shape (288,), and then, in the decoding part, specify the array's length.

Encode:

example = tf.train.Example(features=tf.train.Features(feature={
    'prices': _floats_feature(prices.tolist()),
    'label': _int64_feature(label[0]),
    'pip': _floats_feature(pip)
}))

Decode:

keys_to_features = {'prices': tf.FixedLenFeature([288], tf.float32),
                    'label': tf.FixedLenFeature([], tf.int64)}
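
For completeness, a minimal end-to-end sketch of this approach (TF 1.x API; it reuses the _floats_feature/_int64_feature helpers and the prices/label/pip variables from the question, and assumes prices arrives with shape (1, 288)):

import numpy as np
import tensorflow as tf

# Encode: flatten the (1, 288) array into a plain list of 288 floats.
flat_prices = np.asarray(prices, dtype=np.float32).reshape(-1)
example = tf.train.Example(features=tf.train.Features(feature={
    'prices': _floats_feature(flat_prices.tolist()),
    'label': _int64_feature(label[0]),
    'pip': _floats_feature(pip)
}))

# Decode: state the fixed length so the parser knows the dense shape.
def parse_func(serialized_data):
    keys_to_features = {'prices': tf.FixedLenFeature([288], tf.float32),
                        'label': tf.FixedLenFeature([], tf.int64)}
    parsed = tf.parse_single_example(serialized_data, keys_to_features)
    return parsed['prices'], tf.one_hot(parsed['label'], 2)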

Comments


You can't store an n-dimensional array as a float feature, because float features are simple flat lists. You have to flatten prices into a list, e.g. by doing prices.tolist(). If you need to recover the n-dimensional array from the flattened float feature, you can do prices = np.reshape(float_feature, original_shape).
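
A small sketch of that flatten-and-restore round trip, assuming prices is the (1, 288) array from the question (parsed_prices is just a placeholder for whatever you get back after parsing):

import numpy as np
import tensorflow as tf

original_shape = prices.shape                         # e.g. (1, 288)
flat_prices = prices.ravel().tolist()                 # flat Python list of floats
feature = tf.train.Feature(float_list=tf.train.FloatList(value=flat_prices))

# ... later, after reading the record back and parsing the flat float feature:
restored = np.reshape(parsed_prices, original_shape)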

1 Comment

It still doesn't work with a flattened list. I still got the error above.

I had the same issue while carelessly modifying some scripts; it was caused by a slightly different data shape. I had to change the shape to match the expected shape, e.g. from (A, B) to (1, A, B). I used np.ravel() for flattening.
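
For example, a quick sketch of that reshape/flatten step (the shapes here are purely illustrative):

import numpy as np

data = np.zeros((4, 288), dtype=np.float32)     # shape (A, B)
data = data.reshape((1,) + data.shape)          # add the leading axis: (1, A, B)
flat = data.ravel()                             # flatten before writing to the FloatList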

Comments


Exactly the same thing happens to me when reading float32 data lists from TFRecord files.

I get Can't parse serialized Example when executing sess.run([time_tensor, frequency_tensor, frequency_weight_tensor]) with tf.FixedLenFeature, though tf.FixedLenSequenceFeature seems to be working fine.

My feature format for reading files (the working one) is as follows:

feature_format = {
    'time': tf.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
    'frequencies': tf.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
    'frequency_weights': tf.FixedLenSequenceFeature([], tf.float32, allow_missing=True)
}

The encoding part is:

feature = {
    'time': tf.train.Feature(float_list=tf.train.FloatList(value=[*some single value*])),
    'frequencies': tf.train.Feature(float_list=tf.train.FloatList(value=*some_list*)),
    'frequency_weights': tf.train.Feature(float_list=tf.train.FloatList(value=*some_list*))
}

This happens with TensorFlow 1.12 on a Debian machine without GPU offloading (i.e. only the CPU is used with TensorFlow).

Is there any misuse on my side, or is it a bug in the code or documentation? I could contribute/upstream a fix if that would benefit anyone...

Comments
