I have a dataset in the TFRecord format, and I am trying to switch from reading it with tf.python_io.tf_record_iterator to reading it with tf.data.TFRecordDataset.
Besides tf.python_io.tf_record_iterator being deprecated, my main reason for doing this is that I would like to work with tf.data.Dataset objects.
Within the TFRecords file, each entry is a SequenceExample, specifically tensorflow.core.example.example_pb2.SequenceExample.
Currently I am reading out each SequenceExample via this function:
import tensorflow as tf

def read_records(record_path):
    records = []
    # Each item yielded by the iterator is one serialized record (raw bytes).
    record_iterator = tf.python_io.tf_record_iterator(path=record_path)
    for string_record in record_iterator:
        example = tf.train.SequenceExample()
        example.ParseFromString(string_record)
        records.append(example)
    return records
Printing out a record gives me this kind of structure (truncated due to length):
context {
  feature {
    key: "framecount"
    value {
      int64_list {
        value: 10
      }
    }
  }
  feature {
    key: "label"
    value {
      int64_list {
        value: 1
      }
    }
  }
}
feature_lists {
  feature_list {
    key: "positions"
    value {
      feature {
        bytes_list {
          value: "\221\2206?\200dL?\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
        }
      }
    }
  }
}
Now if I attempt to do this with tf.data.TFRecordDataset, my function is:
def reader(file_path):
    dataset = tf.data.TFRecordDataset(file_path)
    for record in dataset:
        tf.io.parse_sequence_example(record)
    return dataset
I get a ValueError stating that I have not supplied context or sequence features. That is true, but only because the record itself already contains those values. (I have also tried following the same flow as the first function, creating a new SequenceExample and parsing into it, but the data that TFRecordDataset outputs seems to differ from what the old record iterator yields.)
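To make the parenthetical above concrete: under eager execution, iterating a TFRecordDataset yields scalar tf.string tensors rather than Python bytes, which would explain the mismatch with the old iterator. The sketch below mirrors the first function under that assumption (TF 2.x or eager mode enabled, which the original code does not confirm); read_records_eager is a hypothetical name.

```python
import tensorflow as tf


def read_records_eager(record_path):
    """Parse every record into a tf.train.SequenceExample proto.

    Assumes eager execution, so each dataset element exposes .numpy().
    """
    records = []
    for raw_record in tf.data.TFRecordDataset(record_path):
        example = tf.train.SequenceExample()
        # raw_record is a scalar tf.string tensor; .numpy() returns the
        # serialized bytes that ParseFromString expects.
        example.ParseFromString(raw_record.numpy())
        records.append(example)
    return records
```

This still returns a plain Python list of protos, so it only replaces the deprecated iterator; it does not by itself produce a parsed tf.data.Dataset.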
Given this, how do I properly reconstruct my SequenceExample? I could technically pass in feature specifications, but that seems counterintuitive, especially since the data is already in the record.
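For reference, supplying those parameters would look roughly like the sketch below. The feature keys are taken from the printed record; the dtypes and shapes (scalar int64 context values, one bytes value per step in "positions") are assumptions read off that single record, and parse_record and parsed_reader are hypothetical names.

```python
import tensorflow as tf

# Assumed feature spec, inferred from the one printed record.
CONTEXT_FEATURES = {
    "framecount": tf.io.FixedLenFeature([], tf.int64),
    "label": tf.io.FixedLenFeature([], tf.int64),
}
SEQUENCE_FEATURES = {
    # One bytes value per step of the "positions" feature list.
    "positions": tf.io.FixedLenSequenceFeature([], tf.string),
}


def parse_record(serialized):
    """Parse one serialized SequenceExample into (context, sequence) dicts."""
    return tf.io.parse_single_sequence_example(
        serialized,
        context_features=CONTEXT_FEATURES,
        sequence_features=SEQUENCE_FEATURES,
    )


def parsed_reader(file_path):
    """Return a dataset whose elements are (context, sequence) dict pairs."""
    return tf.data.TFRecordDataset(file_path).map(parse_record)
```

Each element of the resulting dataset is a pair of dicts keyed by feature name, with "positions" left as raw byte strings.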
Alternatively (though this would be more of a band-aid fix), how could I convert the list returned by the first function into a TensorFlow dataset object?
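One way the band-aid route might look, as a sketch only: serialize each proto back to bytes and build the dataset from those strings. records_to_dataset is a hypothetical name, and the input list is assumed to be non-empty.

```python
import tensorflow as tf


def records_to_dataset(records):
    """Turn a list of SequenceExample protos into a dataset of serialized strings."""
    # Proto objects cannot be tensor elements, so each one is re-serialized
    # to bytes first; the resulting dataset holds scalar tf.string tensors.
    serialized = [record.SerializeToString() for record in records]
    return tf.data.Dataset.from_tensor_slices(serialized)
```

The elements are still serialized strings, so turning them into feature tensors would need a parsing step with an explicit feature spec anyway.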