I'm trying to train on a custom dataset with the TensorFlow Object Detection API. The dataset contains 40k training images and labels, stored as NumPy ndarrays (uint8). The training data is a 2-D array of shape [40000, 23456] and the labels are a 1-D array ([0..., 3]). I want to generate a TFRecord file for this dataset. How do I do that?

1 Answer

This tutorial will walk you through the process of creating TFRecords from your data:

https://medium.com/mostly-ai/tensorflow-records-what-they-are-and-how-to-use-them-c46bc4bbb564
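A minimal sketch of the writing step, assuming TensorFlow 2.x and arrays shaped like the ones in the question (one uint8 row per example, one integer label per row). The feature keys image_raw and label and the helper names are hypothetical choices for illustration, not something the tutorial prescribes. Also note that the Object Detection API's own input decoder expects its own keys (such as image/encoded and the image/object/... box fields), so records meant for that API would need those keys instead.

```python
import numpy as np
import tensorflow as tf


def _bytes_feature(value: bytes) -> tf.train.Feature:
    # Wrap raw bytes in a Feature proto.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64_feature(value: int) -> tf.train.Feature:
    # Wrap a single integer in a Feature proto.
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def write_tfrecord(path: str, images: np.ndarray, labels: np.ndarray) -> int:
    """Write one tf.train.Example per row; return the number of records written.

    Hypothetical helper: keys "image_raw" and "label" are illustrative only.
    """
    count = 0
    with tf.io.TFRecordWriter(path) as writer:
        for row, label in zip(images, labels):
            example = tf.train.Example(features=tf.train.Features(feature={
                # Store the whole uint8 row as ONE bytes feature instead of
                # declaring a separate feature for every column.
                "image_raw": _bytes_feature(row.astype(np.uint8).tobytes()),
                "label": _int64_feature(int(label)),
            }))
            writer.write(example.SerializeToString())
            count += 1
    return count
```

Called as write_tfrecord("train.tfrecord", images, labels) with the question's arrays, this iterates row by row, so each Example is built and written one at a time rather than all serialized up front.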

However, there are easier ways to handle preprocessing now, using the Dataset input pipeline. I prefer to keep my data in its original format and build a preprocessing pipeline to deal with it. Here's the primary guide to read to learn about the Dataset preprocessing pipeline:

https://www.tensorflow.org/programmers_guide/datasets
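As a rough illustration of keeping the data in its original format, here is a minimal sketch assuming TensorFlow 2.x and the NumPy arrays already in memory; make_dataset and the scale-to-[0, 1] step are hypothetical examples of preprocessing, not taken from the guide.

```python
import numpy as np
import tensorflow as tf


def make_dataset(images: np.ndarray, labels: np.ndarray,
                 batch_size: int = 4) -> tf.data.Dataset:
    """Hypothetical helper: build a tf.data pipeline from in-memory NumPy arrays."""
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    # Per-example preprocessing runs lazily inside the pipeline:
    # here, scale uint8 values to float32 in [0, 1].
    ds = ds.map(lambda x, y: (tf.cast(x, tf.float32) / 255.0, y),
                num_parallel_calls=tf.data.AUTOTUNE)
    # Shuffle, group examples into batches, and prefetch so preprocessing
    # overlaps with training.
    return ds.shuffle(buffer_size=1000).batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

The source arrays still have to fit in RAM for this approach; the full 40000 x 23456 uint8 array is about 0.94 GB, so whether it is practical depends on the machine.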

6 Comments

Reading the link, it's clear that TensorFlow wants you to load all your data into memory first (as a dataset); it doesn't describe any other way to load data. Other documentation just says, 'whatever, go make a TFRecordDataset'.
I recommend following the second link and using the Dataset pipeline. You will almost certainly not be loading your entire dataset into memory. The amount of data loaded at one time is governed by calls such as batched_dataset = dataset.batch(4); see the section on Simple Batching. If you provide a loader function, start with a set of IDs (possibly loading all the IDs) and use Dataset.map to take an ID and return the actual data sample it refers to. If your data is already in TFRecord format, TF provides readers that load it on demand.
The top link has rotted.
So am I supposed to manually add every single column myself (400+)?
@David Parks After some experimenting, it turned out that TFRecords are still the only option if your data is not CSV or images. If you build your dataset with py_function, you will likely still run into memory issues. py_function is also limited by the GIL.
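On the questions of memory and of declaring 400+ columns, here is a minimal sketch of the reading side, assuming TensorFlow 2.x and records that store each whole uint8 row as a single bytes feature plus an int64 label (the keys image_raw and label, and the helper names, are hypothetical). Because each row is one bytes feature, the parse spec has two entries rather than one per column, and TFRecordDataset reads records from disk lazily, with no py_function in the pipeline.

```python
import tensorflow as tf


def make_parser(num_features: int):
    """Hypothetical helper: return a parse function for rows of num_features uint8 values."""
    # Two entries total, regardless of how many columns each row has.
    spec = {
        "image_raw": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }

    def parse_example(serialized):
        parsed = tf.io.parse_single_example(serialized, spec)
        # Decode the bytes back to uint8 and restore the static row shape.
        image = tf.io.decode_raw(parsed["image_raw"], tf.uint8)
        image = tf.reshape(image, [num_features])
        return image, parsed["label"]

    return parse_example


def load_tfrecord(path: str, num_features: int, batch_size: int = 4) -> tf.data.Dataset:
    """Hypothetical helper: stream, parse, and batch records on demand."""
    return (tf.data.TFRecordDataset(path)
            .map(make_parser(num_features), num_parallel_calls=tf.data.AUTOTUNE)
            .batch(batch_size)
            .prefetch(tf.data.AUTOTUNE))
```

For the question's data this would be load_tfrecord("train.tfrecord", 23456), with the row length matching whatever was written.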