1

I'm new to PyTorch and am trying to implement a model I developed in TF so I can compare the results. The model is an autoencoder. The input data is a CSV file containing n samples, each with m features (an n*m numerical matrix). The targets (the labels) are in another CSV file with the same format as the input file. I've been looking online but couldn't find good documentation for reading non-image data from a CSV file with multiple labels. Any idea how I can read my data and iterate over it during training?

Thank you

2
  • Why don't you use pandas to load the dataset and then PyTorch's own classes to wrap it in tensors? Commented May 7, 2020 at 19:08
  • 1
    Thanks for your comment! I'm looking for something similar to tf.data.experimental.make_csv_dataset in TF. So I can shuffle the data and stream the data without needing to manually create batches of data. Commented May 7, 2020 at 19:14

1 Answer

3

Might you be looking for something like TabularDataset?

class torchtext.data.TabularDataset(path, format, fields, skip_header=False, csv_reader_params={}, **kwargs)

Defines a Dataset of columns stored in CSV, TSV, or JSON format.

It will take a path to a CSV file and build a dataset from it. You also need to specify the names of the columns which will then become the data fields.

In general, implementations of torch.utils.data.Dataset for specific types of data live outside of PyTorch itself, in the torchvision, torchtext, and torchaudio libraries.
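If TabularDataset is not available (as far as I know, newer torchtext releases moved it under torchtext.legacy and later removed it), a small custom Dataset combined with DataLoader gives the shuffling and automatic batching asked about in the comments. The following is only a minimal sketch, assuming both CSV files are purely numeric, comma-separated, have no header row by default, and fit in memory. The names CsvPairDataset and make_loader are hypothetical helpers, not part of PyTorch.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class CsvPairDataset(Dataset):
    """Hypothetical helper: pairs row i of an input CSV with row i of a target CSV."""

    def __init__(self, inputs_path, targets_path, skip_header=0):
        # Assumption: numeric, comma-separated files small enough to load at once.
        x = np.loadtxt(inputs_path, delimiter=",", skiprows=skip_header, ndmin=2)
        y = np.loadtxt(targets_path, delimiter=",", skiprows=skip_header, ndmin=2)
        if len(x) != len(y):
            raise ValueError("input and target files must have the same number of rows")
        # Store everything as float32 tensors, the usual dtype for model inputs.
        self.x = torch.from_numpy(x).float()
        self.y = torch.from_numpy(y).float()

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        # One sample: (features of shape [m], targets of shape [k]).
        return self.x[idx], self.y[idx]


def make_loader(inputs_path, targets_path, batch_size=32, shuffle=True):
    """Hypothetical helper: wraps the dataset in a DataLoader that batches and shuffles."""
    dataset = CsvPairDataset(inputs_path, targets_path)
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)
```

In a training loop this would be used as `for x_batch, y_batch in make_loader("inputs.csv", "targets.csv"):`, where the two file names are placeholders. Note that this version reads both files fully into memory. For files too large for that, torch.utils.data.IterableDataset is the class to look at for streaming rows instead.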
