I'm taking a machine learning course in my undergraduate studies, and I don't know how to load a CSV file into a DataLoader and then test it. Can someone guide me through the process?

You can download the CSV files from this link if you wish: https://ufile.io/f/abdd9

Here is the code

import tensorflow as tf
from torch.utils.data import DataLoader
import numpy as np
import pandas as pd
import torch
import torchvision
import matplotlib.pyplot as plt
from time import time
from torchvision import datasets, transforms
from torch import nn, optim

train_data1 = pd.read_csv("C:/Users/HP/OneDrive/سطح المكتب/KFUPM/TERM 212/EE485/Exp3/mnist_train.csv")
test_data1 = pd.read_csv("C:/Users/HP/OneDrive/سطح المكتب/KFUPM/TERM 212/EE485/Exp3/mnist_test.csv")
dtype = torch.float32
torch_tensor1 = torch.tensor(train_data1.values,dtype = dtype)
torch_tensor2 = torch.tensor(test_data1.values,dtype = dtype )
trainloader=DataLoader(torch_tensor1, batch_size=64, shuffle=True)
testloader =DataLoader(torch_tensor2, batch_size=64, shuffle=True)

Then, when I try to run these lines of code, I get an error

dataiter = iter(trainloader)
images, labels = dataiter.next()

print(images.shape)
print(labels.shape)

which is

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-234-afd0555e962e> in <module>
      1 dataiter = iter(trainloader)
----> 2 images, labels = dataiter.next()
      3 
      4 print(images.shape)
      5 print(labels.shape)

ValueError: too many values to unpack (expected 2)
  • Provide a sample of mnist_train.csv and mnist_test.csv. Commented Feb 23, 2022 at 8:42
  • I have added a link to download the CSV file Commented Feb 23, 2022 at 10:12
  • You are loading all the data into one tensor without splitting the images from the labels. So when you iterate over your trainloader, it returns a single tensor where you are expecting two values. You need to split the labels and images from the DataFrame. Commented Feb 23, 2022 at 10:16
  • But if I separate them, how would I get a batch of 64 in which each image still corresponds to its label, if DataLoader takes only one tensor? Commented Feb 23, 2022 at 10:29
  • To use a DataLoader, you need to create a Dataset, see here. This is a good exercise; however, for MNIST it is not worth it, as the whole dataset fits in RAM, so you can use NumPy arrays directly as input data. Moreover, as you can see in the tutorial, MNIST very likely already exists in PyTorch, so you can pull it directly (the example uses Fashion MNIST, but it would be the same). Commented Feb 23, 2022 at 10:36
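
The pairing question raised in the comments can also be handled without a custom class: `torch.utils.data.TensorDataset` wraps several tensors of equal length and returns row `i` of each together. Below is a minimal sketch, assuming the first CSV column holds the label and the remaining 784 columns hold pixel values; the random `table` is only a stand-in for `train_data1.values`, not the real data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for train_data1.values: 200 rows, column 0 = label, columns 1..784 = pixels.
# (Assumed layout; check it against the header of your CSV.)
torch.manual_seed(0)
labels_col = torch.randint(0, 10, (200, 1)).float()
pixels = torch.randint(0, 256, (200, 784)).float()
table = torch.cat([labels_col, pixels], dim=1)

# A DataLoader over the raw table yields ONE tensor per batch, which is why
# `images, labels = ...` raised "too many values to unpack".
single = next(iter(DataLoader(table, batch_size=64)))

# Split the columns, then let TensorDataset keep row i of both tensors together.
images = table[:, 1:] / 255.0   # scale pixels to [0, 1]
labels = table[:, 0].long()     # class indices as integers
paired = TensorDataset(images, labels)
loader = DataLoader(paired, batch_size=64, shuffle=True)

image_batch, label_batch = next(iter(loader))
print(single.shape, image_batch.shape, label_batch.shape)
```

Shuffling happens on row indices, so each image stays with its own label even though the two live in separate tensors.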

1 Answer

To do it properly with a Dataset and a DataLoader, you need to create a custom dataset:

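import numpy as np
import pandas as pd
from torch.utils.data import Dataset

class CustomMnistDataset(Dataset):
    """MNIST rows read from a CSV: a "label" column followed by the pixel columns."""

    def __init__(self, csv_file):
        data = pd.read_csv(csv_file)
        # Class index (0-9) of each row
        self.labels = np.array(data["label"])
        # Remaining columns are the flattened pixels; float32 so they can be fed to nn layers
        self.images = np.array(data.iloc[:, 1:], dtype=np.float32)

    def __len__(self):
        # Number of samples in the dataset
        return len(self.labels)

    def __getitem__(self, idx):
        # Return one (image, label) pair; the DataLoader stacks these into batches
        return self.images[idx], self.labels[idx]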

Then use it to create your dataloader:

from torch.utils.data import DataLoader

test_dataset = CustomMnistDataset("mnist_test.csv")
test_dataloader = DataLoader(test_dataset, batch_size=64, shuffle=True)
image_batch, label_batch = next(iter(test_dataloader))

This way you get batches of 64 as PyTorch tensors, ready for training. As I said in my comment, for MNIST this is overkill, since you can load the dataset directly from PyTorch (torchvision). You may need to flatten the images, though.

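from torchvision import datasets
from torchvision.transforms import ToTensor

# Downloads MNIST into ./data on first use and converts each image to a tensor
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)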
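
One way to do the flattening mentioned above is at the batch level, just before the data reaches the model. With `ToTensor()`, each MNIST image has shape `(1, 28, 28)`, so a batch of 64 has shape `(64, 1, 28, 28)`. This is a minimal sketch that uses a random stand-in batch, so it runs without downloading anything:

```python
import torch
from torch import nn

# Stand-in for one batch from a DataLoader over datasets.MNIST with ToTensor()
batch = torch.rand(64, 1, 28, 28)

# Option 1: reshape by hand, keeping the batch dimension
flat_a = batch.view(batch.size(0), -1)

# Option 2: nn.Flatten keeps dim 0 and flattens the rest; handy as a first model layer
flat_b = nn.Flatten()(batch)

print(flat_a.shape, flat_b.shape)
```

Putting `nn.Flatten()` at the start of an `nn.Sequential` model leaves the dataset untouched, which can be convenient if the same data later feeds a convolutional model.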

EDIT: If you want to use the dataset already provided in PyTorch in flattened form, you can do this. The custom dataset may be simpler after all.

import numpy as np
import torch
from torch.utils.data import DataLoader
from torchvision import datasets

# Each sample arrives as a 28x28 PIL image; convert it to a flat float32 tensor
# of 784 values (pixel values stay in the 0-255 range).
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=lambda x: torch.from_numpy(np.array(x, dtype=np.float32).reshape(-1))
)

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
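
Since the question also asks about testing: once a loader yields `(images, labels)` pairs, evaluating a model is a loop over its batches. The following is a minimal sketch, assuming flattened 784-value inputs and ten classes. The `nn.Linear` model is an untrained placeholder, `evaluate` is a hypothetical helper name, and the random tensors stand in for real test data, so the printed accuracy is meaningless here.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Placeholder test data: 256 flattened "images" and random class labels
images = torch.rand(256, 784)
labels = torch.randint(0, 10, (256,))
test_loader = DataLoader(TensorDataset(images, labels), batch_size=64)

# Untrained placeholder model; substitute your own trained network
model = nn.Linear(784, 10)

def evaluate(model, loader):
    """Return the fraction of samples whose predicted class matches the label."""
    model.eval()                      # disable dropout/batch-norm updates
    correct, total = 0, 0
    with torch.no_grad():             # no gradients needed for testing
        for x, y in loader:
            preds = model(x).argmax(dim=1)
            correct += (preds == y).sum().item()
            total += y.size(0)
    return correct / total

accuracy = evaluate(model, test_loader)
print(f"accuracy: {accuracy:.3f}")
```

The same function should work with any of the loaders above, provided the model's input size matches the shape of the image batches.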