Preparing CSV file for neural network machine learning Python

Question

I'm taking a machine learning course in my undergraduate studies, and I don't know how to load a CSV file into a DataLoader and then test it. Can someone guide me through the process?

You can download the CSV files from this link if you wish: https://ufile.io/f/abdd9

Here is the code:

import tensorflow as tf
from torch.utils.data import DataLoader
import numpy as np
import pandas as pd
import torch
import torchvision
import matplotlib.pyplot as plt
from time import time
from torchvision import datasets, transforms
from torch import nn, optim

train_data1 = pd.read_csv("C:/Users/HP/OneDrive/سطح المكتب/KFUPM/TERM 212/EE485/Exp3/mnist_train.csv")
test_data1 = pd.read_csv("C:/Users/HP/OneDrive/سطح المكتب/KFUPM/TERM 212/EE485/Exp3/mnist_test.csv")
dtype = torch.float32
torch_tensor1 = torch.tensor(train_data1.values,dtype = dtype)
torch_tensor2 = torch.tensor(test_data1.values,dtype = dtype )
trainloader=DataLoader(torch_tensor1, batch_size=64, shuffle=True)
testloader =DataLoader(torch_tensor2, batch_size=64, shuffle=True)

Then, when I try to run the following code, I get an error:

dataiter = iter(trainloader)
images, labels = dataiter.next()

print(images.shape)
print(labels.shape)

The error is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-234-afd0555e962e> in <module>
      1 dataiter = iter(trainloader)
----> 2 images, labels = dataiter.next()
      3 
      4 print(images.shape)
      5 print(labels.shape)

ValueError: too many values to unpack (expected 2)

You are loading all the data into a single tensor without splitting the images from the labels. So when you iterate over your trainloader, it returns a single tensor where you are expecting two values. You need to split the labels and the images from the DataFrame.
– Ssayan, Commented Feb 23, 2022 at 10:16
But if I separate them, how would I get a batch of 64 in which each image stays paired with its label, if DataLoader takes only one tensor?
– Majed M.Alharthi, Commented Feb 23, 2022 at 10:29
To use a DataLoader, you need to create a Dataset (see the linked tutorial). This is a good exercise; for MNIST, however, it is not worth it, because the whole dataset fits in RAM and you can use NumPy arrays directly as input data. Moreover, as you can see in the tutorial, MNIST very likely already exists in PyTorch, so you can pull it directly (the example uses Fashion MNIST, but it would be the same).
– Ssayan, Commented Feb 23, 2022 at 10:36
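
The failure described in the comments can be reproduced without the CSV files. The following is a minimal sketch, assuming the usual MNIST CSV layout (785 columns, label first) and using random numbers in place of the real data; `fake_rows` is a hypothetical stand-in, not part of the original code. It uses `next(dataiter)` rather than `dataiter.next()`, since the latter is not available on recent PyTorch releases.

```python
import torch
from torch.utils.data import DataLoader

# Synthetic stand-in for the CSV: 256 rows, label in column 0, 784 pixels after it.
fake_rows = torch.rand(256, 785)

loader = DataLoader(fake_rows, batch_size=64, shuffle=True)
dataiter = iter(loader)
batch = next(dataiter)   # a single tensor, not an (images, labels) pair
print(batch.shape)

# Unpacking iterates over the 64 rows of the batch, hence "too many values".
message = ""
try:
    images, labels = batch
except ValueError as err:
    message = str(err)
print(message)

# Slicing the batch by column recovers both parts.
labels = batch[:, 0].long()
images = batch[:, 1:]
```

Slicing each batch like this works, but splitting once up front and wrapping the result in a Dataset keeps the training loop cleaner.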

Ssayan · Accepted Answer · 2022-02-23 14:30:01Z

To do it properly with a Dataset and a DataLoader, you need to create a custom dataset:

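import numpy as np
import pandas as pd
from torch.utils.data import Dataset

class CustomMnistDataset(Dataset):
    def __init__(self, csv_file):
        data = pd.read_csv(csv_file)
        # The CSV is expected to have a "label" column followed by the pixel columns.
        self.labels = np.array(data["label"])
        # Cast pixels to float32 so they can be fed to typical layers without a further cast.
        self.images = np.array(data.iloc[:, 1:], dtype=np.float32)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # The DataLoader's default collate function stacks these into batch tensors.
        return self.images[idx], self.labels[idx]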

Then use it to create your dataloader:

from torch.utils.data import DataLoader

test_dataset = CustomMnistDataset("mnist_test.csv")
test_dataloader = DataLoader(test_dataset, batch_size=64, shuffle=True)
image_batch, label_batch = next(iter(test_dataloader))
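
If writing a class feels heavy, `TensorDataset` from `torch.utils.data` can pair two tensors so that every batch keeps each image with its label. This is a sketch under assumptions: it builds a small random DataFrame with a `label` column followed by 784 pixel columns instead of reading `mnist_train.csv`, and the division by 255 is an optional scaling step that is not part of the original code.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the CSV: a "label" column plus 784 pixel columns.
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.integers(0, 256, size=(200, 784)))
frame.insert(0, "label", rng.integers(0, 10, size=200))

# Split once, then let TensorDataset keep rows aligned by index.
labels = torch.tensor(frame["label"].values, dtype=torch.long)
images = torch.tensor(frame.iloc[:, 1:].values, dtype=torch.float32) / 255.0

dataset = TensorDataset(images, labels)
loader = DataLoader(dataset, batch_size=64, shuffle=True)

image_batch, label_batch = next(iter(loader))
print(image_batch.shape, label_batch.shape)
```

Shuffling happens on indices, so `image_batch[i]` and `label_batch[i]` always come from the same CSV row.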

With this custom dataset, each batch contains 64 samples already converted to PyTorch tensors, ready for training. As I said in my comment, for MNIST this is overkill, because you can load the dataset directly from PyTorch (torchvision). You may need to flatten the images, though.

from torchvision import datasets
from torchvision.transforms import ToTensor

training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

EDIT: If you want to use the dataset already provided in PyTorch in flattened form, you can do this. In that case, the custom dataset may be simpler after all.

import numpy as np
import torch
from torch.utils.data import DataLoader
from torchvision import datasets

training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    # Flatten each 28x28 PIL image into a 784-element float tensor (values stay in 0-255).
    transform=lambda x: torch.Tensor(np.array(x).reshape(-1))
)

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
