I am trying to convert a pandas DataFrame into a PyTorch tensor in order to run an LSTM model, but I keep getting a ValueError saying that the shape of object type 'Series' could not be determined. The error points to the following code:

class MicroESDataset(Dataset):

    def __init__(self, sequences):
        self.sequences = sequences

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        sequence, label = self.sequences[idx]
        return dict(
            sequence=torch.Tensor(sequence.to_numpy()),
            label=torch.tensor(label).float()
        )

Am I missing something completely obvious? Thanks

Here is the exact error message and traceback:

    ValueError                                Traceback (most recent call last)
    <ipython-input-46-fb5c7eb803e1> in <module>()
    ----> 1 for item in data_module.train_dataloader():
          2   print(item["sequence"].shape)
          3   print(item["label"].shape)
          4   # print(item["label"])
          5   break

    3 frames
    /usr/local/lib/python3.7/dist-packages/torch/_utils.py in reraise(self)
        427             # have message field
        428             raise self.exc_type(message=msg)
    --> 429         raise self.exc_type(msg)
        430
        431

    ValueError: Caught ValueError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
        data = fetcher.fetch(index)
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "<ipython-input-30-36c44aae196d>", line 13, in __getitem__
        label = torch.tensor(label).float()

    ValueError: could not determine the shape of object type 'Series'

  • Please provide the exact error message and the complete traceback. Commented Jun 1, 2021 at 20:10
  • I added the exact error message and traceback in the OP. Commented Jun 1, 2021 at 20:33
  • Looks like label is a Series object and PyTorch doesn't know what to do with that. Commented Jun 1, 2021 at 20:36
  • Does this answer your question? Convert Pandas dataframe to PyTorch tensor? Commented Jun 1, 2021 at 20:58
  • Please debug with num_workers=0 passed to the DataLoader. Commented Jun 1, 2021 at 23:36
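
If label really is a Series, as one of the comments above suggests, converting it to a NumPy array or a plain Python float before building the tensor may avoid the error. The snippet below is only a sketch: the one-element Series is a hypothetical stand-in, since the question does not show how the labels are built.

```python
import pandas as pd
import torch

# Hypothetical stand-in for the real label: a one-element Series
label = pd.Series([1.0])

# Option 1: go through NumPy, which yields a 1-element tensor
label_tensor = torch.tensor(label.to_numpy(), dtype=torch.float32)

# Option 2: extract the single value first, which yields a 0-dim (scalar) tensor
scalar_tensor = torch.tensor(float(label.iloc[0]))

print(label_tensor.shape, scalar_tensor.shape)
```

Which option fits depends on the label shape the loss function expects.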

1 Answer

2 columns

First of all, idx in a Dataset should refer to a row of the pd.DataFrame.

To get a row, use df.iloc[idx] instead of df[idx]. Plain [idx] indexing looks up the column with that label, which is probably not what you want (if it is, you should transpose your data).
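
To illustrate the difference, here is a minimal sketch using a dummy two-column frame (the column names are made up for the example):

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Positional row access: a Series holding the first row
first_row = df.iloc[0]

# Plain [] indexing selects a column by its label
first_col = df["col1"]

# There is no column labelled 0, so df[0] raises a KeyError
try:
    df[0]
    plain_index_failed = False
except KeyError:
    plain_index_failed = True

print(first_row.tolist(), first_col.tolist(), plain_index_failed)
```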

Given that, we can do this (dummy pd.DataFrame with only 2 columns, see code comments):

import pandas as pd
import torch


class MicroESDataset(torch.utils.data.Dataset):
    def __init__(self):
        # Dummy sequences dataframe
        self.sequences = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        sequence, label = self.sequences.iloc[idx]
        return dict(
            # torch.tensor infers dtype, torch.Tensor is always float
            sequence=torch.tensor(sequence),
            label=torch.tensor(label).float(),
        )


dataset = MicroESDataset()
print(dataset[0])

More columns

If you have more columns (which seems likely, since the 'Series' in the error message suggests label holds multiple values), you have to:

  • get the row first
  • slice by appropriate columns

Given the above, one could do the following (here with 4 columns, the last one being the label; see code comments):

import pandas as pd
import torch


class MicroESDataset(torch.utils.data.Dataset):
    def __init__(self):
        # Dummy sequences dataframe
        self.sequences = pd.DataFrame(
            {"col1": [1, 2], "col2": [3, 4], "col3": [5, 6], "col4": [7, 8]}
        )

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        # No magic unpacking here: fetch the whole row as a Series first
        row = self.sequences.iloc[idx]
        # Slice positionally: everything but the last column is the sequence,
        # the last column is the label (row.loc[:"col3"] would slice by name instead)
        sequence, label = row.iloc[:-1], row.iloc[-1]
        return dict(
            # Convert the Series to a NumPy array first; passing a Series
            # directly to torch.tensor can raise the ValueError from the question
            sequence=torch.tensor(sequence.to_numpy()),
            label=torch.tensor(label).float(),
        )


dataset = MicroESDataset()
print(dataset[0])
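
A dataset like this can be checked with a DataLoader before training. The sketch below is an illustration, not the asker's actual setup: RowDataset and the dummy frame are hypothetical, and num_workers=0 keeps loading in the main process so that any error surfaces with a direct traceback.

```python
import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset


class RowDataset(Dataset):
    """Hypothetical dataset: all columns but the last are features, the last is the label."""

    def __init__(self, frame: pd.DataFrame):
        self.frame = frame

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, idx):
        row = self.frame.iloc[idx]
        # Convert the feature slice to a float32 NumPy array before building the tensor
        features = row.iloc[:-1].to_numpy(dtype="float32")
        return dict(
            sequence=torch.tensor(features),
            label=torch.tensor(float(row.iloc[-1])),
        )


frame = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6], "col4": [7, 8]})
# num_workers=0 loads data in the main process, which makes debugging easier
loader = DataLoader(RowDataset(frame), batch_size=2, num_workers=0)

batch = next(iter(loader))
print(batch["sequence"].shape)
print(batch["label"].shape)
```

The default collate function stacks the per-item dicts key by key, so each entry in the batch gains a leading batch dimension.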