115

How do I train a simple neural network with PyTorch on a pandas dataframe df?

The column df["Target"] is the target (e.g. labels) of the network. This doesn't work:

import pandas as pd
import torch.utils.data as data_utils

target = pd.DataFrame(df['Target'])
train = data_utils.TensorDataset(df, target)
train_loader = data_utils.DataLoader(train, batch_size=10, shuffle=True)
2
  • 2
    Welcome to StackOverflow! Please read about how to ask a question (particularly how to create a good example) in order to get good responses. Commented May 16, 2018 at 4:01
  • 1
    Issue: your features (df) also contains the target variable (df['Target']) (i.e. your network is 'cheating', since it can see the results as input) Commented Mar 10, 2021 at 15:09

7 Answers 7

103

The question body doesn't specify much beyond the title, so I'll address just that: converting the DataFrame into a PyTorch tensor.

Without information about your data, I'm just taking float values as example targets here.

Convert Pandas dataframe to PyTorch tensor?

import pandas as pd
import torch
import random

# creating dummy targets (float values)
targets_data = [random.random() for i in range(10)]

# creating DataFrame from targets_data
targets_df = pd.DataFrame(data=targets_data)
targets_df.columns = ['targets']

# creating tensor from targets_df 
torch_tensor = torch.tensor(targets_df['targets'].values)

# printing out result
print(torch_tensor)

Output:

tensor([ 0.5827,  0.5881,  0.1543,  0.6815,  0.9400,  0.8683,  0.4289,
         0.5940,  0.6438,  0.7514], dtype=torch.float64)

Tested with PyTorch 0.4.0.

I hope this helps, if you have any further questions - just ask. :)


7 Comments

Using your code, I wrote this: train_target = torch.tensor(train['Target'].values) train = torch.tensor(train.drop('Target', axis = 1).values) train_tensor = data_utils.TensorDataset(train, train_target) train_loader = data_utils.DataLoader(dataset = train_tensor, batch_size = batch_size, shuffle = True) Running the neural net model, I get this error: RuntimeError: Expected object of type torch.FloatTensor but found type torch.DoubleTensor for argument #4 'mat1'
What PyTorch version are you using? Version 0.3.1 is very different from version 0.4.0.
What does your DataFrame look like? It would be best to update your question; otherwise it is going to be difficult to reproduce your problem.
Just for the record, on terminology: you are not converting a pandas DataFrame but a pandas Series (which you first coerce to an array by applying .values).
Tensors are multidimensional (otherwise we call them vectors and matrices). Can you please show construction of a 3D torch tensor from a column (series) of a DataFrame?
33

Try this to see if it fixes your problem (based on your sample code):

import numpy as np
import torch
import torch.utils.data as data_utils

train_target = torch.tensor(train['Target'].values.astype(np.float32))
train = torch.tensor(train.drop('Target', axis = 1).values.astype(np.float32))
train_tensor = data_utils.TensorDataset(train, train_target)
train_loader = data_utils.DataLoader(dataset = train_tensor, batch_size = batch_size, shuffle = True)
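
The cast matters because pandas stores floating-point columns as float64 by default, while the weights of a default PyTorch layer are float32. A minimal sketch of the mismatch, assuming a small made-up DataFrame (train_df and its column names are hypothetical, not the asker's data):

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical stand-in for the asker's DataFrame.
train_df = pd.DataFrame({
    "f1": [0.1, 0.2, 0.3, 0.4],
    "f2": [1.0, 2.0, 3.0, 4.0],
    "Target": [0.0, 1.0, 0.0, 1.0],
})

features_np = train_df.drop("Target", axis=1).values
print(features_np.dtype)  # float64, the pandas default for floats

# Without a cast, the tensor inherits float64 (a DoubleTensor).
double_features = torch.tensor(features_np)

# With the cast, the tensor is float32 and matches the layer's weights.
float_features = torch.tensor(features_np.astype(np.float32))

layer = torch.nn.Linear(in_features=2, out_features=1)
output = layer(float_features)  # passing double_features here is what triggers the dtype error
```

Calling .float() on an existing tensor should give the same result as casting the NumPy array beforehand.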

14

You can use the functions below to convert any DataFrame or pandas Series to a PyTorch tensor:

import pandas as pd
import torch

# determine the supported device
def get_device():
    if torch.cuda.is_available():
        device = torch.device('cuda:0')
    else:
        device = torch.device('cpu') # don't have GPU 
    return device

# convert a df to tensor to be used in pytorch
def df_to_tensor(df):
    device = get_device()
    return torch.from_numpy(df.values).float().to(device)

df_tensor = df_to_tensor(df)
series_tensor = df_to_tensor(series)
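
The snippet above leaves df and series undefined. A self-contained sketch of the same idea follows, assuming made-up data (example_df and example_series are hypothetical names) and a slightly condensed variant of the helper that casts to float32 while exporting from pandas:

```python
import numpy as np
import pandas as pd
import torch


def df_to_tensor(data, device=None):
    """Convert a numeric DataFrame or Series to a float32 tensor on `device`."""
    if device is None:
        # Fall back to the CPU when no GPU is available.
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    # copy=True yields a fresh, writable array for torch.from_numpy.
    array = data.to_numpy(dtype=np.float32, copy=True)
    return torch.from_numpy(array).to(device)


# Hypothetical example data.
example_df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
example_series = example_df["a"]

df_tensor = df_to_tensor(example_df)          # 2-D: one row per DataFrame row
series_tensor = df_to_tensor(example_series)  # 1-D: one entry per element
print(df_tensor.shape, series_tensor.shape)
```

A DataFrame becomes a 2-D tensor and a Series a 1-D tensor, so a target column taken as a Series may need unsqueeze(1) if the loss expects a column vector.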

2 Comments

Hello, I tried your code but I'm receiving the following error name 'series' is not defined.
@Luis: This is the pandas series you want to convert. Replace it with yours.
12

You can take the df.values attribute (a NumPy array), wrap it in a tensor, and pass it to the Dataset constructor:

import torch
import torch.utils.data as data_utils

# Creating tensors from the underlying NumPy arrays
target = torch.tensor(df['Target'].values)
features = torch.tensor(df.drop('Target', axis=1).values)

# Passing to DataLoader
train = data_utils.TensorDataset(features, target)
train_loader = data_utils.DataLoader(train, batch_size=10, shuffle=True)

Note: In the question's code, the features (df) also contain the target variable (df['Target']), i.e. your network is 'cheating', since it can see the targets in the input. You need to remove this column from the set of features.
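
To tie this back to the original question, here is a minimal end-to-end training sketch. Everything beyond the TensorDataset/DataLoader pattern is an assumption made for illustration: the synthetic data, the column names x1 to x3, the binary target, the network architecture, the loss, the optimizer, and the hyperparameters.

```python
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.utils.data as data_utils

torch.manual_seed(0)

# Hypothetical data: 100 rows, three feature columns and a binary "Target".
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x1", "x2", "x3"])
df["Target"] = (df["x1"] + df["x2"] > 0).astype(int)

# Features exclude the target column; both are cast to float32.
features = torch.tensor(df.drop("Target", axis=1).values.astype(np.float32))
target = torch.tensor(df["Target"].values.astype(np.float32)).unsqueeze(1)

train = data_utils.TensorDataset(features, target)
train_loader = data_utils.DataLoader(train, batch_size=10, shuffle=True)

# A small fully connected network (an arbitrary choice for this sketch).
model = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(20):
    running = 0.0
    for batch_features, batch_target in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(batch_features), batch_target)
        loss.backward()
        optimizer.step()
        # Weight each batch loss by its size to get a per-sample average.
        running += loss.item() * batch_features.size(0)
    epoch_losses.append(running / len(train))

print(f"first epoch loss: {epoch_losses[0]:.3f}, last epoch loss: {epoch_losses[-1]:.3f}")
```

For a regression target, swapping the loss for nn.MSELoss would be the usual change; for multi-class labels, nn.CrossEntropyLoss with integer (long) targets of shape (N,) instead of the float column vector used here.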

7

Simply convert the pandas dataframe -> numpy array -> pytorch tensor. An example of this is described below:

import pandas as pd
import numpy as np
import torch
import torch.utils.data as data_utils

df = pd.read_csv('train.csv')
target = pd.DataFrame(df['target'])
del df['target']
train = data_utils.TensorDataset(torch.Tensor(np.array(df)), torch.Tensor(np.array(target)))
train_loader = data_utils.DataLoader(train, batch_size = 10, shuffle = True)

Hopefully this helps you create your own datasets with PyTorch (compatible with the latest version of PyTorch at the time of writing).

6
# This works for me
import torch
import torch.utils.data as data_utils

target = torch.tensor(df['Targets'].values)
features = torch.tensor(df.drop('Targets', axis = 1).values)

train = data_utils.TensorDataset(features, target)
train_loader = data_utils.DataLoader(train, batch_size=10, shuffle=True)

2 Comments

Hello, and thanks for your contribution! Is there any difference from the answer by @iacob?
Hi, I had to convert target and features to torch.tensor first.
3

To convert a DataFrame to a PyTorch tensor (this works for any DataFrame whose columns are all numeric):

Steps:

  • convert the DataFrame to a NumPy array using df.to_numpy(), or df.to_numpy().astype(np.float32) to cast the array's elements to float32
  • convert the NumPy array to a tensor using torch.from_numpy()

example:

tensor_ = torch.from_numpy(df.to_numpy().astype(np.float32))
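
The two steps can be sketched separately as follows, assuming a small made-up DataFrame. One detail worth knowing is that torch.from_numpy does not copy: the tensor shares memory with the NumPy array it was built from.

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical example data with mixed int and float columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# Step 1: DataFrame -> NumPy array, cast to float32 (astype returns a new array).
array = df.to_numpy().astype(np.float32)

# Step 2: NumPy array -> tensor. No copy is made; memory is shared with `array`.
tensor_ = torch.from_numpy(array)
print(tensor_.dtype, tuple(tensor_.shape))

# Because memory is shared, writing to the array is visible through the tensor.
array[0, 0] = 10.0
print(tensor_[0, 0].item())
```

If an independent copy is needed, torch.tensor(array) copies the data instead of sharing it.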
