I get the error below when running my code. I read the data from an xlsx file into a dataframe, replaced the null values with 0, and converted all the values to string so that I could draw a scatter plot. When I then tried to compute the linear regression, I received this error:

 TypeError: unsupported operand type(s) for /: 'str' and 'int'

This is the code I have so far:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
def predict(x):
    return slope * x + intercept
from scipy import stats
xlsxfile = pd.ExcelFile("C:\\Users\\AchourAh\\Desktop\\PL14_IPC_03_09_2018_SP_Level.xlsx") 
data = xlsxfile.parse('Sheet1', index_col = None, header = None) 
data1 = data.fillna(0) #Replace null values of the whole dataset with 0
data1 = data1.astype(str)
print(data1)
X = data1.iloc[0:len(data1),1] 
print(X)
Y = data1.iloc[0:len(data1),2] 
print(Y)
axes = plt.axes()
axes.grid() 
plt.scatter(X,Y)     
slope, intercept, r_value, p_value, std_err = stats.linregress(X, Y)

Please note that I am a beginner with this. The last line is the one causing the error. These are the first columns (PP, COP COR and PAUS) of the dataframe on which I am trying to run a linear regression:

 0            PP   SP000045856 COP COR  SP000045856 PAUS   
 1          201723                    0              2000   
 2          201724                12560             40060   
 3          201725               -17760             15040   
 4          201726                -5840             16960   
 5          201727                10600              4480   
 6          201728                    0             14700   
 7          201729                 4760             46820  

... till line 27

  • Hello, welcome to SO. Why did you convert all the values to string to be able to create a scatter plot? Why do you think a scatter plot needs strings? In any case, stats.linregress needs arrays of numbers, because it calculates the linear regression of some measured (possibly noisy) data over an independent variable such as time. Perhaps have a short look at the documentation of this function: docs.scipy.org/doc/scipy/reference/generated/… Commented Sep 25, 2018 at 7:59
  • And just to be complete, my recommendation is: do not cast your data to string for what you want to achieve. The fact that scatter works and throws no error with two string arrays as arguments doesn't necessarily mean that the result is useful or meaningful for you. Commented Sep 25, 2018 at 8:05
  • If I remove the astype line, I receive "TypeError: 0 is not a string" on the scatter plot line. That's why I converted to string. Do you have any idea how to draw the scatter plot without getting this error? Commented Sep 25, 2018 at 8:09
  • Please post your dataframe as a sample, so that we can see the data you're dealing with. (No screenshot please, post it like code, and if it's too large perhaps just data.head()) Commented Sep 25, 2018 at 8:11
  • I could reproduce your error by passing a list containing both strings and ints to plt.scatter. Is it possible that you have header names in the first row of your data? Commented Sep 25, 2018 at 8:18

1 Answer

The data in your Excel file has header information in the first row. Because you set header=None, pandas treats that row as data instead of using it for the column names, which is why there are string values in your columns.
If you delete the header kwarg

xlsxfile = pd.ExcelFile("C:\\Users\\AchourAh\\Desktop\\PL14_IPC_03_09_2018_SP_Level.xlsx") 
data = xlsxfile.parse('Sheet1', index_col = None)

everything should work and you should get a dataframe like this:

data

   0      PP  SP000045856 COP COR  SP000045856 PAUS
0  1  201723                    0              2000
1  2  201724                12560             40060
2  3  201725               -17760             15040
3  4  201726                -5840             16960
4  5  201727                10600              4480
5  6  201728                    0             14700
6  7  201729                 4760             46820
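To make the effect of the header row visible without the Excel file, here is a minimal sketch. It builds the dataframe from a hypothetical in-memory list of rows (the `rows` variable and both dataframe names are my own, not from the question) and compares the column dtypes when the header row is kept as data versus used as column names:

```python
import pandas as pd

# Hypothetical rows mimicking the sheet: the first row holds the column names.
rows = [
    ["PP", "SP000045856 COP COR", "SP000045856 PAUS"],
    [201723, 0, 2000],
    [201724, 12560, 40060],
    [201725, -17760, 15040],
]

# Comparable to header=None: the header row stays in the data,
# so every column mixes strings and integers.
header_as_data = pd.DataFrame(rows)

# Comparable to the default header handling: the first row becomes
# the column names and only numbers remain in the data.
header_as_names = pd.DataFrame(rows[1:], columns=rows[0])

print(header_as_data.dtypes)
print(header_as_names.dtypes)
```

In the first case the columns come out with the generic object dtype, which is what numeric functions such as stats.linregress choke on. In the second case they are plain integer columns.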

However, you can do the same thing a little more concisely by calling pandas' read_excel function directly:

data = pd.read_excel('C:\\Users\\AchourAh\\Desktop\\PL14_IPC_03_09_2018_SP_Level.xlsx', 'Sheet1')

Your scatter plot can then be created, for example, like this:

data.plot('SP000045856 COP COR', 'SP000045856 PAUS', 'scatter')

or, equivalently and perhaps more readably:

data.plot.scatter('SP000045856 COP COR', 'SP000045856 PAUS')

The linear regression can then be computed like this:

slope, intercept, r_value, p_value, std_err = stats.linregress(data['SP000045856 COP COR'], data['SP000045856 PAUS'])
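For a version you can run without the Excel file, here is a rough sketch that puts the regression and your predict function together. It assumes only the seven sample rows you posted (typed in by hand, so the fitted numbers will differ from those of your full 27-row sheet), and the `result` and `fitted` names are my own:

```python
import pandas as pd
from scipy import stats

# The seven sample rows from the question, typed in by hand as numbers.
data = pd.DataFrame({
    "SP000045856 COP COR": [0, 12560, -17760, -5840, 10600, 0, 4760],
    "SP000045856 PAUS": [2000, 40060, 15040, 16960, 4480, 14700, 46820],
})

x = data["SP000045856 COP COR"]
y = data["SP000045856 PAUS"]

# linregress works here because both columns are numeric.
result = stats.linregress(x, y)
slope, intercept = result.slope, result.intercept


def predict(value):
    """Return the fitted y for a given x using the regression line."""
    return slope * value + intercept


# Fitted values for every x in the sample, e.g. for drawing the line.
fitted = predict(x)
print(f"slope={slope:.4f}, intercept={intercept:.2f}, r={result.rvalue:.4f}")
```

Defining predict after the regression avoids relying on slope and intercept before they exist. To draw the line over the scatter plot, something like plt.plot(x, fitted) should do.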