
I'm trying to use gradient descent on a data set. Here is what I have written:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv('C:/Users/Teacher/Downloads/data.csv')
X = data.iloc[:, 0]  # selects all data from first column in data
Y = data.iloc[:, 1]
plt.scatter(X, Y)
plt.show()
n = len(X)

a = 0     # slope
b = 0     # intercept
L = .001  # learning rate

for i in range(1000):
    y_predicted = a * X + b
    # partial derivatives of the (halved) mean squared error w.r.t. a and b
    pd_a = (1 / n) * sum((y_predicted - Y) * X)
    pd_b = (1 / n) * sum(y_predicted - Y)
    a = a - L * pd_a
    b = b - L * pd_b
print(a, b)
plt.scatter(X, Y)
c, d = np.polyfit(X, Y, 1)  # least-squares fit, for comparison
print(c, d)
plt.plot([min(X), max(X)], [a * x + b for x in [min(X), max(X)]], [c * x + d for x in [min(X), max(X)]])
plt.show()
```

If I instead define X and Y as np.random.rand(20), everything seems to work fine, so the issue appears to be with the input from the CSV. However, the scatterplot of X and Y still looks fine when I define them as the first and second columns of my data set, so I'm not sure what's going on.
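One plausible reason the same loop behaves on np.random.rand(20) but not on the CSV is the scale of X. The slope update is scaled by roughly mean(X**2), so for values in [0, 1) a step of L = 0.001 is tiny, but if the CSV's X values are in the tens (an assumption, since the data itself isn't shown), L = 0.001 can overshoot on every step and the estimates grow without bound. The sketch below reuses the question's update rule on made-up data (the 30 to 70 range and the coefficients are hypothetical) and shows one common remedy, standardizing X before running gradient descent:

```python
import numpy as np

def gradient_descent(X, Y, L=0.001, epochs=1000):
    """The update rule from the question: fit Y ~ a*X + b by gradient descent."""
    a, b = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        y_predicted = a * X + b
        pd_a = (1 / n) * np.sum((y_predicted - Y) * X)
        pd_b = (1 / n) * np.sum(y_predicted - Y)
        a -= L * pd_a
        b -= L * pd_b
    return a, b

rng = np.random.default_rng(0)

# Like np.random.rand(20): values in [0, 1), so L = 0.001 is a small, stable step.
X_small, Y_small = rng.random(20), rng.random(20)

# Hypothetical data with X in the tens; NOT the asker's CSV.
X_large = rng.uniform(30, 70, 100)
Y_large = 1.3 * X_large + 5 + rng.normal(0, 10, 100)

# Necessary (not sufficient) condition for the slope update to stay stable:
# L < 2 / mean(X**2).
print(2 / np.mean(X_small ** 2))  # well above 0.001
print(2 / np.mean(X_large ** 2))  # below 0.001, so L = 0.001 overshoots

# Stays finite, though 1000 small steps may still be far from converged.
a_small, b_small = gradient_descent(X_small, Y_small)
# Oscillates with growing amplitude.
a_large, b_large = gradient_descent(X_large, Y_large)

# One fix: standardize X, use a larger step, then map back to the original units.
mu, sigma = X_large.mean(), X_large.std()
a_s, b_s = gradient_descent((X_large - mu) / sigma, Y_large, L=0.1)
slope, intercept = a_s / sigma, b_s - a_s * mu / sigma
print(slope, intercept)
print(np.polyfit(X_large, Y_large, 1))  # should agree closely
```

For the real data, the same check is 2 / np.mean(X ** 2) with X read from the CSV. If that value is below L, either lower L or standardize X first.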

Edit: Here is an image of the scatterplot after defining X = data.iloc[:, 0] and Y = data.iloc[:, 1]:

[Image: scatterplot of X vs. Y]

Here is an image of the plot and line at the end of the code:

[Image: scatterplot with the fitted lines]
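Separately from the fitting itself, the final plt.plot call in the question passes three lists. matplotlib's plot reads positional arguments in groups of x, y and an optional format string, so the third list is drawn as a separate line of y-values against the indices 0 and 1, not against min(X) and max(X). A minimal sketch that draws each line with its own x and y values follows. The data and the values of a and b are placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data; in the question, X and Y come from the CSV columns.
rng = np.random.default_rng(0)
X = rng.random(20)
Y = 2 * X + 1 + rng.normal(0, 0.1, 20)

a, b = 2.0, 1.0             # placeholder for the gradient-descent slope and intercept
c, d = np.polyfit(X, Y, 1)  # least-squares slope and intercept

xs = np.array([X.min(), X.max()])  # shared x endpoints for both lines

fig, ax = plt.subplots()
ax.scatter(X, Y, label="data")
# One plot call per line, each given its own x and y values.
ax.plot(xs, a * xs + b, label="gradient descent")
ax.plot(xs, c * xs + d, label="np.polyfit")
ax.legend()
plt.show()
```

With the question's variables, the last plotting call becomes two calls: plt.plot(xs, a * xs + b) and plt.plot(xs, c * xs + d).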

The result of print(data.head()):

[Image: output of print(data.head())]

Edit: reading just one line of the CSV:

[Two images: output from reading one line of the CSV]
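When checking what a raw line of the CSV looks like, the standard-library csv module shows how each line splits into fields. If the file is comma-separated, as read_csv's default assumes and as the working data.iloc[:, 1] suggests, the fields come back as strings separated by commas, so a whitespace-based str.split() would not separate them. A small sketch, with an in-memory stand-in for the file (its contents are hypothetical):

```python
import csv
import io

# In-memory stand-in for open('C:/Users/Teacher/Downloads/data.csv', newline='');
# these two lines are hypothetical, not the real file's contents.
f = io.StringIO("1.5,2.5\n3.0,4.5\n")

reader = csv.reader(f)
rows = list(reader)
for row in rows:
    # repr exposes separators and stray whitespace; every field is still a str
    print(repr(row), [type(v).__name__ for v in row])

# split() with no argument splits on whitespace only
print("1.5,2.5".split())  # ['1.5,2.5']
```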

I've added two images in case it clarifies things. Clearly, the gradient descent algorithm does not produce the proper line, and I am not sure why. (Commented Mar 6, 2024 at 15:12)

1 Answer

Since I don't have the CSV, here is what I would do to troubleshoot why reading from it does not work:

Assumption: each line of the CSV holds two values, so we build X and Y as lists with the following for loop:

```python
import pandas as pd

data = pd.read_csv('C:/Users/Teacher/Downloads/data.csv')
X, Y = [], []
for i in data:
    X.append(i.split()[0])
    Y.append(i.split()[1])
```
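A likely source of the IndexError reported in the comments: iterating over a DataFrame with for i in data yields its column labels, not its rows, so i is a single string and i.split() returns a one-element list. The sketch below reproduces this on a small hypothetical frame and shows two alternatives: itertuples() for a row-by-row loop, and plain column selection, which is what the question's code already does:

```python
import pandas as pd

# Hypothetical two-column frame standing in for the CSV data.
data = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [2.0, 4.1, 5.9]})

# Iterating a DataFrame directly yields its column labels, not its rows.
labels = [i for i in data]
print(labels)  # ['x', 'y']

try:
    Y_bad = [i.split()[1] for i in data]  # 'x'.split() is ['x'], so [1] fails
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range

# Row-by-row alternative: itertuples() yields one namedtuple per row.
X, Y = [], []
for row in data.itertuples(index=False):
    X.append(row[0])
    Y.append(row[1])

# Usually no loop is needed: select the columns directly.
X_col = data.iloc[:, 0].to_numpy()
Y_col = data.iloc[:, 1].to_numpy()
```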

9 Comments

I get an IndexError: list index out of range on the Y.append line.
Can you share the output of data.head() and len(data)? I want to see what the data looks like and how large it is.
Just added as the third image. The result of len(data) is 99.
@user124910 thanks for sharing. Please print i before the append and share the output. Knowing the structure and type of the data elements will help determine exactly how to split and append them.
@user124910 can we focus on resolving the source of X and Y? Looking at data.head(), I can see your x and y values are not highly correlated, so you should not expect a perfect fit. Until we get the input right, let's not focus too much on the line. Looking forward to seeing the structure and type of i as you loop over the DataFrame.
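On len(data) being 99: by default pd.read_csv treats the first line of the file as the header. If the CSV has no header row (an assumption; the file isn't shown), its first data line becomes the column labels and the frame ends up one row short. Passing header=None keeps every line as data. A sketch with a hypothetical three-line file, which also prints the column dtypes asked about above:

```python
import io
import pandas as pd

# Hypothetical headerless CSV; the real file's contents are not shown here.
raw = "1.0,2.0\n3.0,4.0\n5.0,6.0\n"

default = pd.read_csv(io.StringIO(raw))                 # first line used as column labels
no_header = pd.read_csv(io.StringIO(raw), header=None)  # every line kept as data

print(list(default.columns))         # ['1.0', '2.0']: the first data row became labels
print(len(default), len(no_header))  # 2 3
print(no_header.dtypes)              # both columns float64

X = no_header.iloc[:, 0]
Y = no_header.iloc[:, 1]
```

For the real file this would be pd.read_csv('C:/Users/Teacher/Downloads/data.csv', header=None). Dropping one row this way changes the row count and the column labels, but on its own it would not explain a badly fitted line.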
