I'm trying to use gradient descent on a data set. What I have written is:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv('C:/Users/Teacher/Downloads/data.csv')
X = data.iloc[:, 0]  # all rows of the first column
Y = data.iloc[:, 1]  # all rows of the second column
plt.scatter(X, Y)
plt.show()

n = len(X)
a = 0      # slope
b = 0      # intercept
L = 0.001  # learning rate
for i in range(1000):
    y_predicted = a * X + b
    pd_a = (1 / n) * sum((y_predicted - Y) * X)  # partial derivative w.r.t. a
    pd_b = (1 / n) * sum(y_predicted - Y)        # partial derivative w.r.t. b
    a = a - L * pd_a
    b = b - L * pd_b
print(a, b)

plt.scatter(X, Y)
c, d = np.polyfit(X, Y, 1)  # least-squares fit for comparison
print(c, d)
xs = [min(X), max(X)]
plt.plot(xs, [a * x + b for x in xs])  # line from gradient descent
plt.plot(xs, [c * x + d for x in xs])  # line from np.polyfit
plt.show()
If I instead define X = np.random.rand(20) and Y = np.random.rand(20), everything seems to work fine, so the issue appears to be with the input from the CSV.
However, the scatterplot of X and Y is still fine even when I define them as the first and second columns of my data set, so I'm not sure what's going on.
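In case it is relevant, here is a quick sanity check on the columns (standard pandas calls; one known way for gradient descent with a fixed learning rate to blow up is X values much larger than the random data in [0, 1)):

import pandas as pd

data = pd.read_csv('C:/Users/Teacher/Downloads/data.csv')
print(data.dtypes)        # both columns should come out numeric (int64/float64)
print(data.isna().sum())  # number of missing values per column
print(data.describe())    # value ranges of each column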
Edit: Here is an image of the scatterplot after defining X = data.iloc[:, 0] and Y = data.iloc[:, 1]:
Here is an image of the plot and line at the end of the code.
The result of print(data.head()):
Edit: reading just one line of the CSV:
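For reference, one way to read a single row is pandas' nrows parameter:

print(pd.read_csv('C:/Users/Teacher/Downloads/data.csv', nrows=1))  # nrows=1 loads only the first data row after the header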