I'm trying to conduct the residual analysis for simple linear regression. I need to prove that the residuals follow an approximate Normal Distribution.
The csv file I'm using has values for Percentage of marks in Grade 10 and the Salary the student makes.
Once I run the below code, my plot looks like this:

The plot in the book looks like this:

I expected my plot to match the one in the book, since the data is the same. I have double-checked that I'm not missing any data, and I have split the data set into training and test sets as the book does.
Data is as follows:
Percentage Salary
62 270000
76.33 200000
72 240000
60 250000
61 180000
55 300000
70 260000
68 235000
82.8 425000
59 240000
58 250000
60 180000
66 428000
83 450000
68 300000
37.33 240000
79 252000
68.4 280000
70 231000
59 224000
63 120000
50 260000
69 300000
52 120000
49 120000
64.6 250000
50 180000
74 218000
58 360000
67 150000
75 250000
60 200000
55 300000
78 330000
50.08 265000
56 340000
68 177600
52 236000
54 265000
52 200000
76 393000
64.8 360000
74.4 300000
74.5 250000
73.5 360000
57.58 180000
68 180000
69 270000
66 240000
60.8 300000
The code is below:
# Importing all required libraries for building the regression model
import pandas as pd
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
# Load the dataset into dataframe
mba_salary_df = pd.read_csv( 'MBA Salary.csv' )
# Add constant term of 1 to the dataset
X = sm.add_constant( mba_salary_df['Percentage in Grade 10'] )
Y = mba_salary_df['Salary']
# Split dataset into train and test set into 80:20 respectively
train_X, test_X, train_y, test_y = train_test_split( X, Y, train_size = 0.8, random_state = 100 )
# Fit the regression model
mba_salary_lm = sm.OLS( train_y, train_X ).fit()
mba_salary_resid = mba_salary_lm.resid
probplot = sm.ProbPlot(mba_salary_resid)
plt.figure( figsize = (8, 6) )
probplot.ppplot(line='45')
plt.title("Normal P-P Plot of Regression Standardized Residuals")
plt.show()
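One thing that may be worth checking is the scale of the residuals. As far as I understand, a normal P-P plot compares the empirical cumulative probabilities of the sorted residuals with the cumulative probabilities of a standard normal distribution, so residuals on a salary scale (tens of thousands) would map to probabilities of almost exactly 0 or 1 unless they are standardized first. Below is a minimal stdlib-only sketch of that idea. The function `pp_points` and the residual values are hypothetical and are not taken from the data above or from the book.

```python
from statistics import NormalDist, mean, stdev

def pp_points(residuals, standardize=True):
    """Return (theoretical, empirical) cumulative probabilities for a normal P-P plot."""
    values = sorted(residuals)
    n = len(values)
    if standardize:
        # Rescale to mean 0 and standard deviation 1 before comparing
        m, s = mean(values), stdev(values)
        values = [(v - m) / s for v in values]
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    theoretical = [std_normal.cdf(v) for v in values]
    # Plotting positions for the empirical cumulative probabilities
    empirical = [(i + 0.5) / n for i in range(n)]
    return theoretical, empirical

# Hypothetical residuals on a salary-like scale (NOT the real model residuals)
example_resid = [-90000.0, -45000.0, -20000.0, -5000.0,
                 10000.0, 30000.0, 50000.0, 70000.0]

raw_theo, _ = pp_points(example_resid, standardize=False)
std_theo, emp = pp_points(example_resid, standardize=True)

print("raw:         ", [round(p, 3) for p in raw_theo])
print("standardized:", [round(p, 3) for p in std_theo])
print("empirical:   ", [round(p, 3) for p in emp])
```

With the raw values every theoretical probability collapses to 0 or 1, while the standardized values spread across the unit interval and can sensibly be compared with the 45-degree line. In statsmodels, `sm.ProbPlot` accepts a `fit=True` argument that estimates location and scale from the data; whether that accounts for the difference from the book's plot is an assumption I have not verified.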

Use print() (and print(type(...)), print(len(...)), etc.) to see which part of the code is executed and what you really have in variables. It is called "print debugging" and it helps to see what the code is really doing.
Code that actually runs does not have two import statements on the same line, nor does it use fancy quotes instead of regular ' ones, and it probably has a Y variable defined somewhere, and an X variable with a column name that matches one of the column names of the data.
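The print-debugging suggestion above could look roughly like the sketch below. It uses only the standard library and three rows copied from the question. The header names in `sample_csv` are an assumption taken from the column names used in the code, since the real header of 'MBA Salary.csv' is not shown.

```python
import csv
import io

# A few rows copied from the question; the header names are an ASSUMPTION
sample_csv = io.StringIO(
    "Percentage in Grade 10,Salary\n"
    "62,270000\n"
    "76.33,200000\n"
    "72,240000\n"
)
rows = list(csv.DictReader(sample_csv))

print(type(rows))     # what kind of object was loaded
print(len(rows))      # how many data rows were read
print(list(rows[0]))  # the column names actually present
print(rows[0])        # the first record (values are still strings here)

# Check that the column name used in the model code really exists
expected_column = "Percentage in Grade 10"
column_found = expected_column in rows[0]
print(column_found)
```

The same checks carry over to a pandas DataFrame, for example by printing `mba_salary_df.columns` and `len(mba_salary_df)` before fitting the model.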