Python Scikit-Learn transformation

Question

I am trying to learn Scikit-Learn and build an ML model. I am learning from "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow".

In Chapter 2, the book walks through an entire ML project from start to finish. One of the steps is to transform, or prepare, the data for the model. There are numerous transformation steps, and each output is stored in a new object (mostly a DataFrame).

Why is the model then built with the original dataset? Wouldn't that make these transformations pointless?

I haven't quoted the example from the book because it is too long. If you have the book, please review it and respond to my question.

Another similar example can be found at https://machinelearningmastery.com/python-machine-learning-mini-course/

In Lesson 7, the standardized data is stored in "rescaledX", but the model is built from X, which is the original dataset. How is the standardized data used by the model that comes afterwards?
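To make the pattern concrete, here is a minimal sketch. It uses a hypothetical synthetic dataset from `make_classification` and a `LogisticRegression` model as stand-ins, not the lesson's actual data or model. The point it shows: a transformer returns a new array, and that array only influences a model if it is the one passed to `fit`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data standing in for the lesson's dataset (an assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Standardizing returns a NEW array; X itself is left unchanged.
rescaledX = StandardScaler().fit_transform(X)

# Trained on the raw X: rescaledX has no effect on this model at all.
model_raw = LogisticRegression().fit(X, y)

# Trained on rescaledX: here the standardization actually matters.
model_scaled = LogisticRegression().fit(rescaledX, y)

print("raw X column means:", X.mean(axis=0).round(2))
print("rescaledX column means:", rescaledX.mean(axis=0).round(2))
```

If a tutorial computes `rescaledX` and then only ever calls `fit(X, y)`, the scaled array is effectively unused for that model.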

D.W. · Accepted Answer · 2024-02-19 23:11:28Z


You can build a model from either untransformed data or from transformed data. The only requirement is that, at test time, you must apply the model to the same sort of data.
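A minimal sketch of what "the same sort of data" means in practice, assuming a hypothetical synthetic dataset and a k-nearest-neighbors classifier chosen only for illustration. The scaler is fitted on the training data only, and that same fitted scaler is applied to the test data before predicting. A scikit-learn `Pipeline` bundles both steps so the scaling cannot be forgotten at prediction time.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data (an assumption, not from the book or the lesson).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Manual version: fit the scaler on the training data only ...
scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier().fit(scaler.transform(X_train), y_train)
# ... then apply that SAME fitted scaler to the test data before predicting.
manual_pred = knn.predict(scaler.transform(X_test))

# Pipeline version: the same two steps bundled, so fit/predict always scale first.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier()).fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

print("predictions agree:", (manual_pred == pipe_pred).all())
print(f"test accuracy: {pipe.score(X_test, y_test):.3f}")
```

Both versions do the same thing; the pipeline is simply harder to misuse, because calling `predict` on raw test data still routes it through the fitted scaler.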

Which will perform better? It depends on the algorithm, the data, and perhaps other factors. Some algorithms work better with data that has been standardized (e.g., k-nearest neighbors, deep learning). Others, such as decision trees, are largely insensitive to feature scaling. The most reliable way to know which will perform best is to try both.
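One way to "try both" is to compare cross-validated scores for the same model with and without standardization. This is a sketch under stated assumptions: the data is synthetic, one feature is hypothetically multiplied by 1000 to mimic raw features in very different units, and k-nearest neighbors is used because it is distance-based. No particular result is implied; on real data either option may win.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data (an assumption for illustration only).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
# Hypothetically put one feature on a much larger scale, as raw units often are.
X[:, 0] *= 1000

candidates = {
    "raw": KNeighborsClassifier(),
    "standardized": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

# Mean 5-fold cross-validated accuracy for each variant.
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in candidates.items()}
for name, score in scores.items():
    print(f"{name}: mean CV accuracy = {score:.3f}")
```

Putting the scaler inside the pipeline means it is refitted on each training fold, so the validation folds never influence the scaling statistics.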

It's hard to say why a particular exercise does it one way without knowing the specifics of that exercise. Perhaps they tried both ways and one way worked better. Perhaps they only tried one way, it worked well enough, and they went with it. Perhaps there is a bug. Perhaps it was chosen for pedagogical reasons. "Why?" questions are often hard to answer in modern machine learning. If you want more insight yourself, you can train a model both ways and see whether one performs better.


Thanks! Why are the standardizations shown if the model only uses the raw data? In the online example, "rescaledX" is not used anywhere after that.

– EngineerP, Feb 20, 2024 at 7:16
@EngineerP, I can't answer that with the information available in the question.

– D.W. ♦, Feb 20, 2024 at 7:16
