0 votes
0 answers
53 views

I have a DataFrame like this: df col1 col2 1 'abc,pqr' 2 'ghv' 3 'mrr, jig'. Now I want to create a new row for each comma-separated value in col2, so the output would look ...
Kallol
  • 2,189
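A minimal pandas sketch for the question above, assuming col2 holds plain strings (the quotes in the excerpt are taken to be display quoting, not part of the values). `Series.str.split` turns each string into a list, and `DataFrame.explode` (the `ignore_index` argument needs pandas 1.1+) gives one row per list element:

```python
import pandas as pd

# Hypothetical reconstruction of the sample data from the question.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["abc,pqr", "ghv", "mrr, jig"]})

# Split each string into a list, then explode the list into one row per element.
out = df.assign(col2=df["col2"].str.split(",")).explode("col2", ignore_index=True)
# Remove the stray space left over from values such as "mrr, jig".
out["col2"] = out["col2"].str.strip()
print(out)
```

The other columns (here col1) are repeated for every exploded value.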
0 votes
1 answer
118 views

I am currently utilizing the XGBoost classifier within a pipeline that includes normalization and the XGBoost model itself. The model has been successfully developed in the Notebook environment. The ...
MMM
  • 11
0 votes
1 answer
49 views

Here is the full error: `--------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[33], line 2 ...
nicklaus-slade
11 votes
2 answers
122k views

I am using sklearn-pandas' DataFrameMapper to preprocess my data. I don't want to impute a specific column; I just want to drop the row if this column is null. Is there a way to do that?
topcan5
  • 1,707
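One common approach, sketched here under the assumption that the row filter can happen before preprocessing: scikit-learn style transformers (a `DataFrameMapper` included) generally return one output row per input row, so rows are usually dropped on the DataFrame beforehand with `dropna(subset=...)`. Column names are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" must be present, column "b" may be imputed later.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

# Drop rows where "a" is missing *before* the mapper/pipeline sees the data;
# missing values in other columns are left for the imputer.
clean = df.dropna(subset=["a"])
print(clean)
```

If there is a separate target vector, filter it with the same index (e.g. `y.loc[clean.index]`) so features and labels stay aligned.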
1 vote
2 answers
90 views

I have fitted an ElasticNetCV in Python with three splits: import numpy as np from sklearn.linear_model import LinearRegression #Sample data: num_samples = 100 # Number of samples num_features = 1000 ...
Joe94
  • 405
2 votes
3 answers
111 views

I am trying to train a KMeans model using scikit-learn, and I have been stuck on this issue for 2 days. pandas is selecting all columns of a DataFrame even though I specified only 2 columns. Here is the DataFrame in ...
Shree_ML
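Since the question's DataFrame is cut off, here is only a sketch of one thing worth checking: double-bracket selection `df[["col_a", "col_b"]]` returns a DataFrame with exactly those two columns, and that is what gets passed to KMeans. The column names and values are hypothetical:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical DataFrame with more columns than we want to cluster on.
df = pd.DataFrame({
    "income": [15, 16, 17, 80, 82, 85],
    "score": [39, 81, 6, 77, 40, 76],
    "age": [19, 21, 20, 23, 31, 22],
})

# Double brackets select a DataFrame containing only these two columns.
X = df[["income", "score"]]

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)
print(X.shape, km.cluster_centers_.shape)
```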
0 votes
0 answers
27 views

Since the KNeighborsClassifier class in sklearn picks the best algorithm based on the data passed to the fit method when algorithm is 'auto' (the default), when accessing the chosen algorithm using ._fit_method I ...
aisha kh
1 vote
2 answers
68 views

I receive a correlation file from an external source. It is a fairly straightforward file and looks like the following. A sample CSV can be found here: https://www.dropbox.com/scl/fi/...
Stumbling Through Data Science
0 votes
2 answers
106 views

I created a set of random missing values to practice with a tree imputer. However, I'm stuck on how to overwrite the missing values in my DataFrame. My missing values look like this: from ...
Englishman Bob
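A minimal sketch of writing imputed values back into the same DataFrame. `SimpleImputer` is swapped in here as a stand-in for the tree-based imputer from the question (an assumption); any imputer whose `fit_transform` returns a NumPy array of the same shape can be assigned back the same way:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# SimpleImputer stands in for the tree-based imputer from the question.
imputer = SimpleImputer(strategy="mean")

# Assigning through the column list keeps the original index and column names.
df[df.columns] = imputer.fit_transform(df)
print(df)
```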
0 votes
1 answer
238 views

When calculating class weights with from sklearn.utils import class_weight class_weights = class_weight.compute_class_weight(class_weight="balanced", classes=np.unique(...
oliver6626
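Because the question is cut off, this is only a general sketch of the call with made-up labels. In recent scikit-learn versions `classes` and `y` must be passed as keyword arguments, and `"balanced"` computes `n_samples / (n_classes * count_per_class)`:

```python
import numpy as np
from sklearn.utils import class_weight

y = np.array([0, 0, 0, 1])  # hypothetical, imbalanced labels

weights = class_weight.compute_class_weight(
    class_weight="balanced", classes=np.unique(y), y=y
)

# The same numbers computed by hand from the "balanced" formula.
manual = len(y) / (len(np.unique(y)) * np.bincount(y))

# Some training APIs (e.g. Keras' fit(class_weight=...)) expect a dict instead.
weights_dict = dict(zip(np.unique(y), weights))
print(weights, weights_dict)
```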
0 votes
1 answer
45 views

model.fit(x_train, y_train, epochs=1000) I'm trying to make an AI, but my code gives an error and I don't know how to fix it. This is the error: ValueError: Data cardinality is ambiguous: x sizes: 455 y ...
user24242174
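This error means the x and y passed to `fit` do not have the same number of samples. A quick check, sketched with random data of an assumed size of 569 rows: `train_test_split` returns four arrays in the order `X_train, X_test, y_train, y_test`, and unpacking them in a different order (e.g. pairing training features with test labels) is one common way to end up with mismatched sizes:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data with 569 samples; real features/labels behave the same way.
rng = np.random.default_rng(0)
X = rng.normal(size=(569, 30))
y = rng.integers(0, 2, size=569)

# Order matters: X_train, X_test, y_train, y_test.
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# model.fit(x_train, y_train, ...) needs these two numbers to match.
print(x_train.shape[0], y_train.shape[0])
```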
0 votes
1 answer
235 views

I am running MLflow with autologging to track an XGBoost model. By default, under artifacts it saves the model, requirements, and feature importances, which I want to keep. But if I try to add ...
illan
  • 385
1 vote
1 answer
76 views

I have sample house price data and simple code: import pandas as pd from sklearn.preprocessing import LabelEncoder, StandardScaler from sklearn.model_selection import train_test_split from sklearn....
mehran arbabian
0 votes
1 answer
118 views

I want to use multiple PMMLs to keep the transformation of the data and the application of the model separate. Here is the code I am using. I am doing this because I want to include some kind of ...
Habenzu
  • 87
1 vote
1 answer
222 views

I want to find the immediate neighbours around a given point in a multidimensional space (up to 7 dimensions). Important facts about the space: non-linear spacing among points within a single ...
skm
  • 5,777
0 votes
1 answer
135 views

I have a PMMLPipeline with the following DataFrameMapper inside (Domains are coming from sklearn2pmml, while the Mapper is from sklearn-pandas): {'features': [(['A'], [ContinuousDomain(dtype=<...
Habenzu
  • 87
0 votes
0 answers
74 views

OUTPUT DISPLAY: This is the output of my project program, and the only thing remaining is the RangeIndex and dtype values, which I am unable to remove from the output display. SOURCE CODE: I am working on ...
Somansh Bhayani
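The source code is cut off, so this is a sketch under an assumption: if the extra lines come from printing a Series (which appends a `Name: ..., dtype: ...` footer) or from `df.info()` (which reports the RangeIndex and dtypes), formatting with `to_string` prints just the values. The data is made up:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi"], "marks": [91, 78]})

# print(df["marks"]) would append "Name: marks, dtype: int64";
# to_string() renders only the values (index=False also hides the row labels).
marks_text = df["marks"].to_string(index=False)
table_text = df.to_string(index=False)
print(table_text)
```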
0 votes
1 answer
62 views

I am attempting to write a custom classifier for use in a sklearn GridSearchCV pipeline. I've stripped the class back to the bare minimum, and it currently looks like this: from sklearn....
Ben
  • 451
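A minimal sketch of the conventions a custom classifier needs to work inside GridSearchCV. `ThresholdClassifier` is a hypothetical toy model, not the asker's class: `__init__` only stores its arguments (so `get_params`/`set_params` and cloning work), `fit` sets a trailing-underscore attribute and returns `self`, and `ClassifierMixin` supplies an accuracy-based `score`:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import GridSearchCV


class ThresholdClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical toy classifier: predicts 1 when the first feature exceeds `threshold`."""

    def __init__(self, threshold=0.0):
        # Only store constructor arguments; GridSearchCV clones the estimator
        # through get_params()/set_params(), which rely on this convention.
        self.threshold = threshold

    def fit(self, X, y):
        # Fitted attributes end with an underscore by convention.
        self.classes_ = np.unique(y)
        return self  # fit must return self

    def predict(self, X):
        X = np.asarray(X)
        return (X[:, 0] > self.threshold).astype(int)


X = np.arange(8).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

search = GridSearchCV(ThresholdClassifier(), {"threshold": [1.5, 3.5, 5.5]}, cv=2)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Listing the mixin before `BaseEstimator` follows the ordering recommended for recent scikit-learn versions.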
0 votes
1 answer
74 views

I have a dataset df: category var 1 ... var 32 weighting country 1 blue 1.0 54.2 3.0 US 2 pink 0.0 101.0 1.0 other 3 blue 1.0 49.9 3.0 US 4 green 1.0 72.2 9.0 US I'm using the kNN classifier (on ...
MC Jong
  • 63
1 vote
1 answer
49 views

Even though I have successfully installed sklearn via Jupyter, I cannot access its classes. What mistake did I make? !pip install sklearn import sklean from sklearn.preprocessing import LabelEncoder ...
Fernando
0 votes
1 answer
95 views

I built a model using the XGBoost algorithm to predict precipitation. The RMSE turns out to be 7.6. Does that mean the model performs poorly? If so, what advice would you give to ...
Willy Mbenza
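RMSE is in the units of the target, so a value like 7.6 only means something relative to the spread of the data. A quick sanity check, sketched with made-up numbers, is to compare the model's RMSE against a naive baseline that always predicts the mean; that baseline's RMSE equals the population standard deviation of the targets:

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


y_true = np.array([0.0, 2.0, 4.0, 6.0])  # hypothetical observed values
y_pred = np.array([1.0, 2.0, 3.0, 6.0])  # hypothetical model predictions

model_rmse = rmse(y_true, y_pred)
baseline_rmse = rmse(y_true, np.full_like(y_true, y_true.mean()))
print(model_rmse, baseline_rmse)  # the model should beat the baseline
```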
1 vote
1 answer
84 views

In the sklearn installation guide for the latest version (1.3.1) it mentions that you can install dependencies with pip, but says "When using pip, please ensure that binary wheels are used, and ...
Arran Duff
  • 1,514
0 votes
0 answers
153 views

I am trying to implement my own distance metric, specifically the Jaro distance, in sklearn's nearest-neighbors search, and I am getting some errors. I've tried searching online and didn't manage to find a ...
Gabriel Choo
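scikit-learn's neighbor searches validate their input as numeric arrays, so a string metric such as Jaro cannot be applied to raw strings directly. One workaround, sketched here as an assumption rather than the asker's setup: pass an array of row indices and let a callable metric look the strings up, with `algorithm="brute"` so no tree is built. `jaro_similarity` is a plain-Python implementation written for illustration, not a library function:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def jaro_similarity(s1, s2):
    """Jaro similarity in [0, 1] (illustrative implementation)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    s1_hit, s2_hit = [False] * len1, [False] * len2
    matches = 0
    for i in range(len1):  # find matching characters within the window
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not s2_hit[j] and s1[i] == s2[j]:
                s1_hit[i] = s2_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    k = transpositions = 0
    for i in range(len1):  # count matched characters that are out of order
        if s1_hit[i]:
            while not s2_hit[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    t = transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


words = ["MARTHA", "MARHTA", "DIXON", "DICKSONX"]  # hypothetical strings
X = np.arange(len(words)).reshape(-1, 1)  # each row holds an index into `words`


def jaro_distance(a, b):
    # sklearn passes rows as float arrays, so convert back to integer indices.
    return 1.0 - jaro_similarity(words[int(a[0])], words[int(b[0])])


nn = NearestNeighbors(n_neighbors=2, algorithm="brute", metric=jaro_distance).fit(X)
dist, idx = nn.kneighbors(X[:1])
print([words[i] for i in idx[0]], dist[0])
```

A Python callable is evaluated pair by pair, so this is slow for large datasets.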
1 vote
2 answers
84 views

I am working on a project tracking poverty across the US between 1995 and 2020. As I am working on a linear regression scatterplot, I have this: # Create a regression object. regression = ...
EC Cotterman
1 vote
0 answers
240 views

I am trying to initialize an InterClusterDistance visualizer from the yellowbrick library. When I execute the following: from sklearn.datasets import make_blobs from sklearn.cluster import KMeans from ...
Data guy
1 vote
0 answers
195 views

Short version: I need help applying someone else's groupby class to multiple pandas columns and with more complicated functions. Long version: Someone else (Elizabeth Santorella) wrote a Python class ...
Inder Jalli
0 votes
1 answer
145 views

I found weird behavior in sklearn.preprocessing.MinMaxScaler, and the same in sklearn.preprocessing.RobustScaler: when the data's max value is very small (< 10^(-16)), the transformer doesn't change the data's max value ...
Dima
  • 47
0 votes
1 answer
2k views

Hello there :) I'm working on an undergraduate data analysis project and would like guidance on the following case study. What I'm working with: I have a DataFrame consisting of 3'891 ...
Dan_San
  • 21
0 votes
1 answer
53 views

My goal is to normalize my data (min-max scaling) for machine learning purposes. The issue I am having appears when you run the code below. #importation of libraries: import ...
Toy L
  • 55
0 votes
1 answer
34 views

The following is the df: from io import StringIO import pandas as pd df = pd.read_csv(StringIO(""" Group Date Rank A 01-01-2023 1 A 01-02-2023 2 A 01-03-2023 3 A 01-04-2023 2 A 01-05-2023 1 ...
Yogesh Kamboj
-1 votes
2 answers
136 views

New to ML here and trying my hand at fitting a model using Random Forest. Here is my simplified code: X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.15, ...
Sinha
  • 466
0 votes
1 answer
501 views

[Picture of CSV file containing raw data] I am trying to plot a scatter graph using a CSV file I downloaded online in order to get the linear regression. %matplotlib inline plt.scatter(df.year, df....
Jamilu
  • 5
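The question is cut off, so this is a sketch under an assumption: a straight-line fit of one column against `year`. It swaps in `np.polyfit` (degree 1, least squares) rather than a scikit-learn model, and the arrays stand in for `df.year` and the value column:

```python
import numpy as np

# Hypothetical stand-ins for df.year and the value column from the CSV.
year = np.array([2000, 2001, 2002, 2003, 2004], dtype=float)
value = np.array([10.0, 12.0, 14.0, 16.0, 18.0])

# Least-squares straight line: value ≈ slope * year + intercept.
slope, intercept = np.polyfit(year, value, deg=1)
fitted = slope * year + intercept

# With matplotlib, the fit can be drawn over the scatter plot:
# plt.scatter(year, value); plt.plot(year, fitted, color="red")
print(slope, intercept)
```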
0 votes
1 answer
118 views

So I am currently creating a machine learning model in Python that predicts the outcome of a football match. Below is the code I use to train the model: features = ['Home Team',..., '...
Andreas Wong
0 votes
1 answer
368 views

I wrote simple, generic XGBoost classifier code that runs with a pipeline. This is the code (with a simple config example): import optuna import pickle import pandas as pd from xgboost import ...
Zag Gol
  • 1,106
0 votes
1 answer
93 views

I'm making a KMeans model with data from Twitter, but when I plot the polarity and subjectivity analysis on the scatterplot, the centroids (red x) appear far from the data: from sklearn....
Nana
  • 13
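The full code is cut off, so this is only a sketch of one frequent cause: fitting KMeans on scaled features and then plotting `cluster_centers_` on the original axes. If a scaler was used, `inverse_transform` maps the centers back to the data's units. The scores are made up:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical (polarity, subjectivity) scores.
X = np.array([[-0.9, 0.1], [-0.8, 0.2], [0.8, 0.9], [0.9, 0.8]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)

# cluster_centers_ live in the scaled space; convert them back before plotting.
centers = scaler.inverse_transform(km.cluster_centers_)
print(centers)
```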
-1 votes
1 answer
241 views

I am trying to fit a sklearn linear regression model with many points from a pandas DataFrame. This is the program: features =["floors", "waterfront", "lat", "...
Legofan35664
-1 votes
1 answer
927 views

I am trying to use RandomizedSearchCV from sklearn on an MLPRegressor model, and I have scaled the data using StandardScaler. The code for the model is presented below. When I try to run the code I ...
user17637519
0 votes
0 answers
83 views

I have the following code; I'm just trying to teach myself how to use a machine learning model. import ast import csv import pandas as pd import numpy as np from sklearn.tree import ...
Malelizarazo
0 votes
1 answer
242 views

First, I am wondering if there is a way to find the RMSE value without the y-test values (I can do it if I have y-test values). For instance, we have train data and test data. But in the test data, we don'...
ash1
  • 483
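RMSE needs the true target values, so it cannot be computed on a test set that has no labels. The usual workaround, sketched here with synthetic data and a plain linear model as assumptions, is to hold out a validation split from the labeled training data (or use cross-validation) and report RMSE there:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical labeled training data (the unlabeled test set is not used here).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X[:, 0] + 2 + rng.normal(scale=0.5, size=200)

# Hold out part of the labeled data so RMSE can be computed against known targets.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
rmse = np.sqrt(mean_squared_error(y_val, model.predict(X_val)))
print(f"validation RMSE: {rmse:.3f}")
```

With noise of standard deviation 0.5, the validation RMSE should land close to 0.5.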
-1 votes
1 answer
362 views

This is the final cleaned DataFrame (df2) before standardizing. My code: scaler=StandardScaler() df2[list(df2.columns)]=scaler.fit_transform(df2[list(df2.columns)]) df2 This returns a DataFrame after ...
SAJEER AR
0 votes
1 answer
337 views

When I convert data from a pandas DataFrame to sklearn so I can make predictions, string data becomes problematic. So I used LabelEncoder, but it seems to limit me to using the encoded data instead of ...
M.Namjoo
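A minimal sketch, with a made-up string column, showing that the encoding is reversible: a fitted `LabelEncoder` keeps the sorted original labels in `classes_`, and `inverse_transform` maps integer codes (for example, model predictions) back to the strings:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

colors = np.array(["red", "blue", "red", "green"])  # hypothetical string column

le = LabelEncoder()
codes = le.fit_transform(colors)        # classes are sorted: blue=0, green=1, red=2
restored = le.inverse_transform(codes)  # map integer codes back to the original strings
print(codes, restored)
```

scikit-learn documents `LabelEncoder` as intended for target labels; for input features, `OrdinalEncoder` or `OneHotEncoder` is the usual choice, and both also provide `inverse_transform`.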
2 votes
1 answer
98 views

I have a DataFrame that looks like this: Race_ID Date Student_ID a b 1 1/1/2023 1 3 1 1 1/1/2023 2 2 2 1 1/1/...
Ishigami
  • 592
0 votes
1 answer
152 views

This is my Python code, where I am trying to predict the fruit count on a tree using sklearn, but I ran into an issue. The code is given below: import cv2 from sklearn.ensemble import RandomForestClassifier def ...
Sneha Somwanshi
2 votes
1 answer
98 views

I'm creating a classification model to predict the outcome of a sports event (win/loss) and am running into a data setup conundrum. Currently the data is set up as follows: example_data = [team_a_feat_1, ...
Sentient AI Turing
0 votes
0 answers
87 views

I have this pandas DataFrame: ModelCFL = pd.DataFrame(columns=['data_case', 'length']) I fill this DataFrame with my CSV files: cases = ['0', '250'] for i in range(0, len(cases)): cases_data = ...
Felipe Quintero Suárez
-1 votes
1 answer
177 views

My task was to implement model parameter tuning using stochastic gradient descent. Below is my function implementation code. However, I would like to add some regularization. def gradient(X, y, w, ...
villerpa
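A minimal NumPy sketch of adding L2 (ridge) regularization to mini-batch SGD for a linear model with squared-error loss. The question's `gradient(X, y, w, ...)` signature is cut off, so the extra `l2` parameter and the `sgd` helper are hypothetical. The penalty `l2 * ||w||^2` simply adds `2 * l2 * w` to the gradient:

```python
import numpy as np


def gradient(X, y, w, l2=0.0):
    """Gradient of mean squared error plus the L2 penalty l2 * ||w||^2 (hypothetical signature)."""
    residual = X @ w - y
    return (2.0 / X.shape[0]) * X.T @ residual + 2.0 * l2 * w


def sgd(X, y, l2=0.0, lr=0.05, epochs=200, batch_size=8, seed=0):
    """Plain mini-batch SGD; l2=0 gives unregularized least squares."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(X.shape[0])  # reshuffle every epoch
        for start in range(0, X.shape[0], batch_size):
            idx = order[start:start + batch_size]
            w -= lr * gradient(X[idx], y[idx], w, l2)
    return w


# Synthetic, noise-free data so the unregularized solution is known exactly.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w_plain = sgd(X, y)          # should approach true_w
w_ridge = sgd(X, y, l2=1.0)  # shrunk toward zero by the penalty
print(w_plain, w_ridge)
```

If the model has an intercept term, it is usually left out of the penalty.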
0 votes
1 answer
75 views

I am trying to create a confusion matrix with probabilities. y_pred_train = logistic.predict_proba(X_train) confusion_matrix(y_train, y_pred_train) ValueError: Classification metrics can't handle a ...
CodeMaster
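`confusion_matrix` expects class labels, not the 2-D probability array returned by `predict_proba`. A minimal sketch, with a hard-coded array standing in for `logistic.predict_proba(X_train)`: take the positive-class column, apply a threshold, then build the matrix:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1])
# Stand-in for logistic.predict_proba(X): column 1 is P(class 1).
proba = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7]])

threshold = 0.5  # adjust to trade false positives against false negatives
y_pred = (proba[:, 1] >= threshold).astype(int)
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
print(cm)
```

Metrics that consume the probabilities themselves, such as `roc_auc_score`, take `proba[:, 1]` directly.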
0 votes
0 answers
360 views

I'm not quite sure why I am running into this issue. I have installed sklearn in my environment and imported it within the code. Specifically, I am using sklearn.metrics.r2_score and I actually don't call ...
Magic Dave
0 votes
0 answers
223 views

I am trying to do NN text classification using Keras, but when I compute the confusion matrix I get 0 for TN and FN. It's strange, because when I build a model using sklearn's MLPClassifier with the same data and hyperparameters I ...
andryan86
0 votes
1 answer
350 views

I want to use FunctionTransformer to perform calculations between columns. For instance, I want to subtract two columns and add the new column to the dataset. So I have the function: def diff(x, ...
Slevin_42
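A minimal sketch, assuming the input is a pandas DataFrame. `add_difference` and its `left`/`right` parameters are hypothetical names. With the default `validate=False`, `FunctionTransformer` hands the DataFrame straight to the function, and `kw_args` supplies the column names on every call:

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def add_difference(df, left, right):
    """Hypothetical helper: return a copy of df with a new column left - right."""
    out = df.copy()
    out[f"{left}_minus_{right}"] = out[left] - out[right]
    return out


df = pd.DataFrame({"x": [5, 7, 9], "y": [1, 2, 3]})

# kw_args passes the column names through to the function on every transform call.
diff_step = FunctionTransformer(add_difference, kw_args={"left": "x", "right": "y"})
result = diff_step.fit_transform(df)
print(result)
```

Returning a copy keeps the caller's DataFrame unchanged, which matters when the same data is reused across pipeline steps.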