1,313 questions
0
votes
0
answers
53
views
Create a new line for comma separated values in pandas column - I don't want to add new rows, I want to have same rows in output [duplicate]
I have a dataframe like this,
df
col1 col2
1 'abc,pqr'
2 'ghv'
3 'mrr, jig'
Now I want to create a new line for each comma separated values in col2, so the output would look ...
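A minimal sketch of one way to do this, assuming the goal is to keep one row per record and put a line break inside the cell (the expected output is cut off in the excerpt, so this is a guess at the intent):

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["abc,pqr", "ghv", "mrr, jig"]})

# Replace each comma (and any whitespace after it) with a newline.
# The number of rows is unchanged; only the cell text is modified.
df["col2"] = df["col2"].str.replace(r",\s*", "\n", regex=True)
```

Whether the line break is displayed depends on where the frame is rendered; a plain `print(df)` shows the escaped `\n`.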
0
votes
1
answer
118
views
Timestamp issue while creating the model using pipeline in Vertex AI
I am currently utilizing the XGBoost classifier within a pipeline that includes normalization and the XGBoost model itself. The model has been successfully developed in the Notebook environment.
The ...
0
votes
1
answer
49
views
Cross-Validation Function returns "Unknown label type: (array([0.0, 1.0], dtype=object),)"
Here is the full error:
`---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[33], line 2
...
11
votes
2
answers
122k
views
How to use DataFrameMapper to delete rows with a null value in a specific column?
I am using sklearn-pandas.DataFrameMapper to preprocess my data. I don't want to impute for a specific column. I just want to drop the row if this column is Null. Is there a way to do that?
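As far as I know, scikit-learn style transformers are not meant to change the number of rows, so a common workaround is to drop the rows in pandas before handing the frame to the mapper. A small sketch, with hypothetical column names `age` and `income`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "age" is the column that must not be null.
df = pd.DataFrame({"age": [25.0, np.nan, 40.0], "income": [50.0, 60.0, np.nan]})

# Drop rows only where "age" is missing; nulls in other columns are kept
# so they can still be imputed later by the mapper.
cleaned = df.dropna(subset=["age"]).reset_index(drop=True)
```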
1
vote
2
answers
90
views
ElasticNetCV in Python: Get full grid of hyperparameters with corresponding MSE?
I have fitted an ElasticNetCV in Python with three splits:
import numpy as np
from sklearn.linear_model import LinearRegression
#Sample data:
num_samples = 100 # Number of samples
num_features = 1000 ...
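One possible approach, sketched on synthetic data rather than the data from the question: after fitting, `ElasticNetCV` exposes `mse_path_` (per-fold MSE for every `l1_ratio` and alpha) and `alphas_`, which can be flattened into a table. The sample sizes and `l1_ratios` below are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

# Synthetic stand-in data (assumption; not the data from the question).
X, y = make_regression(n_samples=60, n_features=10, noise=5.0, random_state=0)

l1_ratios = [0.2, 0.5, 0.8]
model = ElasticNetCV(l1_ratio=l1_ratios, cv=3, random_state=0).fit(X, y)

# mse_path_ has shape (n_l1_ratio, n_alphas, n_folds); average over the folds.
mean_mse = model.mse_path_.mean(axis=-1)

# With several l1_ratio values and automatically generated alphas,
# alphas_ has shape (n_l1_ratio, n_alphas), matching mean_mse.
grid = pd.DataFrame({
    "l1_ratio": np.repeat(l1_ratios, mean_mse.shape[1]),
    "alpha": model.alphas_.ravel(),
    "mean_mse": mean_mse.ravel(),
})
```

Sorting `grid` by `mean_mse` then shows how every combination compares, not only the selected one.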
2
votes
3
answers
111
views
Pandas takes all columns of a dataframe even when some columns are specified
I am trying to train a KMeans model using Scikit-Learn.
I have been stuck on this issue for 2 days.
Pandas is selecting all columns of a dataframe even though I specified 2 columns.
Here is the dataframe in ...
0
votes
0
answers
27
views
_fit_method for KNN gives KD-tree even though I'm working in a high dimensional space
Since the KNeighborsClassifier class in sklearn picks the best algorithm based on the values passed to the fit method when using auto (which is the default), when accessing the algorithm using ._fit_method I ...
1
vote
2
answers
68
views
Using SKLearn KMeans With Externally Generated Correlation Matrix
I receive a correlation file from an external source. It is a fairly straightforward file and looks like the following.
A sample csv can be found here
https://www.dropbox.com/scl/fi/...
0
votes
2
answers
106
views
Using a Mask to Insert Values from sklearn Iterative Imputer
I created a set of random missing values to practice with a tree imputer. However, I'm stuck on how to write the imputed values back into my dataframe. My missing values look like this:
from ...
0
votes
1
answer
238
views
model.fit() class weights do not work when training the model
When calculating class_weights with
from sklearn.utils import class_weight
class_weights = class_weight.compute_class_weight(class_weight="balanced",
classes=np.unique(...
0
votes
1
answer
45
views
Data cardinality is ambiguous sklearn.train
model.fit(x_train, y_train, epochs=1000)
I'm trying to make an AI, but my code gives an error and I don't know how to fix it.
This is the error:
ValueError: Data cardinality is ambiguous:
x sizes: 455
y ...
0
votes
1
answer
235
views
Mlflow log_figure deletes artifact
I am running mlflow with autologging to track an xgboost model. By default, under artifacts it saves the model, requirements, and feature importances. Cool stuff I want to keep.
But, if I try to add ...
1
vote
1
answer
76
views
multiple linear regression house price r2 score problem
I have sample house price data and some simple code:
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn....
0
votes
1
answer
118
views
How to transform Dataframe Mapper to PMML?
I want to use multiple PMMLs to keep the transformation of the data and the application of the model separate. Here is the code I am using. I am doing this because I want to include some kind of ...
1
vote
1
answer
222
views
How to get immediate neighbors using a kd-tree irrespective of the spacing?
I want to find the immediate neighbours around a given point in a multidimensional space (up to 7 dimensions).
Important facts about the space:
non-linear spacing among points within a single ...
0
votes
1
answer
135
views
DataFrameMapper with sklearn2pmml Domains
I have a PMMLPipeline with the following DataFrameMapper inside (Domains are coming from sklearn2pmml, while the Mapper is from sklearn-pandas):
{'features': [(['A'],
[ContinuousDomain(dtype=<...
0
votes
0
answers
74
views
How do I remove RangeIndex and dtypes from output display?
OUTPUT DISPLAY - This is the output of my project. The only thing remaining is the RangeIndex and dtypes values, which I am unable to remove from the output display.
SOURCE CODE - I am working on ...
0
votes
1
answer
62
views
Custom classifier won't accept data from test_train_split in sklearn
I am attempting to write a custom classifier for use in a sklearn gridsearchCV pipeline.
I've stripped everything back to the bare minimum in the class which currently looks like this:
from sklearn....
0
votes
1
answer
74
views
Using kNN with weighted dataset
I have a dataset df:
   category  var 1  ...  var 32  weighting  country
1  blue      1.0    ...  54.2    3.0        US
2  pink      0.0    ...  101.0   1.0        other
3  blue      1.0    ...  49.9    3.0        US
4  green     1.0    ...  72.2    9.0        US
I'm using the kNN classifier (on ...
1
vote
1
answer
49
views
Even though I have successfully installed sklearn via Jupyter, I cannot access its classes. What mistake did I make?
Even though I have successfully installed sklearn via Jupyter, I cannot access its classes. What mistake did I make?
!pip install sklearn
import sklean
from sklearn.preprocessing import LabelEncoder
...
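One thing worth checking, offered as a guess because the excerpt is truncated: the package is published on PyPI as `scikit-learn`, while the module you import is `sklearn`. A short sketch of the usual pattern:

```python
# In a notebook cell, install under the PyPI name (shown as a comment so
# this block stays runnable as plain Python):
#   %pip install scikit-learn

# The import name is "sklearn", regardless of the PyPI name.
import sklearn
from sklearn.preprocessing import LabelEncoder

# Quick smoke test that the class is usable.
encoded = LabelEncoder().fit_transform(["b", "a", "b"])
```

If the import still fails after installing, the notebook kernel may be using a different Python environment than the one pip installed into.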
0
votes
1
answer
95
views
XGBoost evaluation
I built a model using the XGBoost algorithm to predict precipitation. It turns out that the RMSE is equal to 7.6. Does it mean that the model performs poorly? If so, what would be your piece of advice to ...
1
vote
1
answer
84
views
sklearn requirements installation with pip: ensure that binary wheels are used
In the sklearn installation guide for the latest version (1.3.1) it mentions that you can install dependencies with pip, but says
"When using pip, please ensure that binary wheels are used, and ...
0
votes
0
answers
153
views
How to Implement Custom Distance Metrics in Sklearn Nearest Neighbor
I am trying to implement my own distance metric, specifically Jaro distance, in Sklearn Nearest Neighbour and I am getting back some errors. I've tried looking online and didn't manage to find a ...
1
vote
2
answers
84
views
I want to create a linear regression scatterplot for an assignment. State names are giving me an error (?)
I am working on a project tracking poverty across the US between 1995 and 2020.
As I am working on a linear regression scatterplot, I have this:
# Create a regression object.
regression = ...
1
vote
0
answers
240
views
Why is InterClusterDistance from yellowbrick failing with "AttributeError: 'NoneType' object has no attribute '_get_renderer'"
I am trying to initialize an InterClusterDistance visualizer from the yellowbrick library. When I execute the following:
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from ...
1
vote
0
answers
195
views
Using faster pandas groupby class on multiple columns
Short version:
I need help applying someone else's groupby class on multiple pandas columns and with more complicated functions.
Long version:
Someone else (Elizabeth Santorella) wrote a python class ...
0
votes
1
answer
145
views
MinMaxScaler doesn't scale small values to 1
I found weird behavior in sklearn.preprocessing.MinMaxScaler, and the same in sklearn.preprocessing.RobustScaler.
When the data's max value is very small (< 10^(-16)), the transformer doesn't change the data's max value ...
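My understanding, which may not cover every version, is that scikit-learn treats a feature whose range is smaller than roughly ten times machine epsilon as constant and leaves it unscaled. A hedged workaround sketch is to rescale the data by a known factor first; the factor `1e16` below is an assumption chosen to match the magnitudes in the question.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Tiny values similar in magnitude to those described in the question.
X = np.array([[0.0], [5e-17], [1e-16]])

# Hypothetical pre-scaling factor: bring the values to order 1 so the
# feature range is no longer close to floating point precision.
factor = 1e16
scaled = MinMaxScaler().fit_transform(X * factor)
```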
0
votes
1
answer
2k
views
How to Prepare the Data for a Logistic Regression Using SKLearn
Hello there! I'm working on an undergraduate data analysis project and would like guidance regarding the following case study:
What I'm working with:
I have a data frame consisting of 3'891 ...
0
votes
1
answer
53
views
Application of Min & Max function
My goal is to normalize my data (min-max scaling) for machine learning purposes. The issue I am having appears when you run the code below.
#importation of libraries:
import ...
0
votes
1
answer
34
views
How to count number of occurrences of each element using groupby in pandas column [duplicate]
Following is the df
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO("""
Group Date Rank
A 01-01-2023 1
A 01-02-2023 2
A 01-03-2023 3
A 01-04-2023 2
A 01-05-2023 1
...
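A small sketch of one way to count occurrences per group, using only the rows visible in the excerpt and assuming the element to count is `Rank` within each `Group`:

```python
from io import StringIO
import pandas as pd

# Only the rows shown in the excerpt; the rest of the data is elided.
df = pd.read_csv(StringIO("""
Group Date Rank
A 01-01-2023 1
A 01-02-2023 2
A 01-03-2023 3
A 01-04-2023 2
A 01-05-2023 1
"""), sep=r"\s+")

# Count how often each Rank value appears inside each Group.
counts = df.groupby(["Group", "Rank"]).size().reset_index(name="count")
```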
-1
votes
2
answers
136
views
Sklearn Random Forest: determine the name of features ascertained by parameter grid for model fit and prediction
New to ML here and trying my hands on fitting a model using Random Forest. Here is my simplified code:
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.15, ...
0
votes
1
answer
501
views
I got the following error: 'DataFrame' object has no attribute 'year'
I am trying to plot a scatter graph using an online CSV file I downloaded in order to get the linear regression.
%matplotlib inline
plt.scatter(df.year, df....
0
votes
1
answer
118
views
How to reverse the encoding of sklearn LabelEncoder() after training the model?
So I am currently creating a machine learning model in Python which predicts the outcome of a football match. Below is the code from the training of the model:
features = ['Home Team',..., '...
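For reference, a fitted `LabelEncoder` can map codes back to the original labels with `inverse_transform`, as long as the same encoder object used for training is kept around. A minimal sketch with made-up team names:

```python
from sklearn.preprocessing import LabelEncoder

# Made-up labels for illustration only.
teams = ["Arsenal", "Chelsea", "Arsenal", "Leeds"]

encoder = LabelEncoder()
codes = encoder.fit_transform(teams)        # labels -> integer codes
decoded = encoder.inverse_transform(codes)  # integer codes -> labels
```

If several columns are encoded, each one needs its own encoder, otherwise the mapping of the earlier columns is lost.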
0
votes
1
answer
368
views
Sklearn Pipelines - Feature Engineering
I wrote a simple generic XGBoost classifier code that runs with a pipeline. This is the code (with simple config example):
import optuna
import pickle
import pandas as pd
from xgboost import ...
0
votes
1
answer
93
views
Why are the kmeans centroids far from the data? Python
I'm making a kmeans model with the data from Twitter, but when I apply the polarity and subjectivity analysis on the scatterplot, the centroids (red x) appear far from the data:
from sklearn....
-1
votes
1
answer
241
views
Error when trying to fit a dataset. (python)
I am trying to fit a sklearn linear regression model with many points from a pandas dataframe. This is the program:
features =["floors", "waterfront", "lat", "...
-1
votes
1
answer
927
views
ValueError: Input contains NaN, infinity or a value too large for dtype('float64') when using randomizedSearch
I am trying to use RandomizedSearchCV from sklearn on an MLPRegressor model, and I have scaled the data using StandardScaler. The code for the model is presented below. When I try to run the code I ...
0
votes
0
answers
83
views
ValueError: setting an array element with a sequence. In decisionTreeClassifier fit
I have the following code, I'm just trying to teach myself how to use a machine learning model.
import ast
import csv
import pandas as pd
import numpy as np
from sklearn.tree import ...
0
votes
1
answer
242
views
How to find RMSE without test value in python [closed]
First, I am wondering whether there is a way to find the RMSE value without the y-test value (I can do it if I have a y-test value). For instance, we have train data and test data. But in the test data, we don'...
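Without true test targets the test RMSE itself cannot be computed, but a cross-validated RMSE on the training data is a common estimate. A hedged sketch on synthetic data, with `LinearRegression` as a placeholder model:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic training data (assumption; stands in for the real train set).
X_train, y_train = make_regression(
    n_samples=100, n_features=5, noise=10.0, random_state=0
)

# scikit-learn scorers are "higher is better", so RMSE is returned negated.
scores = cross_val_score(
    LinearRegression(), X_train, y_train,
    cv=5, scoring="neg_root_mean_squared_error",
)
cv_rmse = -scores.mean()
```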
-1
votes
1
answer
362
views
Do we need to exclude OneHotEncoded columns while standardizing or normalizing using MinMaxScaler() or StandardScaler()?
This is the final cleaned DataFrame (df2) before Standardizing
my code:
scaler=StandardScaler()
df2[list(df2.columns)]=scaler.fit_transform(df2[list(df2.columns)])
df2
This returns a DataFrame after ...
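One-hot columns are commonly left unscaled, although that is a modelling choice rather than a rule. A sketch of scaling only the numeric columns with `ColumnTransformer`; the column names here are hypothetical:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical frame: two numeric columns and two one-hot columns.
df2 = pd.DataFrame({
    "age": [20.0, 30.0, 40.0],
    "income": [1000.0, 2000.0, 6000.0],
    "city_A": [1, 0, 1],
    "city_B": [0, 1, 0],
})
numeric_cols = ["age", "income"]

# Scale only the numeric columns; pass the one-hot columns through untouched.
# Output column order: transformed columns first, then the remainder.
ct = ColumnTransformer(
    [("scale", StandardScaler(), numeric_cols)],
    remainder="passthrough",
)
scaled = ct.fit_transform(df2)
```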
0
votes
1
answer
337
views
How to get predict from string data in sklearn
When I convert data from a pandas dataframe to sklearn so I can make predictions, string data becomes problematic. So I used LabelEncoder, but it seems to limit me to using the encoded data instead of ...
2
votes
1
answer
98
views
Complicated double sum using groupby in Pandas dataframe
I have a dataframe that looks like
Race_ID Date Student_ID a b
1 1/1/2023 1 3 1
1 1/1/2023 2 2 2
1 1/1/...
0
votes
1
answer
152
views
Count fruits on tree using ML sklearn
This is my Python code, where I am trying to predict the fruit count on a tree using sklearn,
but I ran into an issue. The code is given below:
import cv2
from sklearn.ensemble import RandomForestClassifier
def ...
2
votes
1
answer
98
views
Train and test data setup for sklearn
I'm creating a classification model to predict the outcome of a sports event (win/loss) and am running into a data setup conundrum.
Currently the data is set up as follows:
example_data = [team_a_feat_1, ...
0
votes
0
answers
87
views
How to fit a 2D Feature in MLP Sklearn
I have this pandas dataframe ModelCFL = pd.DataFrame(columns=['data_case', 'length'])
I fill this Dataframe with my csv files:
cases = ['0', '250']
for i in range(0, len(cases)):
cases_data = ...
-1
votes
1
answer
177
views
How to implement regularization
My task was to implement model parameter tuning using stochastic gradient descent. Below is my function implementation code. However, I would like to add some form of regularization.
def gradient(X, y, w, ...
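Since the original `gradient` function is cut off, here is an independent sketch, assuming a mean squared error loss and an L2 (ridge) penalty `lam * ||w||^2`. The function names and hyperparameters are hypothetical, not taken from the question.

```python
import numpy as np

def l2_regularized_gradient(X, y, w, lam):
    """Gradient of mean((Xw - y)^2) + lam * ||w||^2 with respect to w."""
    n = X.shape[0]
    residual = X @ w - y
    return 2.0 / n * (X.T @ residual) + 2.0 * lam * w

def sgd(X, y, lam=0.1, lr=0.01, epochs=200, batch_size=8, seed=0):
    """Mini-batch SGD on the L2-regularized squared error."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(y))
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            w -= lr * l2_regularized_gradient(X[idx], y[idx], w, lam)
    return w

# Small synthetic problem to compare plain and regularized fits.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

w_plain = sgd(X, y, lam=0.0)
w_ridge = sgd(X, y, lam=1.0)
```

The only change relative to an unregularized gradient is the extra `2 * lam * w` term, which shrinks the weights toward zero.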
0
votes
1
answer
75
views
Python sklearn confusion matrix
I am trying to create a confusion matrix with probabilities.
y_pred_train = logistic.predict_proba(X_train)
confusion_matrix(y_train, y_pred_train)
ValueError: Classification metrics can't handle a ...
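`confusion_matrix` expects class labels, while `predict_proba` returns continuous probabilities, which is likely what triggers this error. A sketch that thresholds the probabilities first, using synthetic data and an assumed cutoff of 0.5:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Synthetic binary data (assumption; stands in for the real training set).
X_train, y_train = make_classification(n_samples=100, random_state=0)
logistic = LogisticRegression().fit(X_train, y_train)

# Column 1 holds the probability of the positive class.
proba = logistic.predict_proba(X_train)

# Turn probabilities into hard labels before building the matrix.
y_pred_train = (proba[:, 1] >= 0.5).astype(int)
cm = confusion_matrix(y_train, y_pred_train)
```

Calling `logistic.predict(X_train)` gives labels directly; thresholding by hand is mainly useful when a cutoff other than 0.5 is wanted.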
0
votes
0
answers
360
views
pyInstaller execution failure due to missing Module
I'm not quite sure why I am finding this issue. I have installed sklearn in my environment and imported it within the code. Specifically, I am using sklearn.metrics.r2_score and I actually don't call ...
0
votes
0
answers
223
views
Why am I getting 0 in TN and FN in confusion matrix using keras model?
I am trying to do NN text classification using keras. But when I compute the confusion matrix, I get 0 for TN and FN. It is strange, because when I make a model using sklearn MLPClassifier with the same data and hyperparameters I ...
0
votes
1
answer
350
views
How to use FunctionTransformer with a custom function?
I want to use FunctionTransformer to perform calculations between columns. For instance, I want to subtract two columns and add the new column to the dataset. So I have the function:
def diff(x, ...
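The original `diff` function is truncated, so the following is a separate sketch of the general pattern: a function that takes a DataFrame and returns a copy with the extra column, wrapped in `FunctionTransformer` with `kw_args`. The function and column names are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

def add_difference(frame, left, right, new):
    """Return a copy of frame with a new column equal to left minus right."""
    out = frame.copy()
    out[new] = out[left] - out[right]
    return out

# kw_args forwards the extra keyword arguments to the function on each call.
transformer = FunctionTransformer(
    add_difference,
    kw_args={"left": "a", "right": "b", "new": "a_minus_b"},
)

df = pd.DataFrame({"a": [5, 7], "b": [2, 3]})
result = transformer.fit_transform(df)
```

With the default `validate=False`, the DataFrame is passed to the function as-is, so column names stay available inside it.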