121 questions
5
votes
7
answers
169
views
Label Encoding on multiple columns in R
I have a dataframe that contains columns with categorical responses. I'd like to perform label encoding of the observations on all the columns in one go.
Gender <- c("Male", "...
0
votes
0
answers
20
views
LabelEncoder transform OneHotEncoder
I am working on Python's Jupyter Notebook platform. I have a dataset whose output consists of 4 different labels and whose output data is a categorical variable. I coded this output data with Label ...
0
votes
3
answers
74
views
Extend LabelEncoder classes
I have a LabelEncoder with 500 classes.
To store and load it, I used pickle:
with open('../data/label_encoder_v500.pkl', 'rb') as file:
    label_encoder = pickle.load(file)
I want to add 24 new ...
3
votes
3
answers
1k
views
How to Apply LabelEncoder to a Polars DataFrame Column?
I'm trying to use scikit-learn's LabelEncoder with a Polars DataFrame to encode a categorical column. I am using the following code.
import polars as pl
from sklearn.preprocessing import LabelEncoder
...
0
votes
1
answer
281
views
How do I convert string data to numerical data using Label Encoder?
I was trying to convert string data into numerical data in a CSV excel sheet. It kept giving me an error about previously unseen labels, so I searched it up and found that we can use Label Encoder to ...
0
votes
0
answers
190
views
How to Encode Non-Ordinal Categorical Variables for RandomForest without Using Label Encoding?
I need to predict different types of exploitation using a RandomForestClassifier. My dataset contains several categorical variables such as gender, citizenship, and CountryOfExploitation. These ...
1
vote
0
answers
66
views
Pipeline for ML model using LabelEncoding in a Transformer [duplicate]
I'm attempting to incorporate various transformations into a scikit-learn pipeline along with a LightGBM model. This model aims to predict the prices of second-hand vehicles. Once trained, I plan to ...
1
vote
0
answers
69
views
Label Encoding for Categorical Features: Preserving Label Consistency Across Runs
Problem Description:
Label Encoding Issue: Upon rerunning the label encoding code, the labels change, causing inconsistency.
Dynamic Data from a Server: Incoming data might introduce new values, ...
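One way to approach both points, sketched here in plain Python rather than with scikit-learn: persist the value-to-code mapping between runs and only ever append new values, so codes that were already assigned never change. The helper name `update_mapping`, the sample values, and the file location are all hypothetical.

```python
import json
import os
import tempfile


def update_mapping(mapping, values):
    """Return a copy of mapping with unseen values appended at the next free code."""
    updated = dict(mapping)
    for value in values:
        if value not in updated:
            updated[value] = len(updated)
    return updated


# First run: build the mapping from the initial data and save it.
mapping = update_mapping({}, ["red", "green", "red", "blue"])
path = os.path.join(tempfile.gettempdir(), "label_mapping.json")  # hypothetical location
with open(path, "w") as fh:
    json.dump(mapping, fh)

# Later run: reload the mapping, then extend it with whatever the server sends.
with open(path) as fh:
    reloaded = json.load(fh)
extended = update_mapping(reloaded, ["blue", "yellow"])

# Old values keep their codes; the new value gets the next free one.
encoded = [extended[v] for v in ["yellow", "red"]]
```

Note that JSON stores keys as strings, so this sketch assumes the categorical values are strings to begin with.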
0
votes
2
answers
95
views
Label Encoder can't 'see' previously 'seen' labels
I need to encode a column having 4 classes i.e., Education with classes Bachelor's, Master's, PhD, and High School. When I fit the label encoder to the training set (tr, here) and transform the test ...
1
vote
1
answer
165
views
Testing model gives error: "y contains previously unseen labels"
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error,...
0
votes
0
answers
18
views
Handle New Categories for both training and test set
I worked on a classification problem for the Kaggle Competition, and I found there are 3 categorical columns, but I noticed that the service column in the train data has 66 categories, and in the test ...
0
votes
1
answer
41
views
Label Encoding of Categorical values for Future df
I am building a model where LabelEncoding of 2 categorical columns is a better approach. So I had implemented the same on the train_df and finalized the model.
And for predicting the test_df, I used ...
0
votes
0
answers
69
views
Label Encoding: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
I'm trying to run a neural network, and I have some questions related to this error, because I'm trying to convert categorical inputs into numerical ordinary input.
So the data equals:
data base
so I ...
4
votes
0
answers
15k
views
# If we have a listlike key, _check_indexing_error will raise when selecting columns
I am trying to build a KNearest Neighbor system that will help me classify distances.
The original dataframe has the columns totalDistance and Label.
To use KNN I have to encode the ...
-1
votes
1
answer
68
views
Error on sklearn --- ValueError: not enough values to unpack (expected 3, got 1)
I am working on an image segmentation problem but I ran into this issue. I am trying to encode labels in a multidimensional array, but it needs to be flattened, encoded, and reshaped
-------------------...
-1
votes
1
answer
78
views
scikit-learn labelencoder unseen values
uv = np.unique(X[:, 2])
uv2 = np.unique(X_test[:, 2])
print(uv)
#['Female' 'Male']
print(uv2)
#['Female' 'Male']
# Encoding categorical columns in the train dataset
from sklearn.preprocessing ...
0
votes
1
answer
395
views
How to dump Label Encoder values for multiple columns in a dataframe
As you can see, I have a preprocessing function here and doing some converting operations. I have some categorical variables and I defined them as categorical_cols, and using LabelEncoder for them. My ...
0
votes
2
answers
77
views
Feature Engineering in Python
How do I know when to apply LabelEncoder() or OneHotEncoder()?
I have used LabelEncoder to encode a categorical variable for a RandomForestRegressor model and it gives an extremely high mean squared error. ...
0
votes
1
answer
903
views
How do I use the existing ML model to predict for new data value?
I have built a machine learning model using 34 features. Now I want to check how well the model predicts the new data value. However, initially there were 26 features but one-hot and label encoding ...
0
votes
1
answer
87
views
How to inverse label encoding in Python
I'm working on a data set and I did label encoding for the categorical features. Now I tried to do the inverse, but an error like this appears:
----> 1 original_labels = labelencoder.inverse_transform(df['model'...
0
votes
2
answers
49
views
Label encoding pandas dataframe, same label for same value
Here is a snippet of my df:
0 1 2 3 4 5 ... 11 12 13 14 15 16
0 BSO PRV BSI TUR WSP ACP ... HLR HEX HEX None None None
1 BSO PRV BSI ...
0
votes
1
answer
337
views
How to get predict from string data in sklearn
When I convert data from a pandas dataframe to sklearn so I can make predictions, string data becomes problematic. So I used labelencoder but it seems to limit me to using the encoded data instead of ...
0
votes
1
answer
73
views
Rank does not go in order if the value does not change
I have a dataframe:
data = [['p1', 't1'], ['p4', 't2'], ['p2', 't1'],['p4', 't3'],
['p4', 't3'], ['p3', 't1'],]
sdf = spark.createDataFrame(data, schema = ['id', 'text'])
sdf.show()
+---+--...
0
votes
1
answer
42
views
Reordering categorical variables using a specified ordering?
I have a X_train dataframe. One of the columns locale has the unique values: ['Regional', 'Local', 'National'].
I am trying to make this column into an Ordered Categorical variable, with the correct ...
1
vote
1
answer
142
views
I can't apply labelencoder to array of bool
I am working on a machine learning project. I imported all the libraries. I took one column of data (this column is an array of bool) and I want to apply labelencoder to it.
Here is my whole code.
data = pd.read_csv('...
1
vote
2
answers
66
views
How do I filter columns with data_type = object
encoder = LabelEncoder()
categorical_features = df.columns.tolist()
for col in categorical_features:
    df[col] = encoder.fit_transform(df[col])
df.head(20)
I want categorical_features to take columns ...
1
vote
1
answer
46
views
Why the index of Label Encoding is not seriated?
This is my label value:
df['Label'].value_counts()
------------------------------------
Benign 4401366
DDoS attacks-LOIC-HTTP 576191
FTP-BruteForce 193360
SSH-...
0
votes
1
answer
482
views
How to order values when label-encoding?
I want to label-encode a column called article_id which has unique identifiers for an article.
Integer values kind of implicitly have an order to them, because 3 > 2 > 1.
I wonder what is the ...
0
votes
2
answers
479
views
Getting a ValueError after running the LabelEncoder command
I'm working on an ML web app and am training on data from a CSV file. When converting the data array to float, a ValueError appears
CODE
X[:, 0] = le_country.transform(X[:,0]) X[:, 1] = le_education....
2
votes
2
answers
275
views
Label encode subgroups after groupby
I want to label encode subgroups in a pandas dataframe. Something like this:
| Category | | Name |
| ---------- | | --------- |
| FRUITS | | Apple |
| FRUITS | | Orange |
| ...
0
votes
1
answer
1k
views
How to override fit() and predict() in a Keras model
I've created a subclass of the keras.models.Sequential class in order to override the fit() and predict() functions.
My goal is to 'hide' a sklearn LabelEncoder. This way I can directly call fit() ...
0
votes
1
answer
604
views
Label Encoder to Categories
I have created an ML model with Random Forest. It has 6000+ rows of data with 27 features, of which about 22 were categorical, and I used a label encoder on them. Now when I have to predict the result is ...
0
votes
1
answer
638
views
How to do Label encoding in Azure ML studio?
I have a total of around 80 columns, of which some 20 are categorical and need to be label encoded. I checked the solution provided here and the solution stated to work with Feature ...
0
votes
1
answer
160
views
label encoder unable to convert a range of categorical columns into numerical columns
I have a categorical dataset with 50 columns, of which only 5 are numerical. I would like to apply a label encoder to convert the categorical columns to numerical columns. Categorical columns are ...
2
votes
0
answers
141
views
Reversing Sci-Kit LabelEncoder, but have a 2D array dataset
I'm trying to create an automated data pre-processing library and I want to transform the string data into numerical so it can be run through ML algorithms. But I can't seem to reverse it back to its ...
0
votes
1
answer
247
views
label encoding in dask_cudf dataframe
I am trying to use dask_cudf to preprocess a very large dataset (150,000,000+ records) for multi-class xgboost training and am having trouble encoding the class column (dtype is string). I tried using ...
0
votes
1
answer
2k
views
ValueError: y contains previously unseen labels: 'some label'
Whenever I try to execute the following code, it shows ValueError: y contains previously unseen labels: 'some_label'
X_test['Gender'] = le.transform(X_test['Gender'])
X_test['Age'] = le....
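The excerpt is truncated, so the actual cause is not visible, but one common source of this error is reusing a single fitted encoder across several columns. A minimal plain-Python sketch of the alternative follows: fit one mapping per column on the training data, then reuse the matching mapping for each test column. The helpers `fit_mapping` and `transform`, the sample data, and the choice of `-1` for unseen values are assumptions for illustration, not scikit-learn's API.

```python
def fit_mapping(values):
    """Map each distinct value to an integer, in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


def transform(values, mapping, unknown=-1):
    """Encode values, sending anything not seen during fitting to `unknown`."""
    return [mapping.get(value, unknown) for value in values]


# Hypothetical data: column names follow the question, values are made up.
train = {"Gender": ["Male", "Female", "Male"], "Age": ["Adult", "Child", "Senior"]}
test = {"Gender": ["Female", "Other"], "Age": ["Child", "Adult"]}

# Fit one mapping per column, on the training data only.
mappings = {column: fit_mapping(values) for column, values in train.items()}

# Reuse the matching mapping for each test column.
encoded_test = {
    column: transform(values, mappings[column]) for column, values in test.items()
}
```

Here the unseen value "Other" is encoded as `-1` instead of raising, which only makes sense if the downstream model can tolerate a sentinel code.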
0
votes
2
answers
175
views
How to apply label encoding uniformly in all columns?
I have a dataset of which I have attached an image.
The set of unique values in Origin and Dest are the same. Upon doing label encoding of those columns, I thought that value ATL will get the same encoding in ...
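A rough sketch of one way to get identical codes in both columns, assuming it is acceptable to build a single mapping from the union of their values. It is written in plain Python; apart from ATL, the airport codes are made up, and the variable names are hypothetical.

```python
origin = ["ATL", "JFK", "LAX"]
dest = ["LAX", "ATL", "ORD"]

# One mapping built from the union of both columns, so a code means
# the same airport regardless of which column it appears in.
shared = {airport: code for code, airport in enumerate(sorted(set(origin) | set(dest)))}

origin_encoded = [shared[a] for a in origin]
dest_encoded = [shared[a] for a in dest]
```

Fitting a separate encoder per column only yields matching codes when both columns contain exactly the same set of values, which is why a shared mapping is the safer choice.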
0
votes
0
answers
529
views
How can I map predicted values (after using RandomForestClassifier) back to their original values in Python?
For context, I am taking Ad listing data for Machines and using it to predict the type of Machine.
I have used the RandomForestClassifier for class prediction. In the model I have used LabelEncoder to ...
0
votes
1
answer
603
views
sklearn LabelEncoder to combine multiple values into a single label
I am looking to run classification on a column that has a few possible values, but I want to consolidate them into fewer labels.
For example, a job may have multiple end states: success, fail, error, ...
0
votes
1
answer
444
views
LabelEncoding large amounts of categorical data
I have a dataset with 39 categorical and 27 numerical features. I am trying to encode the categorical data and need to be able to inverse transform and call transform for each column again. Is there a ...
2
votes
1
answer
1k
views
LabelEncoding in Pandas on a column with list of strings across rows
I would like to LabelEncode a column in pandas where each row contains a list of strings. Since a similar string/text carries a same meaning across rows, encoding should respect that, and ideally ...
0
votes
1
answer
78
views
A function for onehotencoding and labelencoding in a dataframe
I keep getting AttributeError: 'DataFrame' object has no attribute 'column' when I run the function on a column in a dataframe
def reform (column, dataframe):
if dataframe.column.nunique() > 2 ...
0
votes
0
answers
39
views
scikit learn label encoding prints as row instead of column
I am trying to do label encoding using scikit-learn's built-in function, but why is my result printed as a row instead of an additional column?
from sklearn.preprocessing import LabelEncoder
# creating ...
0
votes
1
answer
2k
views
What is a good way to proceed with LabelEncoder in sklearn to get back the couples?
I have a dataframe with categorical values, such as city names.
For ML algorithms, I then need to encode the data into numerical values.
I do it like this:
df[cat_columns] = df[cat_columns].apply(...
1
vote
1
answer
1k
views
Sklearn Label Encoder - Not getting desired output based on prediction and inverse transform
I'm new to Python ML using scikit-learn. I was working on a solution to create a model with three columns Pets, Owner and location.
import pandas
import joblib
from sklearn.tree import ...
1
vote
1
answer
745
views
Alternatives of LabelEncoder() for target variable while implementing in a pipeline
I am developing a classification base model. I have used the concept of ColumnTransformer and Pipeline for feature engineering and selection, model selection, and for everything. I wanted to encode my ...
0
votes
3
answers
2k
views
How to get true labels from LabelEncoder
I have the below code snippet:
df = pd.read_csv("data.csv")
X = df.drop(['label'], axis=1)
Y= df['label']
le = LabelEncoder()
Y = le.fit_transform(Y)
mapping = dict(zip(le.classes_, range(...
1
vote
1
answer
510
views
Label encoding by value counts
I am trying to do label encoding for my cities. However, I want the labels assigned according to how often each city appears compared with the others. Let's say:
Oslo has 500 rows
Berlin has 400 rows
Napoli has 300 rows in the dataset
...
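The rest of the question is cut off, so the desired direction is a guess. Assuming the most frequent city should receive code 0, a frequency-ranked encoding can be sketched with the standard library alone. The row counts are scaled down from the question, and the names `counts` and `rank` are hypothetical.

```python
from collections import Counter

# Scaled-down stand-in for the 500 / 400 / 300 rows in the question.
cities = ["Oslo"] * 5 + ["Berlin"] * 4 + ["Napoli"] * 3

counts = Counter(cities)

# most_common() sorts by descending count, so rank 0 goes to the most frequent city.
rank = {city: i for i, (city, _) in enumerate(counts.most_common())}

encoded = [rank[city] for city in cities]
```

If the opposite direction is wanted, reversing the list returned by `most_common()` before enumerating gives the least frequent city code 0.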
0
votes
1
answer
935
views
Self Define in LabelEncoder
I am trying to encode data in a CSV file. The TA in class recommended LabelEncoder in sklearn. There is one column named education_level, and I need to encode it in "High, Medium, Low" order. But the ...
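For context, scikit-learn's LabelEncoder assigns codes in sorted order of the class labels, which does not match a "High, Medium, Low" ordering. A minimal plain-Python sketch of an explicit ordering follows, assuming High should map to 0; the sample column values are made up. scikit-learn's OrdinalEncoder also accepts a `categories` argument for the same purpose.

```python
# Explicit order taken from the question; the direction (High = 0) is an assumption.
order = ["High", "Medium", "Low"]
codes = {level: i for i, level in enumerate(order)}

education_level = ["High", "Low", "Medium", "High"]  # hypothetical column values
encoded = [codes[level] for level in education_level]

# For comparison: the codes that a sorted (alphabetical) assignment would produce.
alphabetical = {level: i for i, level in enumerate(sorted(order))}
```

Comparing `codes` with `alphabetical` shows where the two schemes disagree: Medium and Low swap places.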