Newest 'python' Questions

0 votes

0 answers

12 views

Unexpected Feature Importance Pattern in Random Forest Classification of MNIST Digits 0 and 1

I performed Random Forest–based feature importance analysis on the MNIST dataset, focusing only on digits 0 and 1. When I visualize the importance map (see image below), it doesn’t resemble the ...

Manish Yadav

1

asked Nov 12 at 15:17

0 votes

0 answers

10 views

How can I group transcribed phrases into meaningful chunks without using complex models?

I have a large set of phrases obtained via Azure Fast Transcription, and I need to group them into coherent semantic chunks (to use later in a RAG pipeline). Initially, I tried grouping phrases based ...

Daniel

1

asked Nov 7 at 8:53

0 votes

0 answers

17 views

How to extract my fingerprint from my laptop's fingerprint sensor

I have a set of fingerprints as a dataset (provided by my college). Now I want to use these fingerprints as training data for a model that learns the differences between them. That is beside the point....

Sayan

1

asked Nov 6 at 17:23

1 vote

0 answers

34 views

How to identify and quantify main tendencies across participants from cluster membership heatmaps?

I'd appreciate your thoughts on the following problem. I've created a heatmap plot (attached) showing the cluster membership ratio for each participant (in separate subplots) and condition (η). Now, I'...

maria mystakidou

11

asked Oct 23 at 9:21

1 vote

0 answers

11 views

How to interpret an unstable learning curve on a model tuned with Hyperband tuning?

I have used Hyperband automatic tuning for an ANN model to predict prices. After running the tuned model, I obtain an R² score of 1.00, which suggests overfitting; however, I am ...

leakie

11

asked Oct 18 at 15:16

4 votes

0 answers

29 views

Time-efficient parallelization of masks for pre-processing a dataset

I have a large dataset (~10M points) in Python and I want to filter it using a large number of different custom masks, as part of calculations to create a new but related dataset. Because the dataset ...

quail

41

asked Oct 6 at 18:36

5 votes

1 answer

59 views

Jupyter notebooks compiled from different building blocks

I use Jupyter notebooks to teach programming, using markdown in text cells, and I want to separate the concepts by level-1 headings (starting with # Heading), for ...

ginjaemocoes

153

asked Oct 2 at 2:10

4 votes

1 answer

76 views

RAG Chatbot does not keep track of chat session history

I built a RAG chatbot in Python with LangChain, an OpenAI LLM, and FAISS for the vector store. The data is stored as JSON. The chatbot does not always keep track of the inputs and outputs. Here is an ...

SoftwareEngineer

61

asked Sep 27 at 7:22

2 votes

1 answer

40 views

Is it possible to make the Python widget in Orange both receive input and give output (in the same widget)?

I'm working on a project involving loop control. When I try to implement it on the Orange platform, I'm unable to connect one widget (Python script) to another in a loop, as the connection is ...

Anto Delin Xavier

21

asked Sep 26 at 18:46

0 votes

1 answer

58 views

NLP: How to correctly clean conversation data?

Say we have the following data: Input ...

Punreach Rany

101

asked Sep 21 at 23:37

2 votes

0 answers

65 views

RAG Chatbot does not answer paraphrased questions

I built a RAG chatbot in Python with LangChain and FAISS for the vector store. The data is stored as JSON. The chatbot sometimes refuses to answer when a question is rephrased. Here are two ...

SoftwareEngineer

61

asked Sep 20 at 16:00

0 votes

0 answers

45 views

Qiskit: this solution is a bit slow; is there a way to make it faster and slightly more accurate?

I'm currently making a small binary classification program using Quantum Machine Learning (EstimatorQNN to be more specific). My program classifies data inside the Wisconsin Breast Cancer database and ...

Andrea

1

asked Sep 18 at 11:48

5 votes

1 answer

119 views

How to get MLFlow built container to listen on 0.0.0.0?

I'm following this tutorial and am stuck on step 8: https://mlflow.org/docs/latest/ml/getting-started/hyperparameter-tuning/#test-your-container The inference server is listening on ...

Zhao Li

153

asked Sep 16 at 17:29

7 votes

1 answer

76 views

Time series imputation using transformers and LLMs

I am working with multivariate time-series data. Is it possible to impute or interpolate the missing data using transformers or pre-trained, fine-tuned LLMs? Any insights would be appreciated. ...

Am_Bn

71

asked Sep 11 at 6:30

8 votes

1 answer

150 views

How to correctly implement the loss function for my distillation of Mask2Former?

I have a Mask2Former model fine-tuned on my own custom dataset and it is working nicely. I want to play around with knowledge distillation and use my pretrained ...

Andrei

83

asked Sep 6 at 11:46

4 votes

2 answers

441 views

NLP of noisy unpredictable text to extract dates--just regex?

Question: Are there better approaches than regex for extracting event dates (including relative) from noisy text? Are there NLP tools that can help disambiguate multiple date mentions in various ...

ja_him

143

asked Aug 19 at 8:01

9 votes

2 answers

215 views

Is it best practice to remove outliers from transaction data used for training?

I am building a random forest regression model. The goal is to predict the maximum each customer will spend in a single transaction during the next 90 days. I have transaction data for 7m customers, ...

SRJCoding

191

asked Aug 12 at 9:00

1 vote

0 answers

29 views

Issue with running training on multiple GPUs using DDP

I am training a classifier model, but since training takes far too long, I want to use multiple GPUs. The current code is ...

Shlok Sharma

111

asked Aug 8 at 2:47

10 votes

1 answer

5k views

Is CUDA 13 a thing (or am I misinterpreting something)?

A few days ago I installed my new NVIDIA GeForce RTX 5090 and I can't get pytorch to work on my Win11 Desktop (just background info, the question is not directly ...

wischi

203

asked Aug 3 at 11:57

4 votes

1 answer

49 views

Is there a way to programmatically schedule jobs on Airflow or the cron daemon?

The question is more related to data engineering than data science, but since there is no Data Engineering Stack Exchange, I thought I would post it here. Basically, as the title says. So, as part of a ...

Della

465

asked Jul 31 at 12:05

9 votes

1 answer

1k views

What should a typical reward curve look like while training an RL model?

I have set up a DQN with TorchRL to solve a problem where the agent can move in a square grid and pick some rewards scattered randomly on it. Right now, I am using a 5x5 grid and have 3 rewards on it. ...

Ícaro Lorran

321

asked Jul 21 at 18:20

7 votes

1 answer

296 views

SciPy's dendrogram method depicts two cluster merges as one

I am following the example code in the linkage documentation: ...

user2153235

605

asked Jul 15 at 19:33

6 votes

1 answer

87 views

SciPy's linkage method should take 1D condensed distance matrix of length n choose 2

I am educating myself on hierarchical clustering and the relevant SciPy methods. The 1st argument of the linkage method is a 1D condensed distance matrix $X$ of ...

user2153235

605

asked Jul 15 at 17:59
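The condensed distance matrix mentioned above stores each pairwise distance exactly once, for index pairs i < j in row-major order, which is why its length is n choose 2 = n(n-1)/2. Below is a minimal pure-Python sketch of that layout; the helper names `condensed_distances` and `condensed_index` are hypothetical, and this is not SciPy's implementation, although the ordering is intended to match the one described in SciPy's `pdist` documentation.

```python
from itertools import combinations
from math import comb, dist


def condensed_distances(points):
    """Return the condensed (upper-triangle, row-major) distance vector.

    Entries follow the pair order (0,1), (0,2), ..., (0,n-1), (1,2), ...
    """
    return [dist(p, q) for p, q in combinations(points, 2)]


def condensed_index(i, j, n):
    """Position of the pair (i, j), with i != j, inside the condensed vector."""
    if i > j:
        i, j = j, i
    # Skip the rows above i, then count along row i starting at column i + 1.
    return n * i - i * (i + 1) // 2 + (j - i - 1)


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
d = condensed_distances(points)
n = len(points)
print(len(d), comb(n, 2))            # 6 6
print(d[condensed_index(0, 1, n)])   # 5.0
```

Storing only the upper triangle avoids keeping both d(i, j) and d(j, i) and the zero diagonal, which is the redundancy a full n × n square matrix would carry.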

1 vote

1 answer

70 views

Improving a GenAI Tool to Explain XGBoost Model Outputs for Individual Predictions

I have developed an XGBoost model to predict a target variable based on a set of input indicators. I'm now building a Generative AI-based tool that can take an individual's data—i.e., the values of ...

Alberto De Benedittis

55

asked Jul 14 at 15:42

0 votes

0 answers

21 views

Runtime complexity of scikit-learn’s One-vs-Rest LogisticRegression (LBFGS) vs. RidgeClassifier

I’m working through the runtime analysis of scikit-learn’s OneVsRestClassifier for two cases: LogisticRegression (solver=lbfgs, ...

user184658

1

asked Jul 10 at 10:09

1 vote

0 answers

50 views

How to improve a classification model (whether an item will sell on a given day) for a dataset with multiple sparse time series?

I am trying to create one big model (LightGBM) that forecasts sales for each product of a cosmetics chain store. The dataset I am working with covers the last 5 years and has these columns: ...

13aba

11

asked Jul 2 at 5:42

4 votes

1 answer

214 views

Why do I need to call np.transpose() on this?

I have the following python script: ...

JouJour

101

asked Jun 27 at 13:01

0 votes

0 answers

30 views

expected the model to forecast resolution time more accurately based on past ticket patterns. I was also hoping to unde

I want to build a model that forecasts ticket resolution time for data science software support tickets. I’ve calculated queuing time and resolution time from ...

Rebel Royals

11

asked Jun 26 at 10:03

2 votes

1 answer

90 views

How to visualize the images?

Suppose we have 24 images per day, one per hour, and each image is a 24×24 CSV file. I apply the following transformation to each day: The first image is unchanged. For the second image, move column i ...

S. M.

95

asked Jun 20 at 0:07

2 votes

1 answer

41 views

N-BEATS, PyTorch Forecasting: predictions are slightly shifted

I am applying the N-BEATS model from the pytorch-forecasting package to a traffic dataset. I am doing single-step prediction with a context length of 5. Unfortunately, the prediction is slightly ...

PhilkoGIT

21

asked Jun 18 at 20:29

1 vote

1 answer

137 views

How to preprocess code samples for a neural network to detect AI-generated code?

I’m building a plagiarism detector to identify AI-generated code on platforms like Codeforces. I’ve scraped 1,193 human and AI-generated code samples (Python, C++, Java) for the same problems. My goal ...

vinod pandey

11

asked Jun 7 at 11:35

1 vote

0 answers

127 views

ML models that train on graphs but infer without any edges (edge prediction task)

I'm exploring a machine learning research direction and I'm looking for ideas or pointers to existing models/projects that fit the following setup: The model is trained on graphs with edge information ...

lili

361

asked Jun 3 at 14:07

1 vote

0 answers

29 views

Time series OLS: Stationarity transformation

I am building a time-series forecasting model using OLS. As preparation, I am making all series stationary (for now). What I don't understand: series should be stationary, and achieving stationarity can ...

aze45sq6d

145

asked Jun 3 at 8:43

0 votes

0 answers

31 views

Predicting dependency links between industrial tasks using a transformer (CamemBERT) — poor results

I'm working on a machine learning project aimed at automatically predicting dependency links between tasks in industrial maintenance procedures in a group of tasks called gamme. Each gamme consists of ...

lili

361

asked Jun 2 at 13:58

4 votes

2 answers

144 views

Quants: Beta calculation using pandas

Editing to add one key piece of information (df and dailyRet), which I realized was important only after solving this issue. ...

Vineet Tripathi

61

asked May 30 at 18:34

1 vote

0 answers

44 views

VARMA runtime issues: fixed window rolling forecasting

I'm currently exploring a couple of statistical forecasting methods. My problem arose when using a VARMA(2,1) fixed-window rolling forecast. The example code that I'm using is the following: Here I only ...

Silvio Klenk

11

asked May 27 at 8:14

3 votes

1 answer

112 views

Which model is the best suitable for generating edges?

I'm trying to develop a model that can generate dependencies between industrial tasks. To do that, I went for a GNN solution: nodes = tasks, edges = dependencies, and I have ...

lili

361

asked May 22 at 9:25

0 votes

1 answer

37 views

Why is my upscaling GAN not working?

I have been trying to code an upscaling GAN, but while the code runs, I almost always end up with terrible results when the GAN doesn't collapse, and it collapses often. I previously tried to ...

Freeziey

1

asked May 14 at 10:49

2 votes

0 answers

48 views

Need help with model architecture and sampling negative edges

I am currently training a graph transformer model in order to develop an AI that can generate edges on an unseen graph (dependency links between texts, learned from historical data). I divided my ...

lili

361

asked May 12 at 8:49

1 vote

0 answers

40 views

GNN Loss NaN after first training example?

I am trying to train a GNN but am getting a NaN loss function immediately after the first training example. Below I have included all of the pertinent code. My input is 385 points in 3D space confined ...

Will Borrelli

11

asked May 6 at 15:01

1 vote

1 answer

56 views

Need help straightening and cropping images properly for a computer vision task

My requirement: extract license plates without duplicates, store the images in a folder, then apply OCR to extract text from the images. What I have achieved: I am able to detect license plates ...

Raj

11

asked May 6 at 12:45

1 vote

0 answers

38 views

How to correctly use a transformer model for a generating dependencies project

I'm currently trying to train a model to predict dependencies between texts (here, industrial tasks) based on historical data. The goal is to learn that "Task A precedes Task B for ...

lili

361

asked May 6 at 6:53

2 votes

1 answer

129 views

XGBoost GPU version not outperforming CPU on small dataset despite parameter tuning – suggestions needed

I'm currently working on a Parallel and Distributed Computing project where I'm comparing the performance of both XGBoost and CatBoost when trained on CPU vs GPU. The goal is to demonstrate how GPU ...

Mxneeb

21

asked May 3 at 21:32

3 votes

0 answers

46 views

Suppose one category in a variable creates data leakage: can we use the other categories of the same variable as dummies to predict?

We are predicting conversion, meaning a customer switching from one-off payments to regular payments (a subscription). If one feature is the categorical feature "Activity", consisting of 15+ ...

user30388975

31

asked Apr 28 at 5:31

3 votes

0 answers

43 views

How can I plot large .nc files with xarray and matplotlib?

I have an 11 GB .nc file with lon/lat positions and particle trajectories on the ocean surface over a span of 40 days. For small files (approx. 140 MB) I use xarray, netCDF4, matplotlib, and cartopy to ...

otk

141

asked Apr 24 at 11:59

8 votes

1 answer

179 views

How to correctly perform link prediction inference on a new, unseen graph?

I'm working on an industrial AI use case where I train a Graph Neural Network (GCN) for link prediction — specifically, to predict successor tasks in project planning graphs (e.g., for construction or ...

lili

361

asked Apr 24 at 6:10

2 votes

0 answers

36 views

Anomaly detection of drops in a time series

I am looking into different statistical methods for determining a decrease in a numeric "count" feature across a time-series dataset. The dataset is relatively small (about 50 records), and ...

Mar

165

asked Apr 22 at 13:46

4 votes

0 answers

27 views

Low Accuracy from Geospatial Random Forest ML modeling problem - Training data exported from QGIS, SCP

I am doing a geospatial assessment integrated with ML modeling. The problem is very low accuracy: as the number of training features increases, accuracy gets lower. What could be the solution to such ...

Reem

41

asked Apr 21 at 18:45

1 vote

0 answers

37 views

Isolation Forest sample size

I am using sklearn's Isolation Forest as a model to detect anomalies. My dataset is relatively small, 50 records with only 2-3 features. To prevent any overfitting, what would you recommend to tune ...

Mar

165

asked Apr 21 at 18:28

2 votes

0 answers

41 views

What's wrong with my ML implementation? (from a technical report)

I came across a (short and curt) technical report that claims to be SOTA on keyword spotting, but it didn't share its code and had a very short explanation of its network. I implemented the model, but ...

FloopyBeep

21

asked Apr 21 at 4:27
