
Questions tagged [python]

Use for data science questions related to the programming language Python. Not intended for general coding questions (which should be asked on Stack Overflow).

0 votes
0 answers
12 views

I performed Random Forest–based feature importance analysis on the MNIST dataset, focusing only on digits 0 and 1. When I visualize the importance map (see image below), it doesn’t resemble the ...
Manish Yadav's user avatar
0 votes
0 answers
10 views

I have a large set of phrases obtained via Azure Fast Transcription, and I need to group them into coherent semantic chunks (to use later in a RAG pipeline). Initially, I tried grouping phrases based ...
Daniel's user avatar
  • 1
0 votes
0 answers
17 views

I have a set of fingerprints as a dataset (provided by my college). I want to use these fingerprints to train a model to understand the different things. That is beside the point....
Sayan's user avatar
  • 1
1 vote
0 answers
34 views

I'd appreciate your thoughts on the following problem. I've created a heatmap plot (attached) showing the cluster membership ratio for each participant (in separate subplots) and condition (η). Now, I'...
maria mystakidou's user avatar
1 vote
0 answers
11 views

I have used Hyperband automatic tuning for an ANN model to predict price. After running the model with the automatic tuning, I obtain an R² score of 1.00, which suggests overfitting. However, I am ...
leakie's user avatar
  • 11
4 votes
0 answers
29 views

I have a large dataset (~10M points) in Python and I want to filter it using a large number of different custom masks, as part of calculations to create a new but related dataset. Because the dataset ...
quail's user avatar
  • 41
5 votes
1 answer
59 views

I use Jupyter notebooks to teach programming, using markdown in text cells, and I want to separate the concepts by level-1 headings (starting with # Heading), for ...
ginjaemocoes's user avatar
4 votes
1 answer
76 views

I built a RAG chatbot with Python, LangChain, an OpenAI LLM, and FAISS for the vector store. The data is stored as JSON. The chatbot does not always keep track of the inputs and outputs. Here is an ...
SoftwareEngineer's user avatar
2 votes
1 answer
40 views

I'm working on a project that involves loop control. When I try to implement it in the Orange platform, I'm unable to connect one widget (Python Script) to another in a loop, as the connection is ...
Anto Delin Xavier's user avatar
0 votes
1 answer
58 views

Say we have the data as follows: Input ...
Punreach Rany's user avatar
2 votes
0 answers
65 views

I built a RAG chatbot with Python, LangChain, and FAISS for the vector store. The data is stored as JSON. The chatbot sometimes refuses to answer when a question is rephrased. Here are two ...
SoftwareEngineer's user avatar
0 votes
0 answers
45 views

I'm currently making a small binary classification program using Quantum Machine Learning (EstimatorQNN to be more specific). My program classifies data inside the Wisconsin Breast Cancer database and ...
Andrea's user avatar
  • 1
5 votes
1 answer
119 views

I'm following this tutorial and am stuck on step 8: https://mlflow.org/docs/latest/ml/getting-started/hyperparameter-tuning/#test-your-container The inference server is listening on ...
Zhao Li's user avatar
  • 153
7 votes
1 answer
76 views

I am working on multivariate time-series data. Is it possible to impute or interpolate the missing data using a transformer or pre-trained, fine-tuned LLMs? Some insights would be appreciated. ...
Am_Bn's user avatar
  • 71
8 votes
1 answer
150 views

I have a Mask2Former model fine-tuned on my own custom dataset and it is working nicely. I want to play around with knowledge distillation and use my pretrained ...
Andrei's user avatar
  • 83
4 votes
2 answers
441 views

Question: Are there better approaches than regex for extracting event dates (including relative) from noisy text? Are there NLP tools that can help disambiguate multiple date mentions in various ...
ja_him's user avatar
  • 143
9 votes
2 answers
215 views

I am building a random forest regression model. The goal is to predict the maximum each customer will spend in a single transaction during the next 90 days. I have transaction data for 7m customers, ...
SRJCoding's user avatar
  • 191
1 vote
0 answers
29 views

I am training a classifier model, but since it is taking far too long I want to use multiple GPUs for the training. The current code is ...
Shlok Sharma's user avatar
10 votes
1 answer
5k views

A few days ago I installed my new NVIDIA GeForce RTX 5090 and I can't get PyTorch to work on my Windows 11 desktop (just background info, the question is not directly ...
wischi's user avatar
  • 203
4 votes
1 answer
49 views

The question is more data engineering related than data science, but since there is no data engineering Stack Exchange, I thought I would ask it here. Basically, as the title says. So, as part of a ...
Della's user avatar
  • 465
9 votes
1 answer
1k views

I have set up a DQN with TorchRL to solve a problem where the agent can move in a square grid and pick some rewards scattered randomly on it. Right now, I am using a 5x5 grid and have 3 rewards on it. ...
Ícaro Lorran's user avatar
7 votes
1 answer
296 views

I am following the example code in the linkage documentation: ...
user2153235's user avatar
6 votes
1 answer
87 views

I am educating myself on hierarchical clustering and the relevant SciPy methods. The 1st argument of the linkage method is a 1D condensed distance matrix $X$ of ...
user2153235's user avatar
1 vote
1 answer
70 views

I have developed an XGBoost model to predict a target variable based on a set of input indicators. I'm now building a Generative AI-based tool that can take an individual's data—i.e., the values of ...
Alberto De Benedittis's user avatar
0 votes
0 answers
21 views

I’m working through the runtime analysis of scikit-learn’s OneVsRestClassifier for two cases: LogisticRegression (solver=lbfgs, ...
user184658's user avatar
1 vote
0 answers
50 views

I am trying to create one big model (LightGBM) that forecasts sales for each product for a cosmetics chain store. The dataset I am working with covers the last 5 years and has these columns: ...
13aba's user avatar
  • 11
4 votes
1 answer
214 views

I have the following Python script: ...
JouJour's user avatar
  • 101
0 votes
0 answers
30 views

I want to build a model that forecasts ticket resolution time for data science software support tickets. I’ve calculated queuing time and resolution time from ...
Rebel Royals's user avatar
2 votes
1 answer
90 views

Suppose we have 24 images per day, one per hour, and every image is a 24×24 CSV file. I do the following transformation for every day: The first image is unchanged. For the second image, move column i ...
S. M.'s user avatar
  • 95
2 votes
1 answer
41 views

I am applying the N-BEATS model of the pytorch-forecasting package to a traffic dataset. I am doing single-step prediction with a context length of 5. Now the prediction is unfortunately slightly ...
PhilkoGIT's user avatar
1 vote
1 answer
137 views

I’m building a plagiarism detector to identify AI-generated code on platforms like Codeforces. I’ve scraped 1,193 human and AI-generated code samples (Python, C++, Java) for the same problems. My goal ...
vinod pandey's user avatar
1 vote
0 answers
127 views

I'm exploring a machine learning research direction and I'm looking for ideas or pointers to existing models/projects that fit the following setup: The model is trained on graphs with edge information ...
lili's user avatar
  • 361
1 vote
0 answers
29 views

I am building a time-series forecasting model using OLS. For preparation, I am making all series stationary (for now). What I don't understand: Series should be stationary. Achieving stationarity can ...
aze45sq6d's user avatar
  • 145
0 votes
0 answers
31 views

I'm working on a machine learning project aimed at automatically predicting dependency links between tasks in industrial maintenance procedures in a group of tasks called gamme. Each gamme consists of ...
lili's user avatar
  • 361
4 votes
2 answers
144 views

Editing to add one key piece of information (df and dailyRet); I realized how important it is after solving this issue. ...
Vineet Tripathi's user avatar
1 vote
0 answers
44 views

I'm currently exploring a couple of statistical forecasting methods. My problem arose when using a VARMA(2,1) fixed-window rolling forecast. The example code that I'm using is the following: Here I only ...
Silvio Klenk's user avatar
3 votes
1 answer
112 views

I'm trying to develop a model that can generate dependencies between industrial tasks. To do that, I went for the GNN solution: I have nodes = tasks, dependencies = edges, and have ...
lili's user avatar
  • 361
0 votes
1 answer
37 views

I have been trying to code an upscaling GAN. While the code runs, I almost always end up with terrible results when the GAN doesn't collapse, and collapse happens often. I previously tried to ...
Freeziey's user avatar
2 votes
0 answers
48 views

I am currently training a graph transformer model in order to develop an AI that can generate edges on an unseen graph (link dependencies between texts with historical data). I divided my ...
lili's user avatar
  • 361
1 vote
0 answers
40 views

I am trying to train a GNN but am getting a NaN loss function immediately after the first training example. Below I have included all of the pertinent code. My input is 385 points in 3D space confined ...
Will Borrelli's user avatar
1 vote
1 answer
56 views

My requirement: I need to extract license plates without duplicates and store the images in a folder, then apply OCR to extract text from the images. What I have achieved: I am able to detect license plates ...
Raj's user avatar
  • 11
1 vote
0 answers
38 views

I'm currently trying to train a model to predict dependencies between texts, in this case industrial tasks, based on historical data. The goal is to learn that "Task A precedes Task B for ...
lili's user avatar
  • 361
2 votes
1 answer
129 views

I'm currently working on a Parallel and Distributed Computing project where I'm comparing the performance of both XGBoost and CatBoost when trained on CPU vs GPU. The goal is to demonstrate how GPU ...
Mxneeb's user avatar
  • 21
3 votes
0 answers
46 views

We are predicting conversion. Conversion means a customer converted from paying one-off to paying regularly (subscribing). If one feature is a categorical feature "Activity", consisting of 15+ ...
user30388975's user avatar
3 votes
0 answers
43 views

I have an 11 GB .nc file with lon/lat positions and particle trajectories on the ocean surface for a timespan of 40 days. For small files (approx. 140 MB) I use xarray, netCDF4, matplotlib and cartopy to ...
otk's user avatar
  • 141
8 votes
1 answer
179 views

I'm working on an industrial AI use case where I train a Graph Neural Network (GCN) for link prediction — specifically, to predict successor tasks in project planning graphs (e.g., for construction or ...
lili's user avatar
  • 361
2 votes
0 answers
36 views

I am looking into different statistical methods for determining a decrease in a numeric "count" feature across a time-series dataset. The dataset is relatively small (about 50 records), and ...
Mar's user avatar
  • 165
4 votes
0 answers
27 views

I am doing a geospatial assessment integrated with ML modeling. The problem is very low accuracy: as the number of training features increases, it gets lower. What could be the solution to such ...
Reem 's user avatar
  • 41
1 vote
0 answers
37 views

I am using sklearn's Isolation Forest as a model to detect anomalies. My dataset is relatively small, 50 records with only 2-3 features. To prevent any overfitting, what would you recommend to tune ...
Mar's user avatar
  • 165
2 votes
0 answers
41 views

I came across a (short and curt) technical report that claims to be SOTA on keyword spotting, but it didn't share its code and had a very short explanation of its network. I implemented the model, but ...
FloopyBeep's user avatar
