289,232 questions
0
votes
1
answer
83
views
How to reference a second Pandas dataframe to the first one without creating any copy of the first one?
I have a large pandas dataframe df of something like a million rows and 100 columns, and I have to create a second dataframe df_n, same size as the first one. Several rows and columns of df_n will be ...
5
votes
1
answer
303
views
Why is the pandas apply function so slow when iterating over an entire row, rather than a specific column? [duplicate]
My intuition when using Pandas is that, if you have to use df.apply, it would be more optimal to group all the apply operations into one call. This was further reinforced by me learning that NumPy ...
2
votes
2
answers
101
views
Undocumented pandas DataFrame shuffle() [closed]
The following seems to work:
import pandas as pd
import sklearn
df = sklearn.datasets.load_iris()
df = pd.DataFrame(df.data, columns=df.feature_names)
df.shuffle()
However this shuffle function seems ...
3
votes
1
answer
184
views
How to create filled and stacked x y scatter plot with data from multiple rows and columns of data in dataframe
I'm working in Jupyter notebooks trying to build a stacked and filled x,y scatter bar chart from the dataframe (df_xy_columns) below:
sum_y_gran PVR Group min_x min_x2 max_x max_x2 min_y ...
4
votes
1
answer
168
views
Why does this Python script crash?
I have the following script that crashes when I run it and I cannot figure out why. The script is a smaller version of a larger script, but still reproduces the error of the larger script.
import ...
9
votes
5
answers
371
views
Convert (many) integer-valued rows into binary indicator columns using Pandas
I am working on a task that seems to me a little like one-hot encoding, but notably different. What I want to do is take a row of integers from a Pandas DataFrame and produce a binary column with 1's ...
2
votes
2
answers
147
views
Am I correctly generating a list of randomly assigned pairs with exclusions in python?
I have an array of names and roles of people within a company:
Example array:
names_and_titles = [
("Samantha Reyes", "Innovation", "Product Owner"),
("Ethan ...
7
votes
2
answers
230
views
Convert Decimal values to float64 when creating a Pandas DataFrame
I'm working with a dictionary that contains a list of decimal.Decimal values as one of its fields:
import pandas as pd
from decimal import Decimal
data = {
'Item': ['Apple', 'Banana', 'Orange'],
...
1
vote
1
answer
117
views
pandas pivot_table: can aggfunc work over a different grouping period from the table? [closed]
I have a pandas pivot table that shows payments made to different payees vs date, and I'm using a Grouper to group them into months, e.g.:
payee payee_1 payee_2
date
2019-11-30 amount ...
-1
votes
1
answer
70
views
Putting looped API call results into a dataframe in Python
I need some help. I have part of a Python script that accesses a URL field in a SQL database and then calls an API using the URL in that field. Now I cannot get the data into a dataframe to ...
0
votes
1
answer
126
views
pd.api.types.is_string_dtype() is misleading
df = pd.DataFrame({
'col_str': ["a", "b", "c"],
'col_lst_str': [["a", "b", "c"], ["d", "e", "f"], [...
4
votes
3
answers
136
views
Why is pandas not formatting dates with date_format?
Why is pandas not formatting dates with date_format argument of to_csv?
pandas.DataFrame([datetime.datetime.now().date()]).to_csv(date_format="%Y %b")
',0\n0,2025-07-31\n'
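A likely explanation, offered as a sketch rather than a confirmed answer: datetime.date objects end up in an object-dtype column, and date_format is only applied to datetime64 columns. Converting with pd.to_datetime first (an assumption about what the asker is willing to do) lets the format take effect. The column name d is made up for illustration.

```python
import datetime
import pandas as pd

# Hypothetical one-column frame holding a plain datetime.date (object dtype).
df = pd.DataFrame({"d": [datetime.date(2025, 7, 31)]})

# Convert to datetime64 so that to_csv's date_format is applied to the column.
df["d"] = pd.to_datetime(df["d"])

csv_text = df.to_csv(date_format="%Y %b")
print(csv_text)
```

The month abbreviation produced by %b depends on the active locale.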
1
vote
1
answer
138
views
How to replace existing data in a particular sheet of an existing Excel file using a PySpark dataframe?
I am using Azure Databricks and Azure Data Storage Explorer for my operations. I have an Excel file of under 30 MB containing multiple sheets. I want to replace the data in one sheet every month when ...
-3
votes
1
answer
87
views
How to convert a SQL formula to Python or pandas code [closed]
I have syntax like the one below and would like to convert it to an executable Python statement.
The formula below is stored as-is in the database and used in a procedure to calculate the required value.
Now I ...
4
votes
5
answers
297
views
How to merge two CSV files based on matching values in different columns and keep unmatched rows with placeholders?
I'm working on a data cleaning task and could use some help. I have two CSV files with thousands of rows each:
File A contains product shipment records.
File B contains product descriptions and ...
4
votes
4
answers
175
views
How to fill values in a Dataframe depending on values around it
I have a dataframe that looks something like this:
1 2 3 'String'
'' 4 X ''
'' 5 X ''
'' 6 7 'String'
'' 1 Y ''
And I want to change the Xs and Ys (put here just to visualize) to the ...
1
vote
2
answers
134
views
In pandas, how to write the word "nan" as string with to_excel?
I have the reverse of the problem described in Prevent pandas from interpreting 'NA' as NaN in a string.
I work with older English text data and want to write the word "nan" (i.e. Modern ...
6
votes
5
answers
326
views
How to generate this simple dataframe from these numbers?
I have N numbers, call it 3 for now: A1, A2, A3. I'd like to generate the following dataframe in Pandas:
Category  1   2      3         4         5      6   7
1         A1  A1+A2  A1+A2+A3  A2+A3     A3     0   0
2         0   A2     A2+A3     A2+A3+A1  A3+A1  A1  0
3         0   0      ...
-2
votes
2
answers
186
views
Why does grouping a pandas Series by the same Series make no sense?
In the code example below I am grouping a pandas series using the same series but with a modified index.
The groups in the end make no sense. There is no warning or error.
Could you please help me ...
2
votes
2
answers
93
views
Pandas dt accessor or groupby function returning decimal numbers instead of integers in index labels where some series values are NA
We're trying to group up date counts by month and index values are returning as decimals instead of integers when series contain any number of NaTs / na values.
Simplified reproducible example:
import ...
1
vote
0
answers
51
views
How to call R's stlm() from Python using rpy2, getting "missing value where TRUE/FALSE needed" error
I’m using rpy2 in Python to call R's forecast::stlm() function from within a custom wrapper function defined in R. My goal is to fit a seasonal time series model (STL + ARIMA) on a univariate time ...
0
votes
2
answers
77
views
How can I change the shape of the dataframe to have two headers when I have duplicated values? [duplicate]
this is my df:
symbol year_bin metric value row
0 USA500.IDX 2025-1 total_trades 32.00 0
1 GBPUSD 2025-1 total_trades 11.00 0
2 GBPUSD 2025-1 ...
2
votes
1
answer
91
views
Update values of specific columns of df2 in df1 using Pandas
I have 2 dataframes. One is smaller, with fewer columns than the other. I want to update df1 with values from the available columns in df2. How do I do it?
E.g.:
df1:
Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  ...
1
vote
1
answer
175
views
Altair fails to render chart out of pandas dataframe on Streamlit
I have the following code in Python, using Streamlit as framework:
try:
native_data = data.copy()
# Create Altair chart with native data
st.write(f"Debug: Native data type: {type(...
1
vote
3
answers
90
views
Can't return graph to website using Flask and HTML
This file is called 'html app.py'
from flask import Flask, render_template, request
import yfinance as yf
import seaborn as sns
import matplotlib.pyplot as plt
import io
import base64
app = Flask(...
1
vote
1
answer
79
views
How do I create hourly means with Pandas only when I have at least half of the data points?
I have a Pandas dataframe df with a datetime index and three columns, like this:
Out[64]:
rh pm25a pm25b
time_stamp
2022-07-06 11:35:...
1
vote
1
answer
65
views
Fit the rows and column names using pandas.set_option
I am trying to use pandas.set_option in my Python script to display a table, but somehow the data does not fill the HTML page properly.
Since the names in some columns are a bit longer, the columns look 1 ...
1
vote
1
answer
111
views
Pandas dataframe insert to SQL Server using pyodbc fails if more than 1 record is present in batch
I have a large dataframe which I need to upload to SQL Server. Due to the volume of data, my code does the insert in batches. But the insert fails if a batch has more than 1 record in it. The ...
3
votes
1
answer
68
views
Assign column status retrospectively in pandas
I have created the following pandas dataframe:
import pandas as pd
import numpy as np
ds = {'col1' : [234,321,284,286,287,300,301,303,305,299,288,300,299,287,286,280,279,270,269,301]}
df = pd....
2
votes
1
answer
71
views
How to set up non-linear (sinusoidal) multiple variable regression problems for tensorflow?
I have some parameters: A1, A2, A3, f1, f2, f3.
These parameters are then used to generate a set of sinusoidal data, something like:
y = A1 * sin(f1 * x) + A2 * sin(f2 * x) + A3 * sin(f3 * x)
From ...
0
votes
2
answers
83
views
Pandas groupby with Grouper still includes time bins beyond my filtered range
I'm working with 5-min level data that only includes timestamps between 09:30 and 16:00. (dateTime is saved as a column, not as the index.)
After applying an operation to the groups, I get additional data beyond ...
0
votes
1
answer
73
views
Why isn't this removing non-alphanumerical characters?
import pandas as pd
df = pd.read_csv('911.csv')
df['desc'].str.replace('[^a-zA-Z0-9]','').head()
0 REINDEER CT & DEAD END; NEW HANOVER; Station ...
1 BRIAR PATH & WHITEMARSH LN; ...
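One possible cause, sketched under the assumption that the asker is on pandas 2.0 or later: Series.str.replace treats the pattern as a literal string unless regex=True is passed, so the character class never matches. The two sample strings below are hypothetical stand-ins for the desc column of 911.csv, which is not available here.

```python
import pandas as pd

# Hypothetical stand-in for df['desc']; the real 911.csv is not shown in the question.
desc = pd.Series(["REINDEER CT & DEAD END;", "BRIAR PATH & WHITEMARSH LN;"])

# With regex=False the pattern is matched literally, so nothing is replaced.
literal = desc.str.replace("[^a-zA-Z0-9]", "", regex=False)

# With regex=True the character class is interpreted as a regular expression.
cleaned = desc.str.replace(r"[^a-zA-Z0-9]", "", regex=True)
print(cleaned)
```

Note that this pattern also strips spaces; adding a space inside the brackets would keep them.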
1
vote
0
answers
62
views
Parse a CSV translation file that contains "None" as a standalone string [duplicate]
I am working on a large CSV file that contains number IDs for translations followed by entries for different languages. These entries represent localization strings in an application. I was tasked ...
5
votes
4
answers
182
views
How to cleanup some content from the text file
I have the following data in a CSV.
"ID","OTHER_FIELDS_2"
"87","25 R160 22 13 E"
"87","25 R165 22 08 E"
"77",""
...
0
votes
1
answer
77
views
Python: How to use changing window in pandas rolling groupby function
I have a DataFrame with monthly data that looks something like this:
id  date        window_in_months  value
1   2000-01-01  3                 20
1   2000-02-01  3                 30
2   2000-01-01  12                40
2   2000-02-01  12                60
I want to do a rolling ...
0
votes
1
answer
100
views
Getting some columns as raw data while others converted to pandas types
Is there a way in KDB/pykx to get only some columns as raw data while getting others converted to pandas types?
In the example below, I want to be able to do what is shown in the last line (for variable ...
4
votes
2
answers
171
views
How do I get non-aggregated columns using groupby in Pandas? [closed]
I have a sample data frame like this:
Id application is_a is_b is_c reason subid record
100 app_1 False False False test1 4 record100
100 app_2 True False False test2 3 ...
4
votes
1
answer
131
views
Can't make candle chart due to some error with mpf.plot
import pandas as pd
import yfinance as yf
import mplfinance as mpf
df = yf.download('AMZN', start='2020-01-01', end='2025-07-31')
print(df)
mpf.plot(df['2020-01-01':'2020-06-01'], type='candle', ...
1
vote
4
answers
113
views
How to generate only one box plot for a matrix in Pandas?
This code generates 4 separate box plots.
How can I generate only one box plot for the entire matrix?
import numpy as np
import pandas as pd
data = np.random.random(size=(4,4))
df = pd.DataFrame(data)
...
1
vote
1
answer
108
views
Why is my plotly.graph_objects.Bar graph displaying increments of one rather than the values in my pandas DataFrame?
I have workouts logged in JSON like this:
[
{
"date": "2025-07-14",
"workout_name": "Lower",
"exercises": [
{
"name...
0
votes
1
answer
77
views
pandas test if the datatype of an input series supports nan values
I had something like the following code using pandas 1.x that now generates a warning in pandas 2:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({"i":[1,2,3,4,5], "a":[...
0
votes
1
answer
59
views
Comparing 2 Columns to determine if higher or lower
I'm trying to compare 2 columns, lag2open and MGC=F, and return whether it is higher, storing the result as Higher than 0.
Using GCClose["Higher than 0"] = [GCClose.columns[1]]>= [GCClose.columns[0]] it ...
0
votes
1
answer
209
views
Best way to convert FastAPI/SQLModel into a Polars dataframe?
What is the best way to convert a FastAPI query into a Polars (or pandas) dataframe?
Copilot gives this:
with Session(engine) as session:
questions = session.exec(select(Questions)).all()
...
0
votes
1
answer
138
views
Polars.write_excel: How to remove thousand separator for i64 & f64 and remove trailing zero for f64 efficiently?
SOLUTION as of 16JUL25:
See rotabor's float_precision answer for trailing zero problem.
To solve thousands separator problem gracefully without unnecessary steps, do NOT bother using polars....
2
votes
1
answer
99
views
Replace all non-empty strings in a column with a constant
I have a data frame with a variety of string values. For a given column, if there is any string entered, I would like to replace it with the same value (say 'fruit').
Example:
data = {'item_name': ['...
2
votes
1
answer
71
views
How to identify price regimes / trends in Pandas
I have created the following pandas dataframe, which is an example of 26 stock prices (Open, High, Low, Close):
import pandas as pd
import numpy as np
ds = {
'Date' : ['15/06/2025','16/06/2025','17/...
2
votes
1
answer
67
views
Indicating which column wins in a df.min() call [duplicate]
I want to find the minimum value per row and create a new column indicating which of those columns has the lowest number. Unfortunately, it seems like pandas isn't immediately able to help in this ...
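For what it's worth, a minimal sketch of one common approach, assuming the columns being compared are numeric: DataFrame.idxmin(axis=1) returns the label of the column holding each row's minimum. The frame and the column names a, b, c, min_value and min_col are invented for illustration.

```python
import pandas as pd

# Hypothetical numeric frame; the asker's real columns are not shown.
df = pd.DataFrame({"a": [1, 5, 9], "b": [2, 3, 7], "c": [4, 6, 0]})

value_cols = ["a", "b", "c"]

# Row-wise minimum value, and the name of the column that holds it.
df["min_value"] = df[value_cols].min(axis=1)
df["min_col"] = df[value_cols].idxmin(axis=1)
print(df)
```

When a row has ties, idxmin reports the first column that reaches the minimum.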
1
vote
1
answer
123
views
How to do pandas grouping, filtering, and a pie chart using chained operations?
How can I simplify the code below and make it more efficient using chained operations? Currently, I am creating intermediate objects and using a for loop.
I use this data: https://www.kaggle.com/...
-1
votes
4
answers
262
views
How can I remove the brackets and parentheses at the end of words?
My task is to parse the protein names by removing the brackets and parentheses in the row.
In short, I want to retain the words in front of any parentheses and brackets.
Note that I need to keep ...
1
vote
1
answer
102
views
Fill row value based on column identifier
I have a dataframe like the one below, and would like to fill values from the previous row based on the id field, so that any record with 4 in colA gets the colC value of the previous colA=3 record; colB stays the ...