Newest 'pandas' Questions - Page 5

0 votes

1 answer

83 views

How to reference a second Pandas dataframe to the first one without creating any copy of the first one?

I have a large pandas dataframe df of something like a million rows and 100 columns, and I have to create a second dataframe df_n, same size as the first one. Several rows and columns of df_n will be ...

MBlrd

165

asked Aug 8 at 20:19

5 votes

1 answer

303 views

Why is the pandas apply function so slow when iterating over an entire row, rather than a specific column? [duplicate]

My intuition when using Pandas is that, if you have to use df.apply, it would be more optimal to group all the apply operations into one call. This was further reinforced by me learning that NumPy ...

v0rtex

53

asked Aug 7 at 20:23

2 votes

2 answers

101 views

Undocumented pandas DataFrame shuffle() [closed]

The following seems to work: import pandas as pd import sklearn df = sklearn.datasets.load_iris() df = pd.DataFrame(df.data, columns=df.feature_names) df.shuffle() However this shuffle function seems ...

robertspierre

5,386

asked Aug 7 at 10:18

3 votes

1 answer

184 views

How to create filled and stacked x y scatter plot with data from multiple rows and columns of data in dataframe

I'm working in Jupyter notebooks trying to build a stacked and filled x,y scatter bar chart from the dataframe (df_xy_columns) below: sum_y_gran PVR Group min_x min_x2 max_x max_x2 min_y ...

rfulks

33

asked Aug 7 at 3:05

4 votes

1 answer

168 views

Why does this Python script crash?

I have the following script that crashes when I run it and I cannot figure out why. The script is a smaller version of a larger script, but still reproduces the error of the larger script. import ...

Claude Simon

49

asked Aug 6 at 19:50

9 votes

5 answers

371 views

Convert (many) integer-valued rows into binary indicator columns using Pandas

I am working on a task that seems to me a little like one-hot encoding, but notably different. What I want to do is take a row of integers from a Pandas DataFrame and produce a binary column with 1's ...

lane-h-rogers

227

asked Aug 4 at 22:31

2 votes

2 answers

147 views

Am I correctly generating a list of randomly assigned pairs with exclusions in python?

I have an array of names and roles of people within a company: Example array: names_and_titles = [ ("Samantha Reyes", "Innovation", "Product Owner"), ("Ethan ...

Imam

41

asked Aug 4 at 11:50

7 votes

2 answers

230 views

Convert Decimal values to float64 when creating a Pandas DataFrame

I'm working with a dictionary that contains a list of decimal.Decimal values as one of its fields: import pandas as pd from decimal import Decimal data = { 'Item': ['Apple', 'Banana', 'Orange'], ...

Gino

913

asked Aug 4 at 11:23
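
A minimal sketch of one way to approach the question above, assuming the Decimal values sit in a single column. The column name `Price` and the values are hypothetical, since the excerpt is truncated before that field appears. A column of `decimal.Decimal` objects is stored with dtype `object`, and `astype("float64")` converts it after the DataFrame is built.

```python
from decimal import Decimal

import pandas as pd

# "Price" and its values are made up for illustration; the original field is elided.
data = {
    "Item": ["Apple", "Banana", "Orange"],
    "Price": [Decimal("1.25"), Decimal("0.50"), Decimal("2.00")],
}

df = pd.DataFrame(data)

# Decimal objects land in an object-dtype column; cast explicitly to float64.
df["Price"] = df["Price"].astype("float64")
```

Note that the cast trades exact decimal arithmetic for binary floating point, which may or may not be acceptable for monetary data.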

1 vote

1 answer

117 views

pandas pivot_table: can aggfunc work over a different grouping period from the table? [closed]

I have a pandas pivot table that shows payments made to different payees vs date, and I'm using a Grouper to group them into months, e.g.: payee payee_1 payee_2 date 2019-11-30 amount ...

Paul Worrall

13

asked Aug 4 at 0:34

-1 votes

1 answer

70 views

putting looped API Call results into a dataframe in Python

I need some help. I have part of a Python script that accesses a URL field in a SQL database and then calls an API using the URL in that field. Now I cannot get the data into a dataframe to ...

Trevor Turn

1

asked Aug 1 at 17:16

0 votes

1 answer

126 views

pd.api.types.is_string_dtype() is misleading

df = pd.DataFrame({ 'col_str': ["a", "b", "c"], 'col_lst_str': [["a", "b", "c"], ["d", "e", "f"], [...

Alexis

1,663

asked Aug 1 at 14:08

4 votes

3 answers

136 views

Why is pandas not formatting dates with date_format?

Why is pandas not formatting dates with date_format argument of to_csv? pandas.DataFrame([datetime.datetime.now().date()]).to_csv(date_format="%Y %b") ',0\n0,2025-07-31\n'

Hugo Trentesaux

2,101

asked Jul 31 at 13:41
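
A hedged sketch of what is likely happening in the question above: `datetime.date` objects are stored in an `object`-dtype column, and `date_format` is only applied to datetime-like columns. Converting with `pd.to_datetime` first is one possible workaround. The column name `d` is an assumption added for illustration.

```python
import datetime

import pandas as pd

day = datetime.date(2025, 7, 31)

# A column of datetime.date objects is plain Python objects to pandas,
# so to_csv writes str(date) and date_format has no effect.
as_objects = pd.DataFrame({"d": [day]})
csv_objects = as_objects.to_csv(date_format="%Y %b", index=False)

# After conversion to datetime64, to_csv applies date_format.
as_datetimes = as_objects.assign(d=pd.to_datetime(as_objects["d"]))
csv_datetimes = as_datetimes.to_csv(date_format="%Y %b", index=False)
```

Printing `csv_objects` and `csv_datetimes` side by side shows the difference; the month abbreviation produced by `%b` depends on the active locale.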

1 vote

1 answer

138 views

How to replace existing data in a particular sheet of an existing excel file using pyspark dataframe?

I am using Azure Databricks and Azure Data Storage Explorer for my operations. I have an excel file of under 30 MB containing multiple sheets. I want to replace the data in one sheet every month when ...

spacestar

21

asked Jul 31 at 8:16

-3 votes

1 answer

87 views

How to convert sql formula to python or pandas code [closed]

I have a syntax like below and would like to convert this to python executable statement. The below is stored as it is in the database and used in a procedure for calculating the required value. Now I ...

Jayanth

9

asked Jul 31 at 7:43

4 votes

5 answers

297 views

How to merge two CSV files based on matching values in different columns and keep unmatched rows with placeholders?

I'm working on a data cleaning task and could use some help. I have two CSV files with thousands of rows each: File A contains product shipment records. File B contains product descriptions and ...

user21677098

asked Jul 31 at 3:25

4 votes

4 answers

175 views

How to fill values in a Dataframe depending on values around it

I have a dataframe that looks something like this: 1 2 3 'String' '' 4 X '' '' 5 X '' '' 6 7 'String' '' 1 Y '' And I want to change the Xs and Ys (put here just to visualize) to the ...

Lucas P

47

asked Jul 30 at 15:19

1 vote

2 answers

134 views

In pandas, how to write the word "nan" as string with to_excel?

I have the reverse of the problem described in Prevent pandas from interpreting 'NA' as NaN in a string. I work with older English text data and want to write the word "nan" (i.e. Modern ...

Mat

525

asked Jul 29 at 12:08

6 votes

5 answers

326 views

How to generate this simple dataframe from these numbers?

I have N numbers, call it 3 for now: A1, A2, A3. I'd like to generate the following dataframe in Pandas: Category 1 2 3 4 5 6 7 1 A1 A1+A2 A1+A2+A3 A2+A3 A3 0 0 2 0 A2 A2+A3 A2+A3+A1 A3+A1 A1 0 3 0 0 ...

Matta

207

asked Jul 29 at 10:37

-2 votes

2 answers

186 views

Why does grouping a pandas series using the same series make no sense?

In the code example below I am grouping a pandas series using the same series but with a modified index. The groups in the end make no sense. There is no warning or error. Could you please help me ...

karpan

597

asked Jul 29 at 9:24

2 votes

2 answers

93 views

Pandas dt accessor or groupby function returning decimal numbers instead of integers in index labels where some series values NA

We're trying to group up date counts by month and index values are returning as decimals instead of integers when series contain any number of NaTs / na values. Simplified reproducible example: import ...

Chris Dixon

1,148

asked Jul 29 at 3:54

1 vote

0 answers

51 views

How to call R's stlm() from Python using rpy2, getting "missing value where TRUE/FALSE needed" error

I’m using rpy2 in Python to call R's forecast::stlm() function from within a custom wrapper function defined in R. My goal is to fit a seasonal time series model (STL + ARIMA) on a univariate time ...

RSK

765

asked Jul 28 at 15:45

0 votes

2 answers

77 views

How can I change the shape of the dataframe to have two headers when I have duplicated values? [duplicate]

this is my df: symbol year_bin metric value row 0 USA500.IDX 2025-1 total_trades 32.00 0 1 GBPUSD 2025-1 total_trades 11.00 0 2 GBPUSD 2025-1 ...

Amir

3

asked Jul 27 at 10:04

2 votes

1 answer

91 views

Update values of specific columns of df2 in df1 using Pandas

I have 2 dataframes. One is smaller, with fewer columns than the other. I want to update df1 with values from the available columns in df2. How do I do it? Eg: df1: Jan Feb Mar Apr May Jun Jul Aug ...

Anupkumar Kasi

181

asked Jul 27 at 7:33

1 vote

1 answer

175 views

Altair fails to render chart out of pandas dataframe on Streamlit

I have the following code in Python, using Streamlit as framework: try: native_data = data.copy() # Create Altair chart with native data st.write(f"Debug: Native data type: {type(...

HuLu ViCa

5,515

asked Jul 26 at 16:01

1 vote

3 answers

90 views

Can't return graph to website using Flask and HTML

This file is called 'html app.py' from flask import Flask, render_template, request import yfinance as yf import seaborn as sns import matplotlib.pyplot as plt import io import base64 app = Flask(...

rashmip_21

53

asked Jul 26 at 12:47

1 vote

1 answer

79 views

How do I create hourly means with Pandas only when I have at least half of the data points?

I have a Pandas dataframe df with a datetime index and three columns, like this: Out[64]: rh pm25a pm25b time_stamp 2022-07-06 11:35:...

ValeA

11

asked Jul 26 at 4:34

1 vote

1 answer

65 views

Fit the rows and column names using pandas.set_option

I am trying to use pandas.set_option in my Python script to display a table, but somehow the data does not fill properly in an HTML page. Since the names in some columns are a bit longer, columns look 1 ...

Kapil

325

asked Jul 25 at 15:05

1 vote

1 answer

111 views

Pandas dataframe insert to SQL Server using pyodbc fails if more than 1 record is present in batch

I have a large dataframe which I need to upload to SQL server. Due to volume of data, my code does the insert in batches. But, I am facing insert failure if the batch has more than 1 record in it. The ...

Abhishek Sourabh

110

asked Jul 25 at 12:30

3 votes

1 answer

68 views

Assign column status retrospectively in pandas

I have created the following pandas dataframe: import pandas as pd import numpy as np ds = {'col1' : [234,321,284,286,287,300,301,303,305,299,288,300,299,287,286,280,279,270,269,301]} df = pd....

Giampaolo Levorato

1,762

asked Jul 25 at 8:24

2 votes

1 answer

71 views

How to set up non-linear (sinusoidal) multiple variable regression problems for tensorflow?

I have some parameters: A1, A2, A3, f1, f2, f3. These parameters are then used to generate a set of sinusoidal data, something like: y = A1 * sin(f1 * x) + A2 * sin(f2 * x) + A3 * sin(f3 * x) From ...

PentaGeer Joshua Meetsma

21

asked Jul 24 at 15:28

0 votes

2 answers

83 views

Pandas groupby with Grouper still includes time bins beyond my filtered range

I'm working with 5-min level data that only includes timestamps between 09:30 and 16:00 (dateTime is saved as a column, not as the index). After applying an operation to the group, I get additional data beyond ...

JoonHak Kim

1

asked Jul 24 at 5:42

0 votes

1 answer

73 views

Why isn't this removing non-alphanumerical characters?

import pandas as pd df = pd.read_csv('911.csv') df['desc'].str.replace('[^a-zA-Z0-9]','').head() 0 REINDEER CT & DEAD END; NEW HANOVER; Station ... 1 BRIAR PATH & WHITEMARSH LN; ...

david yen2

3

asked Jul 24 at 1:15
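
A short sketch related to the question above, using two made-up strings modelled on the excerpt rather than the real `911.csv` file. Since pandas 2.0, `Series.str.replace` treats the pattern as a literal string unless `regex=True` is passed, which would explain why the character class appears to do nothing.

```python
import pandas as pd

# Sample strings are illustrative stand-ins for the 'desc' column.
desc = pd.Series(
    ["REINDEER CT & DEAD END; NEW HANOVER", "BRIAR PATH & WHITEMARSH LN"]
)

# regex=True makes pandas interpret the pattern as a regular expression;
# without it the literal text "[^a-zA-Z0-9]" is searched for and never found.
cleaned = desc.str.replace(r"[^a-zA-Z0-9]", "", regex=True)
```

The call returns a new Series, so the result has to be assigned back (for example to `df['desc']`) for the change to persist.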

1 vote

0 answers

62 views

Parse a CSV translation file that contains "None" as a standalone string [duplicate]

I am working on a large CSV file that contains number IDs for translations followed by entries for different languages. These entries represent localization strings in an application. I was tasked ...

Hadi Farah

1,180

asked Jul 23 at 12:14

5 votes

4 answers

182 views

How to cleanup some content from the text file

I have the following data in a CSV. "ID","OTHER_FIELDS_2" "87","25 R160 22 13 E" "87","25 R165 22 08 E" "77","" ...

learner

53

asked Jul 22 at 20:21

0 votes

1 answer

77 views

Python: How to use changing window in pandas rolling groupby function

I have a DataFrame with monthly data that looks something like this: id date window_in_months value 1 2000-01-01 3 20 1 2000-02-01 3 30 2 2000-01-01 12 40 2 2000-02-01 12 60 I want to do a rolling ...

LattePrincess

95

asked Jul 21 at 17:19

0 votes

1 answer

100 views

Getting some columns as raw data while others converted to pandas types

Is there a way in KDB/pykx to get only some columns as raw data while get others converted to pandas types? In the example below, I want to be able to do what is shown in the last line (for variable ...

S.V

2,855

asked Jul 21 at 14:43

4 votes

2 answers

171 views

How do I get non-aggregated columns using groupby in Pandas? [closed]

I have a sample data frame like this: Id application is_a is_b is_c reason subid record 100 app_1 False False False test1 4 record100 100 app_2 True False False test2 3 ...

N9909

297

asked Jul 20 at 13:49

4 votes

1 answer

131 views

Can't make candle chart due to some error with mpf.plot

import pandas as pd import yfinance as yf import mplfinance as mpf df = yf.download('AMZN', start='2020-01-01', end='2025-07-31') print(df) mpf.plot(df['2020-01-01':'2020-06-01'], type='candle', ...

rashmip_21

53

asked Jul 19 at 18:37

1 vote

4 answers

113 views

How to generate only one box plot for a matrix in Pandas?

This code generates 4 separate box plots. How can i generate only one box plot for the entire matrix? import numpy as np import pandas as pd data = np.random.random(size=(4,4)) df = pd.DataFrame(data) ...

stefaniecg

87

asked Jul 18 at 9:09

1 vote

1 answer

108 views

Why is my plotly.graph_objects.Bar graph displaying increments of one rather than the values in my pandas DataFrame?

I have workouts logged in JSON like this: [ { "date": "2025-07-14", "workout_name": "Lower", "exercises": [ { "name...

Stilvens Parm

19

asked Jul 18 at 8:45

0 votes

1 answer

77 views

pandas test if the datatype of an input series supports nan values

I had something like the following code using pandas 1.x that now generates a warning in pandas 2: import pandas as pd import numpy as np df1 = pd.DataFrame({"i":[1,2,3,4,5], "a":[...

guest

139

asked Jul 16 at 20:04

0 votes

1 answer

59 views

Comparing 2 Columns to determine if higher or lower

Trying to compare 2 columns, lag2open and MGC=F, and record whether one is higher in a column named "Higher than 0", using GCClose["Higher than 0"] = [GCClose.columns[1]]>= [GCClose.columns[0]] it ...

Rafael Alexandre Sousa

39

asked Jul 16 at 16:59

0 votes

1 answer

209 views

Best way to convert FastAPI/SQLmodel into Polars Dataframe?

What is the best way to convert a FastAPI query into a Polars (or pandas) dataframe? Copilot gives this: with Session(engine) as session: questions = session.exec(select(Questions)).all() ...

diogenes

2,181

asked Jul 16 at 7:14

0 votes

1 answer

138 views

Polars.write_excel: How to remove thousand separator for i64 & f64 and remove trailing zero for f64 efficiently?

SOLUTION as of 16JUL25: See rotabor's float_precision answer for trailing zero problem. To solve thousands separator problem gracefully without unnecessary steps, do NOT bother using polars....

JeffCh

97

asked Jul 16 at 4:46

2 votes

1 answer

99 views

Replace all non-empty strings in a column with a constant

I have a data frame with a variety of string values. For a given column, if there is any string entered, I would like to replace it with the same value (say 'fruit'). Example: data = {'item_name': ['...

Liz

365

asked Jul 15 at 17:16
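
One possible approach to the question above, sketched under assumptions: the sample values are invented because the excerpt cuts off before the data, and "empty" is taken to mean either a missing value or an empty string. `Series.where` keeps entries where the condition holds and replaces the rest.

```python
import pandas as pd

# Illustrative data; the original example values are truncated in the excerpt.
df = pd.DataFrame({"item_name": ["apple", "", None, "pear"]})

names = df["item_name"]

# Treat missing values and empty strings as "empty" (an assumption).
is_empty = names.isna() | (names == "")

# Keep empty entries untouched, replace every other entry with the constant.
df["item_name"] = names.where(is_empty, "fruit")
```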

2 votes

1 answer

71 views

How to identify price regimes / trends in Pandas

I have created the following pandas dataframe, which is an example of 26 stock prices (Open, High, Low, Close): import pandas as pd import numpy as np ds = { 'Date' : ['15/06/2025','16/06/2025','17/...

Giampaolo Levorato

1,762

asked Jul 15 at 13:27

2 votes

1 answer

67 views

Indicating which column wins in a df.min() call [duplicate]

I want to find the minimum value per row and create a new column indicating which of those columns has the lowest number. Unfortunately, it seems like pandas isn't immediately able to help in this ...

Corsaka

464

asked Jul 15 at 10:06
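
For the question above, a minimal sketch using `DataFrame.idxmin(axis=1)`, which returns the label of the column holding each row's minimum. The frame and the column names `a`, `b`, `c` are hypothetical.

```python
import pandas as pd

# Hypothetical numeric columns to compare row by row.
df = pd.DataFrame({"a": [3, 1, 5], "b": [2, 4, 5], "c": [9, 0, 1]})
value_cols = ["a", "b", "c"]

# Row-wise minimum value.
df["min_value"] = df[value_cols].min(axis=1)

# Label of the column where that minimum occurs (first one on ties).
df["min_column"] = df[value_cols].idxmin(axis=1)
```

Restricting both calls to `value_cols` keeps the newly added columns out of later comparisons.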

1 vote

1 answer

123 views

How to do pandas grouping, filtering, and a pie chart using chained operations?

How can I simplify the code below and make it more efficient using chained operations? Currently, I am creating intermediate objects and using a for loop. I use this data: https://www.kaggle.com/...

just a tw highschooler

21

asked Jul 15 at 8:54

-1 votes

4 answers

262 views

How can I remove the brackets and parentheses at the end of words?

My task is to parse the protein names by removing the brackets and parentheses in the row. In short, I want to retain the words in front of any parentheses and brackets. Note that I need to keep ...

Ssong

466

asked Jul 15 at 5:14

1 vote

1 answer

102 views

fill row value based on column identifier

I have a dataframe like the one below and would like to fill values from a previous row based on the id field, so that any record with 4 in colA gets the colC value of the previous colA=3 record, while colB stays the ...

Connie Xu

63

asked Jul 14 at 21:12
