Newest 'pandas' Questions - Page 2

4 votes

0 answers

141 views

How to read mixed data (numbers and strings) from Excel with xlwings without converting 123 → 123.0

I’m using xlwings to read an Excel sheet into a pandas DataFrame with the built-in pd.DataFrame converter. Some of my columns contain mixed data (e.g. IDs or codes like 123, 00123, ABCD). When I read ...

Dynamicgra d

49

asked Oct 31 at 11:45

5 votes

1 answer

85 views

Plotly px.timelines shows 1970 even though print(data.dtypes) confirms correct datetime in shiny

I'm building a dashboard in Shiny for Python and I'm stuck on a strange bug. I have a Plotly px.timeline that should display boiler on-times based on a date range from a Flatpickr input. The Problem: ...

Juan Siécola

97

asked Oct 30 at 19:45

2 votes

1 answer

102 views

How to fix "AttributeError: 'Series' object has no attribute 'codes'" using pandas.Categorical

I am trying to convert a string that is a categorical data type into a numeric one. I found out that I can use pandas.Categorical; unfortunately, accessing the codes attribute gives me an error. Here is a ...

JA-pythonista

1,375

asked Oct 30 at 14:32
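
For the `codes` question above, one possible cause (an assumption, since the asker's code is cut off) is that the categorical values ended up inside a Series, which exposes codes through the `.cat` accessor rather than directly. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical data; the question's own DataFrame is not shown in the excerpt.
s = pd.Series(["low", "high", "medium", "low"])

# A Series with category dtype exposes the integer codes via the .cat accessor.
codes_from_series = s.astype("category").cat.codes

# A bare pd.Categorical (not wrapped in a Series) has .codes directly.
codes_from_categorical = pd.Categorical(s).codes

print(codes_from_series.tolist())    # categories sort as high, low, medium
print(list(codes_from_categorical))
```

Calling `s.codes` on the Series itself is what raises the `AttributeError` quoted in the title.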

3 votes

2 answers

103 views

How to generate scatter plot of all numeric columns against specific columns in the same dataframe

I have a dataframe with a mix of data types (object and numeric). I want to plot a scatter plot for all numeric columns in the dataset against specific columns: col_32, col_69,col_74 and col_80 ...

RayX500

319

asked Oct 30 at 6:18

2 votes

3 answers

133 views

Transform DataFrame containing ID pairs into a list of sets

I have a Pandas DataFrame with the columns left_id and right_id and the rows (a, b), (c, a), (x, y). I need to transform this into a list of sets, like [ {'a', 'b', 'c'}, {'x', 'y'} ]. The first two rows should be ...

Joe F.

917

asked Oct 29 at 1:57
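
The ID-pair grouping above reads like a connected-components problem. The sketch below uses a small union-find in plain Python; this is one possible approach, not necessarily what the answers propose, and `group_ids` is a hypothetical helper name.

```python
import pandas as pd

# The three rows from the question's example.
df = pd.DataFrame({"left_id": ["a", "c", "x"], "right_id": ["b", "a", "y"]})


def group_ids(pairs):
    """Merge ID pairs into sets of transitively linked IDs (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


result = group_ids(zip(df["left_id"], df["right_id"]))
print(result)
```

For very large frames a graph library may be more convenient, but the pure-Python version avoids extra dependencies.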

1 vote

3 answers

199 views

Pandas DataFrame with a hundred million entries and counting the number of identical characters in strings

I have a pandas DataFrame (df) with two columns (namely Tuple and Set) and approximately 100,000,000 entries. The Tuple column data is a string of exactly 9 characters. The Set column data is an ...

Max Pierini

2,323

asked Oct 28 at 20:21

4 votes

1 answer

90 views

Execution of pandas' info in python

I am new to the pandas library in Python. When I load a file and print the output of df.info to the console, the data is printed first, before the text that I print myself. What ...

UnemployedBrat

65

asked Oct 28 at 14:11
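
A likely explanation for the ordering described above (assuming the call looks something like `print("Summary:", df.info())`) is that `DataFrame.info()` writes its summary itself and returns `None`. A minimal sketch that captures the text through the `buf` parameter instead:

```python
import io

import pandas as pd

# Hypothetical frame standing in for the loaded file.
df = pd.DataFrame({"A": [1, 2, 3]})

# info() prints as a side effect and returns None, so passing it to print()
# emits the summary before the surrounding text.
buffer = io.StringIO()
return_value = df.info(buf=buffer)  # write the summary into the buffer
summary = buffer.getvalue()

print("Summary:")
print(summary)
```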

3 votes

1 answer

98 views

Should I drop duplicates before merging two DataFrames or after the merge?

I have two DataFrames in pandas: customers and flights. Both contain duplicates on the join key (Loyalty#). I am not sure whether the correct workflow is to remove duplicates before the merge or merge ...

Teexlol

31

asked Oct 28 at 11:07
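
To illustrate why the order matters in the merge question above: duplicate keys on both sides multiply in the result. The sketch below uses invented rows; only the `Loyalty#` key name comes from the question, and whether dropping duplicates is appropriate depends on what the duplicates mean in the real data.

```python
import pandas as pd

# Hypothetical rows with duplicates on the join key in both frames.
customers = pd.DataFrame({"Loyalty#": [1, 1, 2], "name": ["Ann", "Ann", "Bob"]})
flights = pd.DataFrame({"Loyalty#": [1, 1, 1, 2], "flights": [3, 3, 5, 2]})

# Merging first: key 1 yields 2 * 3 rows, key 2 yields 1 row.
merged_raw = customers.merge(flights, on="Loyalty#")

# Dropping exact duplicate rows first keeps the result small, and
# validate= raises if the left keys are still not unique.
customers_unique = customers.drop_duplicates()
flights_unique = flights.drop_duplicates()
merged_clean = customers_unique.merge(
    flights_unique, on="Loyalty#", validate="one_to_many"
)

print(len(merged_raw), len(merged_clean))
```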

0 votes

0 answers

37 views

How can I automatically export multiple Excel sheets into separate CSV files using Python? [duplicate]

I’m trying to automate a report generation task using Python. I have an Excel workbook that contains multiple sheets (e.g., "Sales", "Orders", "Summary"), and I want to ...

whdaks1019

1

asked Oct 28 at 8:40

1 vote

1 answer

65 views

Output of for loop filling down in dataframe instead of returning corresponding values for each row

I'm using SpaCy to process a series of sentences and return the five most common words in each sentence. My goal is to store the output of that frequency analysis (using Counter) in a column beside ...

cmr

23

asked Oct 28 at 6:25

4 votes

3 answers

207 views

Why do I get a SettingWithCopyWarning when using shift and dropna inside a function?

In general, when I receive this warning /home/mo/mwe.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value ...

Mo_

2,080

asked Oct 27 at 17:07

3 votes

1 answer

93 views

How to create a column based on 2 other columns of the dataframe?

I have a Kivy app that, at some point in the code, loads a pandas DataFrame from Excel, and I have already managed to create 2 columns filled with booleans. I need to create a third column whose content ...

user31746640

33

asked Oct 24 at 16:46

0 votes

1 answer

109 views

Combining two dataframes and keeping the average

I'm new to coding, and I'm trying to combine the data from two weather stations into one new dataframe sorted by Datetime. I want this new dataframe to contain the average values of the two original ...

Maurice Verest

1

asked Oct 23 at 16:09

2 votes

0 answers

75 views

How do you get specific data in unknown row from a csv file using known data from the same row? [duplicate]

import geopy # used to get location from geopy.geocoders import Nominatim import pandas as pd from pyproj import Transformer def get_user_location(): # user location geolocator = Nominatim(...

Litcoder

21

asked Oct 23 at 15:11

1 vote

1 answer

139 views

Pandas's to_datetime function and datetime format

I have a .csv file with two columns (Date and Time). The time zone is "Europe/Paris" with a +02:00 hours shift. The file is structured in 2 parts with two datetime formats. Date Time 08-11-...

RémyClaverie

77

asked Oct 23 at 13:11

4 votes

3 answers

148 views

Creating a new pandas dataframe from shape

I have information on total number of rows and number of columns for a new pandas dataframe import pandas as pd nRow = 10 nCol = 4 Based on this information I want to create a new dataframe where ...

Brian Smith

1,679

asked Oct 21 at 17:33

2 votes

0 answers

91 views

How can I write a matrix or a pandas DataFrame to an Excel file using openpyxl, without iterating cell by cell? [duplicate]

I’d like to insert a 2D array (for example, a pandas DataFrame) into an existing Excel worksheet at a specific position (e.g., starting at cell M8), using openpyxl. Is there a way to assign the whole ...

Amadou

21

asked Oct 21 at 14:46

1 vote

1 answer

44 views

pandas.read_csv uses only utf-8 encoding for django file upload

I'm testing Django using file uploads. I'm facing a strange issue: regardless of which encoding I choose, I always get the same error message saying that pandas is trying to decode with UTF-8 pd.read_csv(...

Aidas

170

asked Oct 21 at 14:29

-5 votes

1 answer

98 views

pandas crosstab with string as second parameter

Is this code, which works, supposed to work? import pandas as pd from palmerpenguins import load_penguins penguins = load_penguins() pd.crosstab(penguins.species, "count") species count ...

robertspierre

5,386

asked Oct 21 at 13:10

6 votes

2 answers

200 views

How to drop duplicate values when merging dataframes

I have a DataFrame that I want to merge, dropping only duplicate values based on column name and row. For example, key_x and key_y have the same values in the same row in rows 0, 3, 10, 12, 15. My DataFrame ...

Chris

63

asked Oct 19 at 23:01

0 votes

0 answers

61 views

pywebview: error maximum recursion depth exceeded before pressing button when passing pandas/model objects in js_api

I’m embedding a small UI with pywebview and want Python to JS live updates. I created a GPSSpoofingDetector class that loads a pickled sklearn model and a pandas test CSV. I want a JavaScript “Start” ...

Ahsan914

5

asked Oct 19 at 15:14

1 vote

1 answer

112 views

Combining Identically Indexed and Column Dataframes into 3d Dataframe

I have 3 2D DataFrames, all with identical indexes (datetime range) and column names, but different data for these labels. I would like to combine these three 2D dataframes into 1 3D DataFrame with an ...

cma0014

1,599

asked Oct 18 at 17:14
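
pandas has no true 3D DataFrame, so one common substitute for the request above is a column MultiIndex built with `pd.concat` and `keys`. A sketch under that assumption; the frame names and the labels `a`, `b`, `c` are invented:

```python
import numpy as np
import pandas as pd

# Three hypothetical frames sharing the same datetime index and columns.
index = pd.date_range("2025-01-01", periods=3, freq="D")
columns = ["x", "y"]
df_a = pd.DataFrame(np.zeros((3, 2)), index=index, columns=columns)
df_b = pd.DataFrame(np.ones((3, 2)), index=index, columns=columns)
df_c = pd.DataFrame(np.full((3, 2), 2.0), index=index, columns=columns)

# The dict keys become the outer level of a two-level column index.
stacked = pd.concat({"a": df_a, "b": df_b, "c": df_c}, axis=1)
value = stacked.loc[pd.Timestamp("2025-01-02"), ("b", "x")]

# If the labels are not needed, a plain 3D NumPy array is an alternative.
as_array = np.stack([df_a.to_numpy(), df_b.to_numpy(), df_c.to_numpy()])

print(stacked.shape, as_array.shape)
```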

1 vote

1 answer

96 views

Calculating MultiIndex intersection to a given tolerance in an efficient way

I have two DataFrames, data1 and data2, with 3-level multiindices. The first two levels are floats, and correspond to spatial coordinates (say longitude and latitude). The third level, time, is based ...

peich

33

asked Oct 18 at 16:37

3 votes

1 answer

105 views

Calculating mean for a column of arrays in pandas

I have below pandas dataframe import pandas as pd import numpy as np dat = pd.DataFrame({ 'A': [1,2,3], 'B': [[[np.nan, 0.0, 0.0, 0.0, 0.0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], ...

Bogaso

3,896

asked Oct 18 at 6:51

0 votes

0 answers

103 views

"UserWarning: No artists with labels found to put in legend" Error when trying to create a legend with labels from dataset

I can't get legend labels to show up when I use 'CONDITION' (a longer string) as my x data and hue, however when I use CONDITION_N (a shorter string) as the hue then it appears. Why? Warning: /var/...

Sarah Warner

3

asked Oct 17 at 19:07

1 vote

1 answer

72 views

How can I use wildcard paths from a Pandas dataframe as required rule inputs and outputs in Snakemake?

I have a Snakemake pipeline (https://github.com/V-Varga/SPOT-BGC/tree/main), where I generate input and output file names for various intermediate steps using wildcards that refer back to file and ...

Vi_Varga

25

asked Oct 16 at 18:42

0 votes

2 answers

176 views

Error due to single-level dataframe merge with multi-level indexed dataframe

# Read lookup file which only contains 5 columns. df_lookup = pd.read_excel( os.path.join(path, 'lookup.xlsx'), index_col=[0, 1, 2, 3, 4]) # sample df_lookup # |A |B |C |D |E | # |--|--|--|--|...

mk_

27

asked Oct 16 at 14:29

2 votes

0 answers

105 views

How to control the zorder values on superimposed bars in a histogram plot in matplotlib

I have a list of three dataframes, each of them having four columns of interest. I want to create a figure with four subplots (one for each column). In each subplot, first, I want to create a ...

Arindam Das

31

asked Oct 15 at 14:22

5 votes

2 answers

147 views

Failing to fill empty date values with numpy nan

I have below code import pandas as pd import numpy as np dat = pd.DataFrame({'A' : [1,2,3,4,5], 'B' : ['2002-01-01', '2003-01-01', '2004-01-01', '2004-01-01', '2005-01-01']}) dat['B'] = pd.to_datetime(...

Brian Smith

1,679

asked Oct 15 at 9:28

8 votes

1 answer

256 views

How to write a pandas-compatible, non-elementary expression in narwhals

I'm working with the narwhals package and I'm trying to write an expression that is: (1) applied over groups using .over(); (2) non-elementary/chained (longer than a single operation); (3) works when the native df ...

Slash

581

asked Oct 14 at 19:07

4 votes

4 answers

173 views

Create an incremental suffix for values in a pandas column that have duplicate values in another column

Setup I have a dataframe, df import pandas as pd df = pd.DataFrame( { 'Name':['foo','foo','foo','bar','bar','bar','baz','baz','baz'], 'Color':['red','blue','red','green','green','...

bismo

1,645

asked Oct 14 at 18:49

0 votes

1 answer

159 views

Index in to two specific dates on Pandas dataframe [closed]

I have a pandas dataframe where the index is datetime. I learned that I can index in to a specific date using this code: selected_date_df = df.loc['yyyy-mm-dd'] I can also find data between two dates ...

GC123

411

asked Oct 14 at 15:47
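
If the goal above is to pick out exactly two dates rather than a range (an assumption based on the title), a boolean mask over the index is one option. A minimal sketch with invented data:

```python
import pandas as pd

# Hypothetical daily data with a DatetimeIndex.
df = pd.DataFrame(
    {"value": range(6)},
    index=pd.date_range("2025-01-01", periods=6, freq="D"),
)

wanted = pd.to_datetime(["2025-01-02", "2025-01-05"])

# Exact timestamp matches.
selected = df.loc[df.index.isin(wanted)]

# For intraday timestamps, compare on the calendar day instead.
selected_by_day = df.loc[df.index.normalize().isin(wanted)]

print(selected)
```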

3 votes

2 answers

207 views

How to change names of pandas MultiIndex using Styler

Let's assume we have the following: midx = pd.MultiIndex.from_product( [[0, 1], [0, 1], [0, 1]], names=['L1', 'L2', 'L3']) df = pd.DataFrame({"col": list(range(8))}, index=midx) Now,...

MarcoS

13.6k

asked Oct 14 at 12:17

3 votes

1 answer

132 views

Increase the date by number of months in pandas

I have below pandas data frame import pandas as pd import numpy as np dat = pd.DataFrame({'A' : [1,2,3,4,5], 'B' : ['2002-01-01', '2003-01-01', '2004-01-01', '2004-01-01', '2005-01-01']}) dat['A'] = ...

Brian Smith

1,679

asked Oct 14 at 2:44
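
The excerpt above is cut off before it shows how column A is used, so the sketch below assumes A holds the number of months to add to each date in B. It applies one `pd.DateOffset` per row; the `B_shifted` column name is invented.

```python
import pandas as pd

# Data from the question; treating A as a month count is an assumption.
dat = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": ["2002-01-01", "2003-01-01", "2004-01-01", "2004-01-01", "2005-01-01"],
})
dat["B"] = pd.to_datetime(dat["B"])

# Calendar-aware shift: each row gets its own month offset.
dat["B_shifted"] = [
    date + pd.DateOffset(months=int(months))
    for date, months in zip(dat["B"], dat["A"])
]

print(dat)
```

A Python-level loop like this is simple but not the fastest choice for very large frames.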

0 votes

2 answers

105 views

Why is there a duplicate index when using sort_index() in pandas?

I am doing target mean mapping based on an external statistical table, where org_ is the external data and merged_data is the set of training data and test data. After processing, the features of ...

osquer kkzlk

1

asked Oct 13 at 11:37

3 votes

2 answers

110 views

Is there a way in Python to make a row of an HTML table multi-lined?

I have a Python script that constructs a pandas DataFrame from API data, which I then convert to a pretty_html_table that will be the body of an email. In one of the rows, I have data containing an ...

MasterCal

31

asked Oct 13 at 4:11

1 vote

0 answers

58 views

How to efficiently denormalize a SQL DB to produce Parquet files

I'm trying to create a Parquet file from a heavily normalized SQL database with a snowflake schema. Some of the dimensions have very long text attributes, so that simply running a big set of joins to ...

Davor Cubranic

1,150

asked Oct 11 at 18:19

1 vote

2 answers

106 views

How do I calculate a relative time delta in Pandas?

I have a column of datetimes and I want to get the difference between values in terms of years, months, etc, instead of timedeltas that only provide days. How do I do this in Pandas? Pandas provides ...

wjandrea

33.9k

asked Oct 11 at 15:59
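
One way to get calendar-based differences like those described above is `dateutil.relativedelta`, which is installed alongside pandas. This is a sketch, not necessarily the approach the answers take; the `start` and `end` column names are invented.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

# Hypothetical start/end columns.
df = pd.DataFrame({
    "start": pd.to_datetime(["2023-01-10", "2020-06-01"]),
    "end": pd.to_datetime(["2025-03-15", "2020-06-20"]),
})

# relativedelta(later, earlier) splits the gap into years, months and days.
deltas = [
    relativedelta(end.to_pydatetime(), start.to_pydatetime())
    for start, end in zip(df["start"], df["end"])
]
df["years"] = [d.years for d in deltas]
df["months"] = [d.months for d in deltas]
df["days"] = [d.days for d in deltas]

print(df)
```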

2 votes

1 answer

86 views

How to combine multiple rows of Pandas dataframe into one row using a key [duplicate]

I am trying to manipulate a CSV using Pandas and I need to get the data into the format of one row per ID. This is an example of what I am trying to accomplish: From: df = pd.DataFrame({ 'ID': [1, 1, ...

sar

21

asked Oct 10 at 17:26

1 vote

1 answer

72 views

python plotly scatter ols trendline has a kink in it

I am using Plotly Express to model some data and wanted to add trendline='ols' to it. When I do, I get a kink in the result. Here is the code used: d={'category': {63: 'test', 128: 'test', 192:...

frank

3,816

asked Oct 10 at 6:30

0 votes

2 answers

86 views

Using .loc to change a value in a pd.Dataframe with a variable as column name

I need to change a value in a pandas DataFrame with .loc. Let me show with an example: import pandas as pd df = pd.DataFrame(data={"A":["bla","bla2"],"B":[1,2]}) I ...

seb66

25

asked Oct 9 at 15:23
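
For the `.loc` question above, a column name held in a variable can be passed as the second indexer just like a literal string. A minimal sketch using the question's DataFrame; the variable name `column_name` and the new values are invented:

```python
import pandas as pd

df = pd.DataFrame(data={"A": ["bla", "bla2"], "B": [1, 2]})

column_name = "B"  # hypothetical variable holding the target column

# Row chosen by a boolean mask, column chosen by the variable.
df.loc[df["A"] == "bla2", column_name] = 20

# Row chosen by index label, column chosen by the variable.
df.loc[0, column_name] = 10

print(df)
```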

-1 votes

1 answer

59 views

Why does changing a DataFrame in one Jupyter cell also change another variable? [duplicate]

I am working in Jupyter Notebook with pandas, and I noticed something strange. In one cell, I did this: import pandas as pd df1 = pd.DataFrame({"A":[1,2,3]}) df2 = df1 Then in another ...

Gouri Phadnis

1

asked Oct 9 at 5:34
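
The behaviour described above follows from `df2 = df1` binding a second name to the same object instead of copying it. A short sketch contrasting that with `.copy()`; `df3` and the value 100 are invented:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3]})

df2 = df1         # second name for the same DataFrame object
df3 = df1.copy()  # independent copy of the data

df2.loc[0, "A"] = 100

print(df2 is df1)         # same object
print(df1.loc[0, "A"])    # changed through df2
print(df3.loc[0, "A"])    # unaffected copy
```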

5 votes

2 answers

201 views

Why doesn't Pandas concat do a copy when one of the dataframes is empty?

Consider this example: import pandas as pd df_part1 = pd.DataFrame() df_part2 = pd.DataFrame({'A': [1,1], 'B': [3,4]}) df_concat_out = pd.concat([df_part1, df_part2]) print("id(df_part2.values) ==...

Ben Farmer

2,974

asked Oct 8 at 23:57

1 vote

1 answer

81 views

Reconfigure a Pandas Dataframe [duplicate]

Our old ERP system generates orphaned HTML reports with the following format which I import into Pandas Work Order Item Type Material Labor 0 552603 Budget 71119 4567 1 552603 ...

Woody 1470

11

asked Oct 8 at 22:26

3 votes

1 answer

135 views

Why does Pandas not recognise my sqlalchemy connection engine?

I'm trying to connect to an IBM DB2 database from Python. I'm using Python 3.12.10, SQLAlchemy 1.4.54, and Pandas 2.3.2. This is what my code looks like: import os import sqlalchemy import pandas as ...

SRJCoding

521

asked Oct 8 at 11:19

3 votes

2 answers

155 views

Importing a table from a webpage as a dataframe in Python

I am trying to read in a specific table from the US Customs and Border Protection's Dashboard on Southwest Land Border Encounters as a dataframe. The url is: https://www.cbp.gov/newsroom/stats/...

Ari

2,023

asked Oct 7 at 22:56

1 vote

1 answer

74 views

How to count pandas DataFrame cells using unique cell values as rows and columns [duplicate]

Let's say I have a DataFrame df like this: pd.DataFrame({'Planet':['Planet_1','Planet_1','Planet_2','Planet_2','Planet_3','Planet_3'],'FeatureType':['Lake','Lake','Crater','Volcano','Lake','Canyon'],'...

Zane Bradley

13

asked Oct 6 at 23:36
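
Counting combinations of two columns, as in the question above, is what `pd.crosstab` does. The sketch keeps only the two columns visible in the excerpt; the original DataFrame has further columns that are cut off.

```python
import pandas as pd

# Only the Planet and FeatureType columns shown in the excerpt.
df = pd.DataFrame({
    "Planet": ["Planet_1", "Planet_1", "Planet_2",
               "Planet_2", "Planet_3", "Planet_3"],
    "FeatureType": ["Lake", "Lake", "Crater", "Volcano", "Lake", "Canyon"],
})

# Rows are unique planets, columns are unique feature types, cells are counts.
counts = pd.crosstab(df["Planet"], df["FeatureType"])

print(counts)
```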

-1 votes

1 answer

84 views

How to pivot a Pandas dataframe into the desired format? [closed]

I have the following data in a dataframe: Product t_Proj CFType1 CFType2 CFType3 0 Product1 0 270 193 130 1 Product1 1 233 197 362 2 Product1 2 130 278 375 3 Product1 3 ...

DC00107

3

asked Oct 6 at 17:54

1 vote

0 answers

123 views

Plotly - add dropdown list or buttons to OHLC / Candlestick graph

I have the following dataframe: lst = [['10/01/2025 8:30:00', 2.74, 2.87, 2.60, 2.65, 14, 'SPXW251001P06590000', 'P', 6590], ['10/01/2025 8:31:00', 2.80, 2.80, 2.50, 2.53, 61, '...

Dan

111

asked Oct 6 at 2:16

1 vote

1 answer

124 views

How to efficiently handle lookups between resampled and original DataFrames in Pandas?

I am building a backtesting project in Python using Pandas. I have: A large tick / 1-minute level DataFrame (df) with full market data. A 15-minute interval DataFrame (df_15) created from it using ...

its m

49

asked Oct 2 at 14:07
