289,232 questions
4
votes
0
answers
141
views
How to read mixed data (numbers and strings) from Excel with xlwings without converting 123 → 123.0
I’m using xlwings to read an Excel sheet into a pandas DataFrame with the built-in pd.DataFrame converter.
Some of my columns contain mixed data (e.g. IDs or codes like 123, 00123, ABCD).
When I read ...
5
votes
1
answer
85
views
Plotly px.timeline shows 1970 even though print(data.dtypes) confirms correct datetime in Shiny
I'm building a dashboard in Shiny for Python and I'm stuck on a strange bug. I have a Plotly px.timeline that should display boiler on-times based on a date range from a Flatpickr input.
The Problem:
...
2
votes
1
answer
102
views
How to fix "AttributeError: 'Series' object has no attribute 'codes'" using pandas.Categorical
I am trying to convert a string that is a categorical data type into a numeric one. I found out that I can use pandas.Categorical;
unfortunately, accessing the codes attribute gives me an error.
Here is a ...
3
votes
2
answers
103
views
How to generate scatter plot of all numeric columns against specific columns in the same dataframe
I have a dataframe with a mix of data types (object and numeric). I want to draw a scatter plot of all numeric columns in the dataset against specific columns: col_32, col_69, col_74 and col_80 ...
2
votes
3
answers
133
views
Transform DataFrame containing ID pairs into a list of sets
I have a Pandas DataFrame with the following structure
left_id  right_id
a        b
c        a
x        y
I need to transform this into a list of sets, like
[
{'a', 'b', 'c'},
{'x', 'y'}
]
the first two rows should be ...
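One way to approach this kind of grouping, sketched here as an assumption and not taken from the thread, is to treat each row as an edge between two ids and merge connected ids with a union-find structure. The helper name merge_pairs is hypothetical, and the rows are written as plain tuples mirroring the excerpt's table; with a DataFrame they could come from df[['left_id', 'right_id']].itertuples(index=False, name=None).

```python
def merge_pairs(pairs):
    """Group ids that are connected through any chain of pairs (union-find)."""
    parent = {}

    def find(x):
        # Register unseen ids as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left  # union the two groups

    # Collect every id under its final root.
    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Sample rows mirroring the excerpt: (left_id, right_id)
pairs = [("a", "b"), ("c", "a"), ("x", "y")]
result = merge_pairs(pairs)
print(result)
```

The order of the sets and of the elements inside each set is not guaranteed, so comparisons should sort first.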
1
vote
3
answers
199
views
Pandas DataFrame with a hundred million entries and counting the number of identical characters in strings
I have a pandas DataFrame (df) with two columns (namely Tuple and Set) and approximately 100,000,000 entries. The Tuple column data is a string of exactly 9 characters. The Set column data is an ...
4
votes
1
answer
90
views
Execution of pandas' info in python
I am new to the pandas library in Python. When I loaded a file and printed the output of df.info to the console, the data was printed first, before the text that I had printed.
What ...
3
votes
1
answer
98
views
Should I drop duplicates before merging two DataFrames or after the merge?
I have two DataFrames in pandas: customers and flights. Both contain duplicates on the join key (Loyalty#). I am not sure whether the correct workflow is to remove duplicates before the merge or merge ...
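A small sketch of why the order matters, under assumed data: when both frames repeat a join key, a merge pairs every matching left row with every matching right row, so duplicates multiply. Only the Loyalty# key name comes from the question; the other columns and values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the 'Loyalty#' key name comes from the question.
customers = pd.DataFrame({"Loyalty#": [1, 1, 2], "name": ["Ann", "Ann", "Bob"]})
flights = pd.DataFrame({"Loyalty#": [1, 1, 1, 2], "flights": [3, 3, 5, 2]})

# Merging first: key 1 appears 2 times on the left and 3 times on the right,
# so it contributes 2 * 3 rows; key 2 contributes 1 row.
merged_raw = customers.merge(flights, on="Loyalty#")

# Dropping exact duplicate rows first keeps the merge from multiplying them.
customers_dedup = customers.drop_duplicates()
flights_dedup = flights.drop_duplicates()
merged_dedup = customers_dedup.merge(flights_dedup, on="Loyalty#")

print(len(merged_raw), len(merged_dedup))
```

Whether deduplicating first is the right workflow depends on what a duplicate means in the data; this only illustrates the row-count effect.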
0
votes
0
answers
37
views
How can I automatically export multiple Excel sheets into separate CSV files using Python? [duplicate]
I’m trying to automate a report generation task using Python.
I have an Excel workbook that contains multiple sheets (e.g., "Sales", "Orders", "Summary"),
and I want to ...
1
vote
1
answer
65
views
Output of for loop filling down in dataframe instead of returning corresponding values for each row
I'm using SpaCy to process a series of sentences and return the five most common words in each sentence. My goal is to store the output of that frequency analysis (using Counter) in a column beside ...
4
votes
3
answers
207
views
Why do I get a SettingWithCopyWarning when using shift and dropna inside a function?
In general, when I receive this warning
/home/mo/mwe.py:7: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value ...
3
votes
1
answer
93
views
How to create a column based on 2 other columns of the dataframe?
I have a Kivy app, and at some point in the code a pandas DataFrame is loaded from Excel. I have already managed to create 2 columns filled with booleans.
I need to create a third column whose content ...
0
votes
1
answer
109
views
Combining two dataframes and keeping the average
I'm new to coding, and I'm trying to combine the data from two weather stations into one new dataframe sorted by Datetime. I want this new dataframe to contain the average values of the two original ...
2
votes
0
answers
75
views
How do you get specific data in unknown row from a csv file using known data from the same row? [duplicate]
import geopy # used to get location
from geopy.geocoders import Nominatim
import pandas as pd
from pyproj import Transformer
def get_user_location():  # user location
    geolocator = Nominatim(...
1
vote
1
answer
139
views
Pandas's to_datetime function and datetime format
I have a .csv file with two columns (Date and Time). The time zone is "Europe/Paris" with a +02:00 hours shift. The file is structured in 2 parts with two datetime formats.
Date
Time
08-11-...
4
votes
3
answers
148
views
Creating a new pandas dataframe from shape
I know the total number of rows and the number of columns for a new pandas DataFrame:
import pandas as pd
nRow = 10
nCol = 4
Based on this information I want to create a new dataframe where ...
2
votes
0
answers
91
views
How can I write a matrix or a pandas DataFrame to an Excel file using openpyxl, without iterating cell by cell? [duplicate]
I’d like to insert a 2D array (for example, a pandas DataFrame) into an existing Excel worksheet at a specific position (e.g., starting at cell M8), using openpyxl.
Is there a way to assign the whole ...
1
vote
1
answer
44
views
pandas.read_csv uses only utf-8 encoding for django file upload
I'm testing Django with file uploads. I'm facing a strange issue: regardless of which encoding I choose, I always get the same error message saying that pandas is trying to decode with UTF-8
pd.read_csv(...
-5
votes
1
answer
98
views
pandas crosstab with string as second parameter
Is this code, which works, supposed to work?
import pandas as pd
from palmerpenguins import load_penguins
penguins = load_penguins()
pd.crosstab(penguins.species, "count")
species
count
...
6
votes
2
answers
200
views
How to drop duplicate values when merging dataframes
I have a DataFrame that I want to merge, dropping only duplicate values based on column name and row. For example, key_x and key_y have the
same values in the same row in rows 0, 3, 10, 12, and 15.
My DataFrame
...
0
votes
0
answers
61
views
pywebview: error maximum recursion depth exceeded before pressing button when passing pandas/model objects in js_api
I’m embedding a small UI with pywebview and want Python to JS live updates. I created a GPSSpoofingDetector class that loads a pickled sklearn model and a pandas test CSV. I want a JavaScript “Start” ...
1
vote
1
answer
112
views
Combining Identically Indexed and Column Dataframes into 3d Dataframe
I have 3 2D DataFrames, all with identical indexes (datetime range) and column names, but different data for these labels. I would like to combine these three 2D dataframes into 1 3D DataFrame with an ...
1
vote
1
answer
96
views
Calculating MultiIndex intersection to a given tolerance in an efficient way
I have two DataFrames, data1 and data2, with 3-level multiindices. The first two levels are floats, and correspond to spatial coordinates (say longitude and latitude). The third level, time, is based ...
3
votes
1
answer
105
views
Calculating mean for a column of arrays in pandas
I have the pandas DataFrame below:
import pandas as pd
import numpy as np
dat = pd.DataFrame({
'A': [1,2,3],
'B': [[[np.nan, 0.0, 0.0, 0.0, 0.0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], ...
0
votes
0
answers
103
views
"UserWarning: No artists with labels found to put in legend" Error when trying to create a legend with labels from dataset
I can't get legend labels to show up when I use 'CONDITION' (a longer string) as my x data and hue, however when I use CONDITION_N (a shorter string) as the hue then it appears. Why?
Warning:
/var/...
1
vote
1
answer
72
views
How can I use wildcard paths from a Pandas dataframe as required rule inputs and outputs in Snakemake?
I have a Snakemake pipeline (https://github.com/V-Varga/SPOT-BGC/tree/main), where I generate input and output file names for various intermediate steps using wildcards that refer back to file and ...
0
votes
2
answers
176
views
Error due to single-level dataframe merge with multi-level indexed dataframe
# Read lookup file which only contains 5 columns.
df_lookup = pd.read_excel(
    os.path.join(path, 'lookup.xlsx'),
    index_col=[0, 1, 2, 3, 4])
# sample df_lookup
# |A |B |C |D |E |
# |--|--|--|--|...
2
votes
0
answers
105
views
How to control the zorder values on superimposed bars in a histogram plot in matplotlib
I have a list of three dataframes, each of them having four columns of interest. I want to create a figure with four subplots (one for each column). In each subplot, first, I want to create a ...
5
votes
2
answers
147
views
Failing to fill empty date values with numpy nan
I have the code below:
import pandas as pd
import numpy as np
dat = pd.DataFrame({'A' : [1,2,3,4,5], 'B' : ['2002-01-01', '2003-01-01', '2004-01-01', '2004-01-01', '2005-01-01']})
dat['B'] = pd.to_datetime(...
8
votes
1
answer
256
views
How to write a pandas-compatible, non-elementary expression in narwhals
I'm working with the narwhals package and I'm trying to write an expression that is:
applied over groups using .over()
Non-elementary/chained (longer than a single operation)
Works when the native df ...
4
votes
4
answers
173
views
Create an incremental suffix for values in a pandas column that have duplicate values in another column
Setup
I have a dataframe, df
import pandas as pd
df = pd.DataFrame(
{
'Name':['foo','foo','foo','bar','bar','bar','baz','baz','baz'],
'Color':['red','blue','red','green','green','...
0
votes
1
answer
159
views
Index in to two specific dates on Pandas dataframe [closed]
I have a pandas dataframe where the index is datetime. I learned that I can index in to a specific date using this code:
selected_date_df = df.loc['yyyy-mm-dd']
I can also find data between two dates ...
3
votes
2
answers
207
views
How to change names of pandas MultiIndex using Styler
Let's assume we have the following:
midx = pd.MultiIndex.from_product(
    [[0, 1], [0, 1], [0, 1]],
    names=['L1', 'L2', 'L3'])
df = pd.DataFrame({"col": list(range(8))}, index=midx)
Now,...
3
votes
1
answer
132
views
Increase the date by number of months in pandas
I have the pandas DataFrame below:
import pandas as pd
import numpy as np
dat = pd.DataFrame({'A' : [1,2,3,4,5], 'B' : ['2002-01-01', '2003-01-01', '2004-01-01', '2004-01-01', '2005-01-01']})
dat['A'] = ...
0
votes
2
answers
105
views
Why is there a duplicate index when using sort_index() in pandas?
I am doing target mean mapping based on an external statistical table, where org_ is the external data and merged_data is the set of training data and test data. After processing, the features of ...
3
votes
2
answers
110
views
Is there a way in Python to make a row of an HTML table multi-lined?
I have a Python script that constructs a pandas DataFrame from API data, which I then convert to a pretty_html_table that will be the body of an email. In one of the rows, I have data containing an ...
1
vote
0
answers
58
views
How to efficiently denormalize a SQL DB to produce Parquet files
I'm trying to create a Parquet file from a heavily normalized SQL database with a snowflake schema. Some of the dimensions have very long text attributes, so that simply running a big set of joins to ...
1
vote
2
answers
106
views
How do I calculate a relative time delta in Pandas?
I have a column of datetimes and I want to get the difference between values in terms of years, months, etc, instead of timedeltas that only provide days. How do I do this in Pandas?
Pandas provides ...
2
votes
1
answer
86
views
How to combine multiple rows of Pandas dataframe into one row using a key [duplicate]
I am trying to manipulate a CSV using Pandas and I need to get the data into the format of one row per ID.
This is an example of what I am trying to accomplish:
From:
df = pd.DataFrame({
'ID': [1, 1, ...
1
vote
1
answer
72
views
python plotly scatter ols trendline has a kink in it
I am using Plotly Express to model some data and wanted to add trendline='ols' to it.
When I do, I get a kink in the result.
Here is the code used:
d={'category': {63: 'test', 128: 'test', 192:...
0
votes
2
answers
86
views
Using .loc to change a value in a pd.Dataframe with a variable as column name
I need to change a value in a pandas DataFrame with .loc.
Let me show with an example:
import pandas as pd
df = pd.DataFrame(data={"A":["bla","bla2"],"B":[1,2]})
I ...
-1
votes
1
answer
59
views
Why does changing a DataFrame in one Jupyter cell also change another variable? [duplicate]
I am working in Jupyter Notebook with pandas, and I noticed something strange.
In one cell, I did this:
import pandas as pd
df1 = pd.DataFrame({"A":[1,2,3]})
df2 = df1
Then in another ...
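A minimal sketch of the behaviour the title describes, assuming the second cell modified one of the two names: df2 = df1 does not copy anything, it binds a second name to the same DataFrame object, while df1.copy() creates an independent one. The name df3 and the modified value are illustrative, not from the question.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3]})
df2 = df1          # second name for the same object, no data is copied
df3 = df1.copy()   # independent copy (illustrative name, not from the question)

df1.loc[0, "A"] = 100  # modify through the first name

print(df2 is df1)        # both names refer to one object
print(df2.loc[0, "A"])   # the change is visible through df2
print(df3.loc[0, "A"])   # the copy keeps the original value
```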
5
votes
2
answers
201
views
Why doesn't Pandas concat do a copy when one of the dataframes is empty?
Consider this example:
import pandas as pd
df_part1 = pd.DataFrame()
df_part2 = pd.DataFrame({'A': [1,1], 'B': [3,4]})
df_concat_out = pd.concat([df_part1, df_part2])
print("id(df_part2.values) ==...
1
vote
1
answer
81
views
Reconfigure a Pandas Dataframe [duplicate]
Our old ERP system generates orphaned HTML reports with the following format which I import into Pandas
   Work Order  Item Type  Material  Labor
0      552603     Budget     71119   4567
1      552603 ...
3
votes
1
answer
135
views
Why does Pandas not recognise my sqlalchemy connection engine?
I'm trying to connect to an IBM DB2 database from Python. I'm using Python 3.12.10, SQLAlchemy 1.4.54, and Pandas 2.3.2. This is what my code looks like:
import os
import sqlalchemy
import pandas as ...
3
votes
2
answers
155
views
Importing a table from a webpage as a dataframe in Python
I am trying to read in a specific table from the US Customs and Border Protection's Dashboard on Southwest Land Border Encounters as a dataframe.
The url is: https://www.cbp.gov/newsroom/stats/...
1
vote
1
answer
74
views
How to count pandas DataFrame cells using unique cell values as rows and columns [duplicate]
Let's say I have a DataFrame df like this:
pd.DataFrame({'Planet':['Planet_1','Planet_1','Planet_2','Planet_2','Planet_3','Planet_3'],'FeatureType':['Lake','Lake','Crater','Volcano','Lake','Canyon'],'...
-1
votes
1
answer
84
views
How to pivot a Pandas dataframe into the desired format? [closed]
I have the following data in a dataframe:
    Product  t_Proj  CFType1  CFType2  CFType3
0  Product1       0      270      193      130
1  Product1       1      233      197      362
2  Product1       2      130      278      375
3  Product1       3 ...
1
vote
0
answers
123
views
Plotly - add dropdown list or buttons to OHLC / Candlestick graph
I have the following dataframe:
lst = [['10/01/2025 8:30:00', 2.74, 2.87, 2.60, 2.65, 14, 'SPXW251001P06590000', 'P', 6590],
['10/01/2025 8:31:00', 2.80, 2.80, 2.50, 2.53, 61, '...
1
vote
1
answer
124
views
How to efficiently handle lookups between resampled and original DataFrames in Pandas?
I am building a backtesting project in Python using Pandas.
I have:
A large tick / 1-minute level DataFrame (df) with full market data.
A 15-minute interval DataFrame (df_15) created from it using ...