4 votes
0 answers
141 views

I’m using xlwings to read an Excel sheet into a pandas DataFrame with the built-in pd.DataFrame converter. Some of my columns contain mixed data (e.g. IDs or codes like 123, 00123, ABCD). When I read ...
Dynamicgra d's user avatar
5 votes
1 answer
85 views

I'm building a dashboard in Shiny for Python and I'm stuck on a strange bug. I have a Plotly px.timeline that should display boiler on-times based on a date range from a Flatpickr input. The Problem: ...
Juan Siécola's user avatar
2 votes
1 answer
102 views

I am trying to convert a string column with a categorical data type into numeric values. I found out that I can use pandas.Categorical; unfortunately, accessing the codes attribute gives me an error. Here is a ...
JA-pythonista's user avatar
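The excerpt is truncated, so the actual cause of the error is unknown. A minimal sketch of the two usual ways to reach the integer codes, assuming the error comes from looking for `codes` directly on a Series (the sample labels are hypothetical):

```python
import pandas as pd

# Hypothetical sample data; the original question's data is not shown.
labels = pd.Series(["low", "high", "medium", "low"])

# On a pd.Categorical object, codes is a direct attribute.
cat = pd.Categorical(labels)
codes_from_categorical = cat.codes

# On a Series with categorical dtype, codes lives behind the .cat accessor.
codes_from_series = labels.astype("category").cat.codes
```

Categories are sorted lexically by default, so the codes reflect alphabetical order rather than any semantic order of the labels.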
3 votes
2 answers
103 views

I have a DataFrame with a mix of data types (object and numeric). I want to draw a scatter plot of every numeric column in the dataset against specific columns: col_32, col_69, col_74 and col_80 ...
RayX500's user avatar
  • 319
2 votes
3 answers
133 views

I have a Pandas DataFrame with the following structure left_id right_id a b c a x y I need to transform this into a list of sets, like [ {'a', 'b', 'c'}, {'x', 'y'} ] the first two rows should be ...
Joe F.'s user avatar
  • 917
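The expected output suggests that rows sharing an ID should end up in the same set, which is a connected-components problem. A minimal sketch using union-find in plain Python, assuming the table has the rows (a, b), (c, a), (x, y); the function name `connected_groups` is hypothetical, and the pairs could be produced from the DataFrame with `df[["left_id", "right_id"]].itertuples(index=False)`:

```python
from collections import defaultdict

# Rows of (left_id, right_id), as read from the excerpt above.
pairs = [("a", "b"), ("c", "a"), ("x", "y")]


def connected_groups(pairs):
    """Group IDs that are linked directly or transitively."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left  # union the two components

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


groups = connected_groups(pairs)
```

For 100 rows or 100 million, the cost stays close to linear in the number of pairs, which is the main reason to prefer union-find over repeatedly merging sets.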
1 vote
3 answers
199 views

I have a pandas DataFrame (df) with two columns (namely Tuple and Set) and approximately 100,000,000 entries. The Tuple column data is a string of exactly 9 characters. The Set column data is an ...
Max Pierini's user avatar
  • 2,323
4 votes
1 answer
90 views

I am new to the pandas library in Python. When I loaded a file and printed the output of df.info to the console, the data was printed first, before the text that I had printed. What ...
UnemployedBrat's user avatar
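The code behind this question is not shown. One plausible explanation, assuming the call was `df.info()` placed inside a `print(...)`, is that `info()` writes to stdout itself and returns `None`. A small sketch of capturing the summary in a buffer to control the ordering (the data is hypothetical):

```python
import io

import pandas as pd

# Hypothetical data standing in for the loaded file.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# info() prints immediately and returns None, so anything wrapped around the
# call is printed only after the summary has already appeared.
result = df.info()

# Writing into a buffer lets the caller decide when the summary is printed.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()

print("Summary of the DataFrame:")
print(info_text)
```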
3 votes
1 answer
98 views

I have two DataFrames in pandas: customers and flights. Both contain duplicates on the join key (Loyalty#). I am not sure whether the correct workflow is to remove duplicates before the merge or merge ...
Teexlol's user avatar
  • 31
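A minimal sketch of why the order matters, using hypothetical data: when the key is duplicated on both sides, the merge produces every pairing of matching rows, so the row count multiplies. Whether dropping duplicates first is correct depends on whether the repeated rows are real records, which the excerpt does not say.

```python
import pandas as pd

# Hypothetical data with duplicates on the join key in both frames.
customers = pd.DataFrame({"Loyalty#": [1, 1, 2], "name": ["Ann", "Ann", "Bob"]})
flights = pd.DataFrame({"Loyalty#": [1, 1, 2], "flights": [3, 5, 2]})

# Many-to-many merge: key 1 yields 2 x 2 rows, key 2 yields 1 row.
merged_raw = customers.merge(flights, on="Loyalty#")

# De-duplicating the customer side first keeps one row per flight record.
unique_customers = customers.drop_duplicates(subset="Loyalty#")
merged_clean = unique_customers.merge(flights, on="Loyalty#")
```

Passing `validate="one_to_many"` to `merge` raises an error if the left key is not unique, which can guard against an accidental many-to-many join.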
0 votes
0 answers
37 views

I’m trying to automate a report generation task using Python. I have an Excel workbook that contains multiple sheets (e.g., "Sales", "Orders", "Summary"), and I want to ...
whdaks1019's user avatar
1 vote
1 answer
65 views

I'm using SpaCy to process a series of sentences and return the five most common words in each sentence. My goal is to store the output of that frequency analysis (using Counter) in a column beside ...
cmr's user avatar
  • 23
4 votes
3 answers
207 views

In general, when I receive this warning /home/mo/mwe.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value ...
Mo_'s user avatar
  • 2,080
3 votes
1 answer
93 views

I have a Kivy app in which, at some point in the code, a pandas DataFrame is loaded from Excel, and I have already managed to create two columns filled with booleans. I need to create a third column whose content ...
user31746640's user avatar
0 votes
1 answer
109 views

I'm new to coding, and I'm trying to combine the data from two weather stations into one new dataframe sorted by Datetime. I want this new dataframe to contain the average values of the two original ...
Maurice Verest's user avatar
2 votes
0 answers
75 views

import geopy # used to get location from geopy.geocoders import Nominatim import pandas as pd from pyproj import Transformer def get_user_location(): # user location geolocator = Nominatim(...
Litcoder's user avatar
1 vote
1 answer
139 views

I have a .csv file with two columns (Date and Time). The time zone is "Europe/Paris" with a +02:00 hours shift. The file is structured in 2 parts with two datetime formats. Date Time 08-11-...
RémyClaverie's user avatar
4 votes
3 answers
148 views

I know the total number of rows and the number of columns for a new pandas DataFrame: import pandas as pd nRow = 10 nCol = 4 Based on this information I want to create a new DataFrame where ...
Brian Smith's user avatar
  • 1,679
2 votes
0 answers
91 views

I’d like to insert a 2D array (for example, a pandas DataFrame) into an existing Excel worksheet at a specific position (e.g., starting at cell M8), using openpyxl. Is there a way to assign the whole ...
Amadou's user avatar
  • 21
1 vote
1 answer
44 views

I'm testing Django file uploads. I ran into a strange issue: no matter which encoding I choose, I always get the same error message saying that pandas is trying to decode with UTF-8 pd.read_csv(...
Aidas's user avatar
  • 170
-5 votes
1 answer
98 views

Is this code, which works, supposed to work? import pandas as pd from palmerpenguins import load_penguins penguins = load_penguins() pd.crosstab(penguins.species, "count") species count ...
robertspierre's user avatar
6 votes
2 answers
200 views

I have a DataFrame that I want to merge, dropping only duplicate values based on column name and row. For example, key_x and key_y have the same values in the same row in rows 0, 3, 10, 12 and 15. My DataFrame ...
Chris's user avatar
  • 63
0 votes
0 answers
61 views

I’m embedding a small UI with pywebview and want Python to JS live updates. I created a GPSSpoofingDetector class that loads a pickled sklearn model and a pandas test CSV. I want a JavaScript “Start” ...
Ahsan914's user avatar
1 vote
1 answer
112 views

I have 3 2D DataFrames, all with identical indexes (datetime range) and column names, but different data for these labels. I would like to combine these three 2D dataframes into 1 3D DataFrame with an ...
cma0014's user avatar
  • 1,599
1 vote
1 answer
96 views

I have two DataFrames, data1 and data2, with 3-level multiindices. The first two levels are floats, and correspond to spatial coordinates (say longitude and latitude). The third level, time, is based ...
peich's user avatar
  • 33
3 votes
1 answer
105 views

I have the pandas DataFrame below: import pandas as pd import numpy as np dat = pd.DataFrame({ 'A': [1,2,3], 'B': [[[np.nan, 0.0, 0.0, 0.0, 0.0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], ...
Bogaso's user avatar
  • 3,896
0 votes
0 answers
103 views

I can't get legend labels to show up when I use 'CONDITION' (a longer string) as my x data and hue, however when I use CONDITION_N (a shorter string) as the hue then it appears. Why? Warning: /var/...
Sarah Warner's user avatar
1 vote
1 answer
72 views

I have a Snakemake pipeline (https://github.com/V-Varga/SPOT-BGC/tree/main), where I generate input and output file names for various intermediate steps using wildcards that refer back to file and ...
Vi_Varga's user avatar
0 votes
2 answers
176 views

# Read lookup file which only contains 5 columns. df_lookup = pd.read_excel( os.path.join(path, 'lookup.xlsx'), index_col=[0, 1, 2, 3, 4]) # sample df_lookup # |A |B |C |D |E | # |--|--|--|--|...
mk_'s user avatar
  • 27
2 votes
0 answers
105 views

I have a list of three dataframes, each of them having four columns of interest. I want to create a figure with four subplots (one for each column). In each subplot, first, I want to create a ...
Arindam Das's user avatar
5 votes
2 answers
147 views

I have below code import pandas as pd import numpy as np dat = pd.DataFrame({'A' : [1,2,3,4,5], 'B' : ['2002-01-01', '2003-01-01', '2004-01-01', '2004-01-01', '2005-01-01']}) dat['B'] = pd.to_datetime(...
Brian Smith's user avatar
  • 1,679
8 votes
1 answer
256 views

I'm working with the narwhals package and I'm trying to write an expression that is: applied over groups using .over() Non-elementary/chained (longer than a single operation) Works when the native df ...
Slash's user avatar
  • 581
4 votes
4 answers
173 views

Setup I have a dataframe, df import pandas as pd df = pd.DataFrame( { 'Name':['foo','foo','foo','bar','bar','bar','baz','baz','baz'], 'Color':['red','blue','red','green','green','...
bismo's user avatar
  • 1,645
0 votes
1 answer
159 views

I have a pandas DataFrame whose index is a datetime. I learned that I can select a specific date using this code: selected_date_df = df.loc['yyyy-mm-dd'] I can also find data between two dates ...
GC123's user avatar
  • 411
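A short sketch of the string-based selection described above, on a hypothetical daily index. With a sorted `DatetimeIndex`, a `.loc` slice between two date strings includes both endpoints, and a partial string such as a year and month selects the whole period:

```python
import pandas as pd

# Hypothetical frame: 40 consecutive days starting 2024-01-01.
index = pd.date_range("2024-01-01", periods=40, freq="D")
df = pd.DataFrame({"value": range(40)}, index=index)

# Slice between two dates; both endpoints are included.
between = df.loc["2024-01-03":"2024-01-05"]

# Partial string indexing: every row that falls in February 2024.
february = df.loc["2024-02"]
```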
3 votes
2 answers
207 views

Let's assume we have the following: midx = pd.MultiIndex.from_product( [[0, 1], [0, 1], [0, 1]], names=['L1', 'L2', 'L3']) df = pd.DataFrame({"col": list(range(8))}, index=midx) Now,...
MarcoS's user avatar
  • 13.6k
3 votes
1 answer
132 views

I have below pandas data frame import pandas as pd import numpy as np dat = pd.DataFrame({'A' : [1,2,3,4,5], 'B' : ['2002-01-01', '2003-01-01', '2004-01-01', '2004-01-01', '2005-01-01']}) dat['A'] = ...
Brian Smith's user avatar
  • 1,679
0 votes
2 answers
105 views

I am doing target mean mapping based on an external statistical table, where org_ is the external data and merged_data is the set of training data and test data. After processing, the features of ...
osquer kkzlk's user avatar
3 votes
2 answers
110 views

I have a Python script that constructs a pandas DataFrame from API data, which I then convert to a pretty_html_table that will be the body of an email. In one of the rows, I have data containing an ...
MasterCal's user avatar
1 vote
0 answers
58 views

I'm trying to create a Parquet file from a heavily normalized SQL database with a snowflake schema. Some of the dimensions have very long text attributes, so that simply running a big set of joins to ...
Davor Cubranic's user avatar
1 vote
2 answers
106 views

I have a column of datetimes and I want to get the difference between values in terms of years, months, etc, instead of timedeltas that only provide days. How do I do this in Pandas? Pandas provides ...
wjandrea's user avatar
  • 33.9k
2 votes
1 answer
86 views

I am trying to manipulate a CSV using Pandas and I need to get the data into the format of one row per ID. This is an example of what I am trying to accomplish: From: df = pd.DataFrame({ 'ID': [1, 1, ...
sar's user avatar
  • 21
1 vote
1 answer
72 views

I am using Plotly Express to model some data, and wanted to add trendline='ols' to it. When I do, I get a kink in the result. Here is the code used: d={'category': {63: 'test', 128: 'test', 192:...
frank's user avatar
  • 3,816
0 votes
2 answers
86 views

I need to change a value in a pandas DataFrame with .loc. Let me show it with an example: import pandas as pd df = pd.DataFrame(data={"A":["bla","bla2"],"B":[1,2]}) I ...
seb66's user avatar
  • 25
-1 votes
1 answer
59 views

I am working in Jupyter Notebook with pandas, and I noticed something strange. In one cell, I did this: import pandas as pd df1 = pd.DataFrame({"A":[1,2,3]}) df2 = df1 Then in another ...
Gouri Phadnis's user avatar
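The rest of the question is cut off, but the assignment `df2 = df1` shown above does not copy anything: both names refer to the same DataFrame object. A minimal sketch of the difference between that alias and an explicit copy (the name `df3` and the modified value are illustrative assumptions):

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3]})
df2 = df1          # second name for the same object, no data is copied
df3 = df1.copy()   # independent copy with its own data

# A change made through df2 is visible through df1, but not in df3.
df2.loc[0, "A"] = 100
```

In a notebook this can be surprising because the alias survives across cells, so a later cell that modifies `df2` silently changes `df1` as well.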
5 votes
2 answers
201 views

Consider this example: import pandas as pd df_part1 = pd.DataFrame() df_part2 = pd.DataFrame({'A': [1,1], 'B': [3,4]}) df_concat_out = pd.concat([df_part1, df_part2]) print("id(df_part2.values) ==...
Ben Farmer's user avatar
  • 2,974
1 vote
1 answer
81 views

Our old ERP system generates orphaned HTML reports with the following format which I import into Pandas Work Order Item Type Material Labor 0 552603 Budget 71119 4567 1 552603 ...
Woody 1470's user avatar
3 votes
1 answer
135 views

I'm trying to connect to an IBM DB2 database from Python. I'm using Python 3.12.10, SQLAlchemy 1.4.54, and Pandas 2.3.2. This is what my code looks like: import os import sqlalchemy import pandas as ...
SRJCoding's user avatar
  • 521
3 votes
2 answers
155 views

I am trying to read in a specific table from the US Customs and Border Protection's Dashboard on Southwest Land Border Encounters as a dataframe. The url is: https://www.cbp.gov/newsroom/stats/...
Ari's user avatar
  • 2,023
1 vote
1 answer
74 views

Let's say I have a DataFrame df like this: pd.DataFrame({'Planet':['Planet_1','Planet_1','Planet_2','Planet_2','Planet_3','Planet_3'],'FeatureType':['Lake','Lake','Crater','Volcano','Lake','Canyon'],'...
Zane Bradley's user avatar
-1 votes
1 answer
84 views

I have the following data in a dataframe: Product t_Proj CFType1 CFType2 CFType3 0 Product1 0 270 193 130 1 Product1 1 233 197 362 2 Product1 2 130 278 375 3 Product1 3 ...
DC00107's user avatar
1 vote
0 answers
123 views

I have the following dataframe: lst = [['10/01/2025 8:30:00', 2.74, 2.87, 2.60, 2.65, 14, 'SPXW251001P06590000', 'P', 6590], ['10/01/2025 8:31:00', 2.80, 2.80, 2.50, 2.53, 61, '...
Dan's user avatar
  • 111
1 vote
1 answer
124 views

I am building a backtesting project in Python using Pandas. I have: A large tick / 1-minute level DataFrame (df) with full market data. A 15-minute interval DataFrame (df_15) created from it using ...
its m's user avatar
  • 49