148,689 questions
1
vote
3
answers
63
views
How to modify multiple columns applying if else to multiple pandas dataframe columns
I have a DataFrame with columns Age, Salary, and others. If I use:
df['Age'] = df['Age'].apply(lambda x : x+100 if x>30 else 0)
Then I can modify the Age column with the if else condition. Also, if ...
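For reference, the element-wise rule in this excerpt can also be written as a vectorized expression. The following is a minimal sketch, not the asker's code: the sample values are hypothetical (only the column names Age and Salary come from the question), and it swaps apply for numpy.where.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"Age": [25, 31, 40], "Salary": [1000, 2000, 3000]})

# Same rule as the lambda in the question: x + 100 if x > 30 else 0,
# evaluated on the whole column at once instead of value by value.
df["Age"] = np.where(df["Age"] > 30, df["Age"] + 100, 0)

print(df)
```

Other columns are left untouched, so the same pattern could be repeated per column with a different condition.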
0
votes
1
answer
82
views
How to create 2 new columns in R (date difference + convert Seasons to minutes)? [duplicate]
I am new to R and trying to create two new variables from my dataset.
My data frame is called netflix and it contains these relevant columns:
date_added and duration
Example values:
date_added: "...
1
vote
0
answers
41
views
How to show the streaming parts of a polars query using explain()?
I am trying to explain() a Polars query to see which operations can be executed using the streaming engine. Currently, I am only able to do this using show_graph().
From sources on the web, I see that ...
Advice
1
vote
2
replies
77
views
Show the timezone when printing a data frame with a `POSIXct` column
Is there a simple way (i.e., not involving writing a print() method) to show the timezone when printing a data frame with a POSIXct column?
as.data.frame(Sys.time())
# Sys.time()
# 1 2025-...
0
votes
1
answer
83
views
Why does groupby().apply() produce inconsistent results on identical groups when the DataFrame has overlapping indices?
I noticed that groupby().apply() produces different results for two groups that look identical, except that the overall DataFrame has duplicate index values.
Here is a minimal reproducible example:
...
1
vote
1
answer
59
views
Polars parse multiple datetime format [duplicate]
I have a string column in a polars dataframe with multiple datetime formats, and I am using the following code to convert the column's datatype from string to datetime.
import polars as pl
df = pl.from_dict({'...
0
votes
0
answers
70
views
polars.LazyFrame.sink_csv does not give CRLF line termination [duplicate]
I have a Python file
import polars as pl
import requests
from pathlib import Path
url = "https://raw.githubusercontent.com/leanhdung1994/files/main/processedStep1_enwiktionary_namespace_0_43....
0
votes
1
answer
151
views
Construct a simple loop to create new data frames in R [closed]
I have a number of data.frames, with names apple, banana, and coffee. I want to create, and then export, new dataframes in a for-loop corresponding to each one, call them apple_new, banana_new, and ...
2
votes
4
answers
149
views
How to split dataframe into multiple sub-dataframes based on column value
I have a dataframe df1 which looks like this:
Column1  Column2
13       1
12       1
15       0
16       0
15       1
14       1
12       1
11       0
21       1
45       1
44       0
The 1s indicate that a measurement started. I don't know how many 1s will be in one ...
1
vote
3
answers
162
views
Polars: how to write a column of strings into a txt file without escaping?
I have .ndjson files with millions of rows. Each row has a field html which contains HTML strings. I would like to write all of this HTML into a .txt file, with each HTML string on its own line of the .txt file. I ...
5
votes
2
answers
106
views
How to resample timeseries with origin aligned to start of year
Consider the following pandas Series with a DatetimeIndex of daily values (using day-of-year as an example):
import pandas as pd
dti = pd.date_range("2017-11-02", "2019-05-21", ...
2
votes
1
answer
131
views
Why does a nearest join_asof() return exact matches despite allow_exact_matches=False?
I am looking for the nearest non exact match on the dates column:
import polars as pl
df = pl.from_repr("""
┌─────┬────────────┐
│ uid ┆ dates │
│ --- ┆ --- │
│ i64 ┆ date ...
Tooling
0
votes
2
replies
67
views
How to export or import TOON in pandas?
I would like to know how to export or import TOON (Token-Oriented Object Notation) in pandas.
-2
votes
1
answer
80
views
polars.exceptions.DuplicateError: column with name 'name_ID' has more than one occurrence [closed]
I have a dictionary of polars.DataFrames called data_dict.
All dataframes in the dict's values have an extra index column ''.
I want to drop that column and set a new column named 'name_ID'
...
3
votes
2
answers
211
views
Efficiently get first indices of consecutive identical digits in big pandas DataFrames
I have a DataFrame with a column Digit of digits in base 10. For example:
import numpy as np
import pandas as pd
df = pd.DataFrame({
"Digit": [
1, 3, 5, 7, 0, 0, 0,
4, 8, ...
2
votes
1
answer
73
views
problem on the x-axis of the graph, doesn't render the time
I am working on a dashboard using Shiny for Python and Plotly Express. I am trying to create a Gantt chart (using px.timeline) to visualize the operating periods of different boilers (ON/OFF states).
...
2
votes
1
answer
76
views
Change color of single line in altair line chart based on other indicator column
Imagine having the following polars dataframe "df" that contains the temperature of a machine that is either "active" or "inactive":
import polars as pl
from datetime ...
1
vote
0
answers
76
views
Is it possible to drop/select columns where col.n_unique > 1 with native polars syntax [duplicate]
I have a table that looks like this
import polars as pl
df = pl.DataFrame(
{
"col1": [1, 2, 3, 4, 5],
"col2": [10, 20, 30, 40, 50],
"col3": [...
-4
votes
0
answers
75
views
How to combine two pandas DataFrames [duplicate]
I am trying to add one pandas DataFrame to another DataFrame. How can I do this in the style of list.append?
usernames = {"anvar":"anvar123", "behruz":"Bex124", "...
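As a general illustration of the operation being asked about, here is a minimal sketch using pd.concat. The two frames are hypothetical, since the data in the excerpt is truncated.

```python
import pandas as pd

# Hypothetical frames; the question's own data is truncated in the excerpt.
df1 = pd.DataFrame({"user": ["a", "b"], "score": [1, 2]})
df2 = pd.DataFrame({"user": ["c"], "score": [3]})

# pd.concat returns a new DataFrame. Unlike list.append, it does not
# modify df1 in place, so the result has to be assigned to a name.
combined = pd.concat([df1, df2], ignore_index=True)

print(combined)
```

Passing ignore_index=True renumbers the rows 0..n-1 instead of keeping each frame's original index.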
1
vote
2
answers
139
views
How to get a true/false without duplicates when comparing two Pandas dataframes?
I have one dataframe with sessions - one session, one row, so SID is unique. The session has a doctor name.
SID  Doctor   Patient
1    robby    david
2    langdon  sara
3    langdon  michael
I have another dataframe ...
12
votes
0
answers
326
views
Not displaying DataFrame's name in Data Wrangler extension of VSCode, displaying "Data grid"
I have been using the Data Wrangler extension in VS Code for a while; it is very useful for analyzing datasets and filtering columns to inspect the features. When I opened a dataframe in it, it used to ...
0
votes
0
answers
39
views
Pandas merge on one of two criteria [duplicate]
I have a table/df that holds a set of code and value pairs. The codes are a mix of old (legacy) and new codes due to process changes. I have a second table/df that holds the old codes, new codes, ...
1
vote
1
answer
109
views
Polars print changed values between 2 dataframes
Given two polars dataframes of the same shape, I would like to print the number of values different between the two, including missing values that are not missing in the other dataframe.
I came up ...
2
votes
2
answers
91
views
Seeking more efficient method in Python & Polars to perform monthly comparison within each year
I have a CSV of energy consumption data over time (each month for several years).
I want to determine the percentage (decimal portion) for each month across that year; e.g., August was 12.3% of the ...
-3
votes
1
answer
97
views
create dataframe from csv in PythonAnywhere [closed]
I am trying to display the headers of a data frame I created based on a csv file using the PythonAnywhere free version. I keep getting a huge error message and I don't understand what I did wrong.
...
Best practices
0
votes
3
replies
99
views
How to access specific, indexed elements of a Pandas Dataframe for math?
What is the right/pythonic way to do math on a few indexed elements in a Pandas Dataframe?
I tried a few ways but they seem awkward and confusing:
df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6, 7, 9, ]})
...
1
vote
2
answers
107
views
Excel adding unicode symbol from R csv output
Excel is adding a unicode character to a summary file I save from R as a .csv, where it adds "¬" in front of "±". Is there a way to edit the R script to prevent this?
cola <- c("...
1
vote
3
answers
100
views
Show matched rows in polars join
When you join two tables, STATA prints the number of rows merged and unmerged.
For instance, take Example 1 on page 13 of the STATA merge doc:
use https://www.stata-press.com/data/r19/autosize
merge 1:...
1
vote
1
answer
65
views
How to assign int input value as column name in pandas [duplicate]
I need to take an integer input value, assign it to a variable, and then use that variable as a column name to get data from a pandas DataFrame.
Data:
1 10 20 30
2 40 50 60
Steps:
Assign input to ...
1
vote
0
answers
64
views
Why does DataFrame.apply() use axis=1 for rows instead of axis=0 in pandas? [duplicate]
I was reading the pandas documentation:
pandas.DataFrame.apply documentation
In this definition, it seems to me like the use of axis is the exact opposite of what it means in other functions?
To apply ...
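The axis behaviour described in this excerpt can be checked on a small frame. This is a minimal sketch with hypothetical data: with axis=0 the function receives each column as a Series, and with axis=1 it receives each row.

```python
import pandas as pd

# Hypothetical two-by-two frame, just to make the direction visible.
df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# axis=0 (the default): the function is called once per column.
col_sums = df.apply(lambda s: s.sum(), axis=0)

# axis=1: the function is called once per row.
row_sums = df.apply(lambda s: s.sum(), axis=1)

print(col_sums)
print(row_sums)
```

Here col_sums is indexed by the column labels a and b, while row_sums is indexed by the row labels 0 and 1.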
0
votes
0
answers
118
views
How to get value of cell in dataframe
I'm new to Python, so please be lenient.
I want to read the value of a single cell from a dataframe based on the selected row. I do this as below, but I get the value not when I click on a record (...
1
vote
2
answers
153
views
After encoding my categorical columns in a pandas dataframe, I was left with too many columns. How can I drop some?
I am using Python with a pandas dataframe built from a CSV of Steam games. I have the categorical columns publishers, developers, categories, genres, and tags, but categories, genres, and tags are ...
0
votes
2
answers
126
views
Replace values in multiple columns in a data frame based on conditions in another data frame? [duplicate]
I'm working with a data frame where the color of an object (red or green) was recorded in ordinal classes corresponding to percent coverage.
I am looking to replace all the classes with their ...
3
votes
0
answers
146
views
Why does polars join performance deteriorate so much from version 1.30.0 to 1.31.0?
I noticed a significant performance deterioration when using the polars dataframe join function after upgrading polars from 1.30.0 to 1.31.0. The code snippet is below:
import polars as pl
import time
...
3
votes
2
answers
134
views
Calculate cumulative value based on another column [duplicate]
Given this kind of pandas dataframe:
df = pd.DataFrame({
'ts_diff':[0, 0, 738, 20, 29, 61, 42, 18, 62, 41, 42, 0, 0, 729, 43, 59, 42, 61, 44, 36, 61, 61, 42, 18, 62, 41, 42, 0, 0]
})
ts_diff - is ...
0
votes
0
answers
70
views
Reading in values from CSV and making sure they are non-scientific format in R? [duplicate]
Assume I have the following Excel Sheet:
Location  Mar2000
London    1234567891011
Tokyo     12345667897
These are the raw values saved in a CSV format e.g. my_data.csv (assume it is CSV not UTF-8 format). ...
3
votes
3
answers
193
views
Filter a pandas df: per group, keep only non-null rows if we have them, else keep a single null row
Hopefully the title is reasonably intuitive, edits welcome. Say I have this dataframe:
df = pd.DataFrame({'x': ['A', 'B', 'B', 'C', 'C', 'C', 'D', 'D'],
'y': [None, None, 1, 2, 3, 4,...
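One way to express the rule in the title is with boolean masks. This is a minimal sketch, not taken from the question or its answers: the data is hypothetical (the frame in the excerpt is truncated), and the variable names are illustrative.

```python
import pandas as pd

# Hypothetical data; the question's own frame is truncated in the excerpt.
df = pd.DataFrame({
    "x": ["A", "B", "B", "C", "C", "D", "D"],
    "y": [None, None, 1, 2, 3, None, None],
})

not_null = df["y"].notna()
# True for every row whose group contains at least one non-null y.
group_has_value = not_null.groupby(df["x"]).transform("any")
# True for the first row of each group.
first_in_group = ~df.duplicated("x")

# Keep non-null rows, plus the first row of groups that are entirely null.
result = df[not_null | (~group_has_value & first_in_group)]

print(result)
```

Groups A and D are entirely null, so each keeps one row; groups B and C keep only their non-null rows.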
0
votes
0
answers
41
views
Extracting numeric component of some of the column entries [duplicate]
I am reading in some .xpt data using the haven package. When I view the dataframe in RStudio, it appears as in the snapshot below (showing only a small number of the columns):
There are actually 16 ...
1
vote
3
answers
159
views
Replace value by condition across entire polars df
I'd like to replace any value greater than some condition with zero for any column except the date column in a df.
The closest I've found is
df.with_columns(
pl.when(pl.any_horizontal(pl.col(pl....
2
votes
1
answer
129
views
Find differing rows between two Polars DataFrames based on ID and multiple columns
I have two Polars DataFrames (df1 and df2) with the same columns.
I want to compare them by ID and Iname, and get the rows where any of the other columns (X, Y, Z) differ between the two.
import ...
2
votes
1
answer
121
views
Concatenate Tables Based on Column Information in Python [duplicate]
I have dataframes pulled from a file. The variable with all these dataframe names is: Data_Tables.
These dataframes all have the same columns, and I want to concatenate the dataframes based on the ...
0
votes
0
answers
163
views
How to efficiently get the last row of a rolling aggregation group without .last()?
I'm working with a large Polars LazyFrame and computing rolling aggregations grouped by customer (Cusid). I need to find the "front" of the rolling window (last Tts_date) for each group to ...
2
votes
4
answers
128
views
How to find a common value using if statement
I am still a beginner in Python. I am trying to find a common value with an if statement:
import pandas as pd
df = pd.read_csv("data.csv")
for n in range(2, len(df)):
if df.loc[n].isin([2]...
0
votes
1
answer
105
views
Slicing pandas dataframe with a value from a Jupyter widget raises an error
In a Jupyter Notebook, I make use of a Jupyter Widget to interact with a function.
The widget gives me a dropdown that can cycle through some plots, and its options are retrieved from a dataframe.
...
1
vote
0
answers
78
views
Why does the pivoted dataframe contain information about columns that weren't included in the pivot? [duplicate]
There is a dataframe with multiindex columns:
import pandas as pd
df = pd.DataFrame({
"A": ["foo", "foo", "bar", "bar"],
"B": ["...
5
votes
3
answers
168
views
Slice a pandas dataframe at specific index points
I have the pandas dataframe below:
import pandas as pd
data = pd.DataFrame({'x1':range(10, 18), # Create pandas DataFrame
'x2':['a', 'b', 'b', 'c', 'd', 'a', 'b', 'd'],
...
6
votes
1
answer
106
views
Polars streaming: How to compute a nested window aggregation while avoiding in-memory-maps?
I want to calculate the mean over some group column 'a' but include only one value per second group column 'b'.
Constraints:
I want to preserve all original records in the result.
(if possible) avoid ...
4
votes
3
answers
106
views
Extending polars DataFrame while maintaining variables between calls
I would like to code a logger for polars using the Custom Namespace API.
For instance, starting from:
import logging
import polars as pl
penguins_pl = pl.read_csv("https://raw.githubusercontent....
0
votes
1
answer
125
views
How to add a new column in a dataframe matching the matrix or multiple matrices by date variables
I want to add a seasonal_factor column to my D1 data frame. The seasonal factor is from another source, in matrix format, with 4 matrices per year from 2021 to 2024.
I get errors on matching the same ...
4
votes
4
answers
178
views
Reference column named "*" in Polars
I have a Polars DataFrame with a column named "*" and would like to reference just that column. When I try to use pl.col("*") it is interpreted as a wildcard for "all columns."...