2,817 questions
2
votes
3
answers
190
views
Polars - Replace letter in string with uppercase letter
Is there any way in polars to replace the character just after the _ with its uppercase form using a regex replace? So far I have achieved it using polars.Expr.map_elements.
Is there any alternative using native ...
2
votes
2
answers
109
views
get value from current row in rolling window
Given the following data structure
import polars as pl
df = pl.DataFrame(
{
"order_id": ["o01", "o02", "o03", "o04", "o10", ...
1
vote
0
answers
42
views
Take unique values horizontally across a Polars DataFrame to create a new string column [duplicate]
I have this dataframe:
import polars as pl
df = pl.from_repr("""shape: (4, 3)
┌──────┬──────┐
│ ccy1 ┆ ccy2 │
│ --- ┆ --- │
│ str ┆ str │
╞══════╪══════╡
│ USD ┆ USD │
│ EUR ┆ ...
0
votes
1
answer
119
views
Polars write_excel: rotate some header columns
When using pl.write_excel, I am looking for a possibility to rotate SOME header columns by 90°.
I am applying a bunch of input arguments provided by pl.write_excel in order to style the exported ...
4
votes
4
answers
93
views
How to eliminate the loop in my code when calculating EWMA?
I'm calculating EWMA values for an array of streamflow data, and the code is as follows:
import polars as pl
import numpy as np
streamflow_data = np.arange(0, 20, 1)
adaptive_alphas = np.concatenate([np.repeat(0....
5
votes
1
answer
469
views
How to get the day / month name of a column in polars
I have a polars dataframe df which has a datetime column date. I'm trying to get the name of the day and month of that column.
Consider the following example.
import polars as pl
from datetime import ...
3
votes
1
answer
385
views
Polars Schema: TypeError: dtypes must be fully-specified, got: Datetime
Hi I want to define a polars schema.
It works fine without a datetime format.
However it fails with pl.Datetime.
import polars as pl
testing_schema: pl.Schema = pl.Schema(
{
"date...
1
vote
0
answers
33
views
Add a milliseconds since midnight integer column to a datetime in Polars? [duplicate]
I have a Polars data frame in the following format:
import polars as pl
df = pl.from_repr("""
┌───────────┬──────────┐
│ ms_of_day ┆ date │
│ --- ┆ --- │
│ i64 ┆ ...
4
votes
2
answers
506
views
Check if all values of Polars DataFrame are True
How can I check if all values of a polars DataFrame, containing only boolean columns, are True?
Example df:
df = pl.DataFrame({"a": [True, True, None],
"b": [...
1
vote
1
answer
377
views
Polars runs out of memory when collecting a JSON file
We want to use Polars to load a 22GB JSON file (10M rows and 65 columns), but we run out of memory when we call collect(), which crashes the program. We're using pl.scan_ndjson to load ...
2
votes
1
answer
145
views
Python-Polars: Expression list product
In Python-Polars, it is easy to calculate the Sum of all the lists in an array with polars.Expr.list.sum. See the example below for the sum:
df = pl.DataFrame({"values": [[[1]], [[2, 3], [5,...
1
vote
0
answers
324
views
TypeError: argument 'schema': 'Object' is not a Polars data type
Why?
I am querying data from a MongoDB collection and loading the result into a Polars DataFrame. Depending on the limit filter of the mongo query, the operation either works or raises the error in the title. ...
1
vote
1
answer
583
views
Explicit cast of a lazy frame not possible with type mismatch?
I've only been using polars for a few months now (coming from pandas) so forgive me if I'm interpreting things wrong :)
I want to read many parquet files, merge them into a single dataframe and then ...
2
votes
1
answer
84
views
Python-Polars: Cross field calculation of struct columns
I am trying to build a function that takes a list of struct columns, extracts two fields, and performs a cross-field combination of all the values of those fields. Everything in the same context. For ...
3
votes
1
answer
396
views
Python polars: pass named row to pl.DataFrame.map_rows
I'm looking for a way to apply a user defined function taking a dictionary, and not a tuple, of arguments as input when using pl.DataFrame.map_rows.
Trying something like
df.map_rows(lambda x: udf({k:...
1
vote
1
answer
87
views
Transpose dataframe with List elements
I have a dataframe like
┌─────┬────────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┐
│ rul ┆ ...
1
vote
0
answers
565
views
How to type Polars' Series in Python?
I'm trying to type my functions in Python for polars.Series objects of a specific dtype.
For instance, in a MWE, a function could look like:
import typing as tp
import polars as pl
u = pl.Series(...
0
votes
0
answers
69
views
Custom Expression returns list[f64] instead of f64 when using group_by_dynamic()
When using group_by_dynamic() to perform a rolling calculation, my custom geometric mean expression will return a list[f64] dtype for each value instead of a f64.
However, when performing the ...
0
votes
2
answers
144
views
How to speed up repeatedly taking the first n rows of each group after group_by?
The df contains 100 million rows, and there are about 25-30 group_by columns. Is there a way to speed this operation up from here, or is this the best I can get?
import polars as pl
import numpy as np
...
1
vote
1
answer
125
views
Mutate polars column and keep original column name on custom expression
I am trying to implement a custom expression in Rust polars to calculate the geomean of different columns, essentially replicating the behavior of the .mean() expression, where it will apply the ...
1
vote
1
answer
132
views
DeltaTable map type
Using Spark, I can create a delta table with a map column type: MAP<STRING, TIMESTAMP>
How do I create a delta table with a map type without Spark?
I have tried multiple approaches and none of ...
2
votes
1
answer
83
views
Find nearest following row with values greater than or equal to current row
Starting with this DataFrame:
import polars as pl
df_1 = pl.DataFrame({
'name': ['Alpha', 'Alpha', 'Alpha', 'Alpha', 'Alpha'],
'index': [0, 3, 4, 7, 9],
'limit': [12, 18, 11, 5, 9],
'...
0
votes
1
answer
232
views
Polars is killing the kernel on import
I am running the following code on JupyterLab, with no other notebooks open:
!pip3 install polars --upgrade
import polars as pl
The first line upgrades me to polars 1.18.0 with no issues, but then ...
0
votes
0
answers
121
views
how do I find all polars dataframes in python
I have a long script in python, predominantly pandas, but shifting to polars.
I am reviewing the memory usage of objects.
To find the 10 largest objects currently in use, using locals().items() and sys.getsizeof(), I run:
...
2
votes
1
answer
90
views
How to combine columns with extra strings into a concatenated string column in Polars?
I am trying to add another column that combines two columns (Total & percentage) into a result column (labels_value), which looks like: (Total) percentage%.
Basically to wrap ...
3
votes
2
answers
249
views
DuplicateError with name 'null' when trying to pivot a Polars DataFrame
I have this example dataframe in polars:
import polars as pl
df_example = pl.DataFrame(
{
"DATE": ["2024-11-11", "2024-11-11", "2024-11-12", "...
0
votes
1
answer
381
views
Force schema type using Polars scan/sink csv
I have a large number of CSV files (~100,000) some of which themselves are large CSV files (i.e., >128G) and I am trying to convert them to Parquet files. The files contain a mix of character, ...
2
votes
1
answer
106
views
Create a new Polars column from a multiple choice of expressions by mapping values to a dictionary
I want to use an expression dictionary to perform calculations for a new column.
I have this Polars dataframe:
import polars as pl
df = pl.DataFrame({
"col1": ["a", "b...
6
votes
2
answers
424
views
asof-join with multiple inequality conditions
I have two dataframes: a (~600M rows) and b (~2M rows). What is the best approach for joining b onto a, when using 1 equality condition and 2 inequality conditions on the respective columns?
a_1 = ...
1
vote
1
answer
115
views
Set column names using values from a specific row in Polars
I am bringing in the data from an Excel spreadsheet.
I want to make all the info from df.row(8) into the column header names.
In pandas it was just:
c = [ 'A', 'B', 'C', 'D', 'E', 'F' ]
df.columns = c
...
1
vote
0
answers
109
views
Save intermediate results for big polars lazyframe processing?
The issue may be related to https://github.com/pola-rs/polars/issues/9842 and How to process Python Polars LazyFrame in batches
My setup is
input = pathlib.Path("input.csv") # 300k lines
...
2
votes
1
answer
62
views
Polars transform meta data of expressions
Is it possible in python polars to transform the root_names of expression meta data?
E.g. if I have an expression like
expr = pl.col("A").dot(pl.col("B")).alias("AdotB")
...
1
vote
1
answer
663
views
Select multiple rows and use as headers with separator in Polars
Since Polars doesn't work with multi-index headers like Pandas does, I'd like to know if there's a native way to do the following:
My current implementation has to go through Pandas first and then ...
3
votes
2
answers
226
views
How do you insert a map-reduce into a Polars method chain?
I’m doing a bunch of filters and other transform applications including a group_by on a polars data frame, the objective being to count the number of html tags in a single column per date per ...
0
votes
2
answers
177
views
Polars faster alternative to successive joins
I have a big dataset and I need to do multiple successive joins that are slow. I figured an alternative was to unpivot the whole dataframe I was merging successively, join once and then get the ...
2
votes
2
answers
714
views
How can I use Polars to stream the contents of a Parquet file as CSV text to standard output?
Using Python Polars, how can I modify the following script to stream the contents of a Parquet file as CSV text to standard output?
import polars as pl
import sys
pl.scan_parquet("BTCUSDT-trades-...
4
votes
2
answers
168
views
How should I parse times in the Japanese "30-hour" format for data analysis? [closed]
I'm considering a data analysis project involving information on Japanese TV broadcasts. The relevant data will include broadcast times, and some of those will be for programs that aired late at night....
0
votes
1
answer
64
views
Why doesn't the slice expression get the correct indexes in a polars DataFrame?
I have a polars dataframe which looks like this:
shape: (2_655_541, 4)
┌────────────┬────────────┬─────────────────┬─────────────────────┐
│ streamflow ┆ sm_surface ┆ basin_id ┆ time ...
1
vote
3
answers
122
views
How to set multiple elements conditionally in Polars similar to .loc in Pandas?
I am trying to set multiple elements in a Polars DataFrame based on a condition, similar to how it is done in Pandas. Here’s an example in Pandas:
import pandas as pd
df = pd.DataFrame(dict(
A=[1,...
2
votes
2
answers
112
views
Compute percentage of positive rows in a group_by polars DataFrame
I need to compute the percentage of positive values in the value column grouped by the group column.
import polars as pl
df = pl.DataFrame(
{
"group": ["A", "A...
0
votes
0
answers
228
views
python polars in jupyter lab leads to error due to infer_schema_length
I often run into data fetching errors when I'm working in JupyterLab and trying to use polars instead of pandas as the dataframe library.
I do this by running the statement
%config SqlMagic.autopolars ...
3
votes
4
answers
913
views
Setting slice of column to list of values on polars dataframe
In the code below I'm creating a polars- and a pandas dataframe with identical data. I want to select a set of rows based on a condition on column A, then update the corresponding rows for column C. I'...
2
votes
0
answers
78
views
Casting string column to pl.Datetime does not keep timezone information while str.to_datetime does
Polars version 1.17.11
I have a json object with the following structure:
json_obj = [
{"timestamp": "2024-10-01T21:23:23Z", "value": 31},
{"timestamp": ...
6
votes
1
answer
614
views
How to conditionally format data in Great Tables? [duplicate]
I am trying to conditionally format table data using Great Tables but not sure how to do it.
I want to highlight (as a sort of heatmap) all cells whose values are higher than the Upper Range column.
...
1
vote
2
answers
183
views
How to forward / backward fill null fields in a struct column using Polars?
This code does not fill the null values in the column. I want to forward and backward fill nulls in some fields.
import polars as pl
df1 = pl.LazyFrame({
"dt": [
"...
0
votes
1
answer
103
views
How to convert Polars dataframe to numpy array which has certain dims?
I have a Polars DataFrame with 300 basins, each basin having 100,000 time records, and each time record consisting of 40 variables, totaling 30 million rows and 40 variables. How can I reconstruct it ...
3
votes
1
answer
37
views
How to apply `numpy.finfo` to Polars types?
I sometimes apply numpy.finfo to a Pandas or a NumPy dtype, for example to determine the maximum supported value (max) or the minimum meaningful increment (eps). Is there an equivalent for Polars dtypes? Or ...
1
vote
1
answer
48
views
Joining two dataframes that share "index columns" (id columns), but not data columns, so that the resulting dataframe has a full spine of ids?
I find myself doing this:
import polars as pl
import sys
red_data = pl.DataFrame(
[
pl.Series("id", [0, 1, 2], dtype=pl.UInt8()),
pl.Series("red_data", [1, 0, ...
-1
votes
1
answer
129
views
Why is there an 'Unpickling Error' when using polars to read data for pytorch?
I recently switched my data tool from xarray to polars, and I use pl.DataFrame.to_torch() to generate tensors for training my PyTorch model. The data source is a parquet file.
To avoid fork ...
0
votes
0
answers
64
views
Python-Polars: Performance of wide dataframe
We are currently implementing a calculation engine using Polars as backend. Given the characteristics of our data model, we chose to rely on a wide dataframe, where the variables contain the time ...