2 votes
3 answers
190 views

Is there any way in polars to replace the character just after the _ with its uppercase version using regex replace? So far I have achieved it using polars.Expr.map_elements. Is there any alternative using native ...
asked by dikesh
2 votes
2 answers
109 views

Given the following data structure import polars as pl df = pl.DataFrame( { "order_id": ["o01", "o02", "o03", "o04", "o10", &...
asked by dpprdan
1 vote
0 answers
42 views

I have this dataframe: import polars as pl df = pl.from_repr("""shape: (4, 3) ┌──────┬──────┐ │ ccy1 ┆ ccy2 │ │ --- ┆ --- │ │ str ┆ str │ ╞══════╪══════╡ │ USD ┆ USD │ │ EUR ┆ ...
asked by Phil-ZXX
0 votes
1 answer
119 views

When using pl.write_excel, I am looking for a way to rotate SOME header columns by 90°. I am applying a bunch of input arguments provided by pl.write_excel in order to style the exported ...
asked by Andi
4 votes
4 answers
93 views

I'm calculating EWMA values for an array of streamflow data, and the code is like below: import polars as pl import numpy as np streamflow_data = np.arange(0, 20, 1) adaptive_alphas = np.concatenate([np.repeat(0....
asked by forestbat
5 votes
1 answer
469 views

I have a polars dataframe df which has a datetime column date. I'm trying to get the name of the day and month of that column. Consider the following example. import polars as pl from datetime import ...
asked by Simon
3 votes
1 answer
385 views

Hi, I want to define a Polars schema. It works fine without a datetime format; however, it fails with pl.Datetime. import polars as pl testing_schema: pl.Schema = pl.Schema( { "date&...
asked by SysRIP
1 vote
0 answers
33 views

I have a Polars data frame in the following format: import polars as pl df = pl.from_repr(""" ┌───────────┬──────────┐ │ ms_of_day ┆ date │ │ --- ┆ --- │ │ i64 ┆ ...
asked by nybhh
4 votes
2 answers
506 views

How can I check if all values of a polars DataFrame, containing only boolean columns, are True? Example df: df = pl.DataFrame({"a": [True, True, None], "b": [...
asked by mouwsy
1 vote
1 answer
377 views

We want to use Polars to load a JSON file of 22GB (10M rows and 65 columns), but we're running out of memory when we run collect(), which causes the program to crash. We're using pl.scan_ndjson to load ...
asked by n4gash
2 votes
1 answer
145 views

In Python-Polars, it is easy to calculate the sum of all the lists in an array with polars.Expr.list.sum. See the example below for the sum: df = pl.DataFrame({"values": [[[1]], [[2, 3], [5,...
asked by yz_jc
1 vote
0 answers
324 views

Why? I am querying data from a MongoDB collection and loading the result into a Polars DataFrame. Depending on the limit filter of the Mongo query, the operation either works or raises the error in the title. ...
asked by Santiago Noacco
1 vote
1 answer
583 views

I've only been using polars for a few months now (coming from pandas) so forgive me if I'm interpreting things wrong :) I want to read many parquet files, merge them into a single dataframe and then ...
asked by Droid
2 votes
1 answer
84 views

I am trying to build a function that takes a list of struct columns, extracts two fields, and performs a cross-field combination of all the values of those fields. Everything in the same context. For ...
asked by yz_jc
3 votes
1 answer
396 views

I'm looking for a way to apply a user defined function taking a dictionary, and not a tuple, of arguments as input when using pl.DataFrame.map_rows. Trying something like df.map_rows(lambda x: udf({k:...
asked by paulduf
1 vote
1 answer
87 views

I have a wide dataframe (table repr truncated) with a rul column followed by many numeric columns: │ rul ┆ ...
asked by Glenn Pierce
1 vote
0 answers
565 views

I'm trying to type my functions in Python for polars.Series objects of a specific dtype. For instance, in a MWE, a function could look like: import typing as tp import polars as pl u = pl.Series(...
asked by globglogabgalab
0 votes
0 answers
69 views

When using group_by_dynamic() to perform a rolling calculation, my custom geometric mean expression will return a list[f64] dtype for each value instead of a f64. However, when performing the ...
asked by Trevor Seibert
0 votes
2 answers
144 views

The df contains 100 million rows, and there are around 25-30 group_by columns. Is there a way to speed this operation up from here, or is this the best I can get? import polars as pl import numpy as np ...
asked by user28199045
1 vote
1 answer
125 views

I'm trying to implement a custom expression in Rust Polars to calculate the geomean of different columns, essentially replicating the behavior of the .mean() expression, where it will apply the ...
asked by Trevor Seibert
1 vote
1 answer
132 views

Using Spark, I can create a delta table with a map column type: MAP<STRING, TIMESTAMP> How do I create a delta table with a map type without Spark? I have tried multiple approaches and none of ...
asked by Frank
2 votes
1 answer
83 views

Starting with this DataFrame: import polars as pl df_1 = pl.DataFrame({ 'name': ['Alpha', 'Alpha', 'Alpha', 'Alpha', 'Alpha'], 'index': [0, 3, 4, 7, 9], 'limit': [12, 18, 11, 5, 9], '...
asked by Danilo Setton
0 votes
1 answer
232 views

I am running the following code on JupyterLab, with no other notebooks open: !pip3 install polars --upgrade import polars as pl The first line upgrades me to polars 1.18.0 with no issues, but then ...
asked by mmyoung77
0 votes
0 answers
121 views

I have a long script in Python, predominantly pandas, but shifting to polars. I am reviewing the memory use of objects. To find the 10 largest objects currently in use via locals().items() and sys.getsizeof(), I run: ...
asked by frank
2 votes
1 answer
90 views

I am trying to add another column that combines two columns (Total & percentage) into a result column (labels_value) which looks like: (Total) percentage%. Basically to wrap ...
asked by ViSa
3 votes
2 answers
249 views

I have this example dataframe in polars: import polars as pl df_example = pl.DataFrame( { "DATE": ["2024-11-11", "2024-11-11", "2024-11-12", "...
asked by Olivier_s_j
0 votes
1 answer
381 views

I have a large number of CSV files (~100,000) some of which themselves are large CSV files (i.e., >128G) and I am trying to convert them to Parquet files. The files contain a mix of character, ...
asked by user1805103
2 votes
1 answer
106 views

I want to use an expression dictionary to perform calculations for a new column. I have this Polars dataframe: import polars as pl df = pl.DataFrame({ "col1": ["a", "b&...
asked by Babak Fi Foo
6 votes
2 answers
424 views

I have two dataframes: a (~600M rows) and b (~2M rows). What is the best approach for joining b onto a, when using 1 equality condition and 2 inequality conditions on the respective columns? a_1 = ...
asked by usdn
1 vote
1 answer
115 views

I am bringing in the data from an Excel spreadsheet. I want to make all the info from df.row(8) into the column header names. In pandas it was just: c = [ 'A', 'B', 'C', 'D', 'E', 'F' ] df.columns = c ...
asked by diogenes
1 vote
0 answers
109 views

The issue may be related to https://github.com/pola-rs/polars/issues/9842 and How to process Python Polars LazyFrame in batches My setup is input = pathlib.Path("input.csv") # 300k lines ...
asked by JRX
2 votes
1 answer
62 views

Is it possible in python polars to transform the root_names of expression meta data? E.g. if I have an expression like expr = pl.col("A").dot(pl.col("B")).alias("AdotB") ...
asked by Max
1 vote
1 answer
663 views

Since Polars doesn't work with multi-index headers like Pandas does, I'd like to know if there's a native way to do the following: My current implementation has to go through Pandas first and then ...
asked by Reveur
3 votes
2 answers
226 views

I’m doing a bunch of filters and other transform applications including a group_by on a polars data frame, the objective being to count the number of html tags in a single column per date per ...
asked by Thomas Browne
0 votes
2 answers
177 views

I have some big dataset and I need to do multiple successive joins that are slow. I figured an alternative was to unpivot the whole dataframe I was merging successively, join once and then get the ...
asked by AD AD
2 votes
2 answers
714 views

Using Python Polars, how can I modify the following script to stream the contents of a Parquet file as CSV text to standard output? import polars as pl import sys pl.scan_parquet("BTCUSDT-trades-...
asked by Derek Mahar
4 votes
2 answers
168 views

I'm considering a data analysis project involving information on Japanese TV broadcasts. The relevant data will include broadcast times, and some of those will be for programs that aired late at night....
asked by Shay Guy
0 votes
1 answer
64 views

I have a polars dataframe which looks like this: shape: (2_655_541, 4) ┌────────────┬────────────┬─────────────────┬─────────────────────┐ │ streamflow ┆ sm_surface ┆ basin_id ┆ time ...
asked by forestbat
1 vote
3 answers
122 views

I am trying to set multiple elements in a Polars DataFrame based on a condition, similar to how it is done in Pandas. Here’s an example in Pandas: import pandas as pd df = pd.DataFrame(dict( A=[1,...
asked by HYRY
2 votes
2 answers
112 views

I need to compute the percentage of positive values in the value column grouped by the group column. import polars as pl df = pl.DataFrame( { "group": ["A", "A&...
asked by Andi
0 votes
0 answers
228 views

I often run into data fetching errors when I'm working in JupyterLab and trying to use polars instead of pandas as the dataframe library. I do this by running the statement %config SqlMagic.autopolars ...
asked by N. Maks
3 votes
4 answers
913 views

In the code below I'm creating a polars- and a pandas dataframe with identical data. I want to select a set of rows based on a condition on column A, then update the corresponding rows for column C. I'...
asked by rindis
2 votes
0 answers
78 views

Polars version 1.17.11 I have a json object with the following structure: json_obj = [ {"timestamp": "2024-10-01T21:23:23Z", "value": 31}, {"timestamp": ...
asked by Danton Sá
6 votes
1 answer
614 views

I am trying to conditionally format table data using Great Tables, but I'm not sure how to do it. I want to highlight the color of all those cells (sort of a heatmap) whose values are higher than the Upper Range column. ...
asked by ViSa
1 vote
2 answers
183 views

This code does not fill null values in the column. I want to forward- and backward-fill nulls in some fields. import polars as pl df1 = pl.LazyFrame({ "dt": [ "...
asked by Jan
0 votes
1 answer
103 views

I have a Polars DataFrame with 300 basins, each basin having 100,000 time records, and each time record consisting of 40 variables, totaling 30 million rows and 40 variables. How can I reconstruct it ...
asked by forestbat
3 votes
1 answer
37 views

I sometimes apply numpy.finfo to a Pandas or a NumPy dtype – to determine the maximum supported value (max) or the minimum meaningful increment (eps), say. Is there an equivalent for Polars dtypes? Or ...
asked by Luca B.
1 vote
1 answer
48 views

I find myself doing this: import polars as pl import sys red_data = pl.DataFrame( [ pl.Series("id", [0, 1, 2], dtype=pl.UInt8()), pl.Series("red_data", [1, 0, ...
asked by bzm3r
-1 votes
1 answer
129 views

I have recently changed my data tool from xarray to polars, and I use pl.DataFrame.to_torch() to generate tensors for training my PyTorch model. The data source's format is Parquet. For avoiding fork ...
asked by forestbat
0 votes
0 answers
64 views

We are currently implementing a calculation engine using Polars as backend. Given the characteristics of our data model, we chose to rely on a wide dataframe, where the variables contain the time ...
asked by yz_jc
