Newest 'python-polars' Questions - Page 7

2 votes

3 answers

190 views

Polars - Replace letter in string with uppercase letter

Is there any way in polars to replace the character just after the _ with its uppercase form using regex replace? So far I have achieved it using polars.Expr.map_elements. Is there any alternative using native ...

dikesh

3,135

asked Jan 15 at 14:22

2 votes

2 answers

109 views

get value from current row in rolling window

Given the following data structure import polars as pl df = pl.DataFrame( { "order_id": ["o01", "o02", "o03", "o04", "o10", &...

dpprdan

1,817

asked Jan 15 at 12:26

1 vote

0 answers

42 views

Take unique values horizontally across a Polars DataFrame to create a new string column [duplicate]

I have this dataframe: import polars as pl df = pl.from_repr("""shape: (4, 3) ┌──────┬──────┐ │ ccy1 ┆ ccy2 │ │ --- ┆ --- │ │ str ┆ str │ ╞══════╪══════╡ │ USD ┆ USD │ │ EUR ┆ ...

Phil-ZXX

3,601

asked Jan 15 at 11:38

0 votes

1 answer

119 views

Polars write_excel: rotate some header columns

When using pl.write_excel, I am looking for a possibility to rotate SOME header columns by 90°. I am applying a bunch of input arguments provided by pl.write_excel in order to style the exported ...

Andi

5,177

asked Jan 15 at 9:24

4 votes

4 answers

93 views

How to eliminate the loop in my code when calculating EWMA?

I'm calculating EWMA values for an array of streamflow data, and the code is as follows: import polars as pl import numpy as np streamflow_data = np.arange(0, 20, 1) adaptive_alphas = np.concatenate([np.repeat(0....

forestbat

1,115

asked Jan 14 at 17:39

5 votes

1 answer

469 views

How to get the day / month name of a column in polars

I have a polars dataframe df which has a datetime column date. I'm trying to get the name of the day and month of that column. Consider the following example. import polars as pl from datetime import ...

Simon

1,209

asked Jan 14 at 14:39

3 votes

1 answer

385 views

Polars Schema: TypeError: dtypes must be fully-specified, got: Datetime

Hi, I want to define a polars schema. It works fine without a datetime format. However, it fails with pl.Datetime. import polars as pl testing_schema: pl.Schema = pl.Schema( { "date&...

SysRIP

491

asked Jan 14 at 8:43

1 vote

0 answers

33 views

Add a milliseconds since midnight integer column to a datetime in Polars? [duplicate]

I have a Polars data frame in the following format: import polars as pl df = pl.from_repr(""" ┌───────────┬──────────┐ │ ms_of_day ┆ date │ │ --- ┆ --- │ │ i64 ┆ ...

nybhh

101

asked Jan 12 at 20:24

4 votes

2 answers

506 views

Check if all values of Polars DataFrame are True

How can I check if all values of a polars DataFrame, containing only boolean columns, are True? Example df: df = pl.DataFrame({"a": [True, True, None], "b": [...

mouwsy

2,127

asked Jan 12 at 18:07

1 vote

1 answer

377 views

Polars runs out of memory when collecting a JSON file

We want to use Polars to load a JSON file of 22GB (10M rows and 65 columns) but we're running out of memory when we run collect(), which causes the program to crash. We're using pl.scan_ndjson to load ...

n4gash

23

asked Jan 9 at 10:08

2 votes

1 answer

145 views

Python-Polars: Expression list product

In Python-Polars, it is easy to calculate the Sum of all the lists in an array with polars.Expr.list.sum. See the example below for the sum: df = pl.DataFrame({"values": [[[1]], [[2, 3], [5,...

yz_jc

271

asked Jan 8 at 18:51

1 vote

0 answers

324 views

TypeError: argument 'schema': 'Object' is not a Polars data type

Why? I am querying data from a MongoDB collection and loading the result into a Polars DataFrame. Depending on the limit filter of the mongo query the operation works or raises the error of the title. ...

Santiago Noacco

81

asked Jan 8 at 16:04

1 vote

1 answer

583 views

Explicit cast of a lazy frame not possible with type mismatch?

I've only been using polars for a few months now (coming from pandas) so forgive me if I'm interpreting things wrong :) I want to read many parquet files, merge them into a single dataframe and then ...

Droid

673

asked Jan 8 at 10:26

2 votes

1 answer

84 views

Python-Polars: Cross field calculation of struct columns

I am trying to build a function that takes a list of struct columns, extracts two fields, and performs a cross-field combination of all the values of such fields. Everything in the same context. For ...

yz_jc

271

asked Jan 7 at 17:06

3 votes

1 answer

396 views

Python polars: pass named row to pl.DataFrame.map_rows

I'm looking for a way to apply a user defined function taking a dictionary, and not a tuple, of arguments as input when using pl.DataFrame.map_rows. Trying something like df.map_rows(lambda x: udf({k:...

paulduf

320

asked Jan 7 at 15:02

1 vote

1 answer

87 views

Transpose dataframe with List elements

I have a dataframe like ┌─────┬────────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┐ │ rul ┆ ...

Glenn Pierce

972

asked Jan 7 at 10:25

1 vote

0 answers

565 views

How to type Polars' Series in Python?

I'm trying to type my functions in Python for polars.Series objects of a specific dtype. For instance, in a MWE, a function could look like: import typing as tp import polars as pl u = pl.Series(...

globglogabgalab

628

asked Jan 7 at 9:53

0 votes

0 answers

69 views

Custom Expression returns list[f64] instead of f64 when using group_by_dynamic()

When using group_by_dynamic() to perform a rolling calculation, my custom geometric mean expression will return a list[f64] dtype for each value instead of a f64. However, when performing the ...

Trevor Seibert

129

asked Jan 6 at 19:07

0 votes

2 answers

144 views

How to speed up repeatedly taking the first n rows of each group after group_by?

The df contains 100 million rows, and there are around 25-30 group_by columns. Is there a way to speed this operation up from here, or is this the best I can get? import polars as pl import numpy as np ...

user28199045

13

asked Jan 6 at 3:24

1 vote

1 answer

125 views

Mutate polars column and keep original column name on custom expression

I am trying to implement a custom expression in Rust polars to calculate the geomean of different columns, essentially replicating the same behavior of .mean() expression where it will apply the ...

Trevor Seibert

129

asked Jan 5 at 22:39

1 vote

1 answer

132 views

DeltaTable map type

Using Spark, I can create a delta table with a map column type: MAP<STRING, TIMESTAMP> How do I create a delta table with a map type without Spark? I have tried multiple approaches and none of ...

Frank

634

asked Jan 5 at 1:28

2 votes

1 answer

83 views

Find nearest following row with values greater than or equal to current row

Starting with this DataFrame: import polars as pl df_1 = pl.DataFrame({ 'name': ['Alpha', 'Alpha', 'Alpha', 'Alpha', 'Alpha'], 'index': [0, 3, 4, 7, 9], 'limit': [12, 18, 11, 5, 9], '...

Danilo Setton

705

asked Jan 4 at 19:58

0 votes

1 answer

232 views

Polars is killing the kernel on import

I am running the following code on JupyterLab, with no other notebooks open: !pip3 install polars --upgrade import polars as pl The first line upgrades me to polars 1.18.0 with no issues, but then ...

mmyoung77

1,447

asked Jan 3 at 19:41

0 votes

0 answers

121 views

how do I find all polars dataframes in python

I have a long script in Python, predominantly pandas, but shifting to polars. I am reviewing the memory usage of objects. To find the 10 largest objects currently in use, using locals().items() and sys.getsizeof(), I run: ...

frank

3,816

asked Jan 2 at 13:57

2 votes

1 answer

90 views

How to combine columns with extra strings into a concatenated string column in Polars?

I am trying to add a result column (labels_value) that combines two columns (Total & percentage) into a string that looks like: (Total) percentage%. Basically to wrap ...

ViSa

2,357

asked Jan 2 at 8:56

3 votes

2 answers

249 views

DuplicateError with name 'null' when trying to pivot a Polars DataFrame

I have this example dataframe in polars: import polars as pl df_example = pl.DataFrame( { "DATE": ["2024-11-11", "2024-11-11", "2024-11-12", "...

Olivier_s_j

5,232

asked Dec 31, 2024 at 7:44

0 votes

1 answer

381 views

Force schema type using Polars scan/sink csv

I have a large number of CSV files (~100,000) some of which themselves are large CSV files (i.e., >128G) and I am trying to convert them to Parquet files. The files contain a mix of character, ...

user1805103

129

asked Dec 28, 2024 at 18:54

2 votes

1 answer

106 views

Create a new Polars column from a multiple choice of expressions by mapping values to a dictionary

I want to use an expression dictionary to perform calculations for a new column. I have this Polars dataframe: import polars as pl df = pl.DataFrame({ "col1": ["a", "b&...

Babak Fi Foo

1,078

asked Dec 28, 2024 at 18:10

6 votes

2 answers

424 views

asof-join with multiple inequality conditions

I have two dataframes: a (~600M rows) and b (~2M rows). What is the best approach for joining b onto a, when using 1 equality condition and 2 inequality conditions on the respective columns? a_1 = ...

usdn

402

asked Dec 28, 2024 at 2:24

1 vote

1 answer

115 views

Set column names using values from a specific row in Polars

I am bringing in the data from an Excel spreadsheet. I want to make all the info from df.row(8) into the column header names. In pandas it was just: c = [ 'A', 'B', 'C', 'D', 'E', 'F' ] df.columns = c ...

diogenes

2,181

asked Dec 27, 2024 at 12:19

1 vote

0 answers

109 views

Save intermediate results for big polars lazyframe processing?

The issue may be related to https://github.com/pola-rs/polars/issues/9842 and How to process Python Polars LazyFrame in batches My setup is input = pathlib.Path("input.csv") # 300k lines ...

JRX

21

asked Dec 27, 2024 at 10:15

2 votes

1 answer

62 views

Polars transform meta data of expressions

Is it possible in python polars to transform the root_names of expression meta data? E.g. if I have an expression like expr = pl.col("A").dot(pl.col("B")).alias("AdotB") ...

Max

700

asked Dec 23, 2024 at 21:11

1 vote

1 answer

663 views

Select multiple rows and use as headers with separator in Polars

Since Polars doesn't work with multi-index headers like Pandas does, I'd like to know if there's a native way to do the following: My current implementation has to go through Pandas first and then ...

Reveur

11

asked Dec 23, 2024 at 17:55

3 votes

2 answers

226 views

How do you insert a map-reduce into a Polars method chain?

I’m doing a bunch of filters and other transform applications including a group_by on a polars data frame, the objective being to count the number of html tags in a single column per date per ...

Thomas Browne

25.1k

asked Dec 23, 2024 at 10:40

0 votes

2 answers

177 views

Polars faster alternative to successive joins

I have a big dataset and I need to do multiple successive joins that are slow. I figured an alternative was to unpivot the whole dataframe I was merging successfully, join once and then get the ...

AD AD

45

asked Dec 21, 2024 at 13:58

2 votes

2 answers

714 views

How can I use Polars to stream the contents of a Parquet file as CSV text to standard output?

Using Python Polars, how can I modify the following script to stream the contents of a Parquet file as CSV text to standard output? import polars as pl import sys pl.scan_parquet("BTCUSDT-trades-...

Derek Mahar

28.5k

asked Dec 20, 2024 at 23:39

4 votes

2 answers

168 views

How should I parse times in the Japanese "30-hour" format for data analysis? [closed]

I'm considering a data analysis project involving information on Japanese TV broadcasts. The relevant data will include broadcast times, and some of those will be for programs that aired late at night....

Shay Guy

1,050

asked Dec 20, 2024 at 15:52

0 votes

1 answer

64 views

Why doesn't a slice expression get the correct indexes in a polars DataFrame?

I have a polars dataframe which looks like this: shape: (2_655_541, 4) ┌────────────┬────────────┬─────────────────┬─────────────────────┐ │ streamflow ┆ sm_surface ┆ basin_id ┆ time ...

forestbat

1,115

asked Dec 19, 2024 at 9:56

1 vote

3 answers

122 views

How to set multiple elements conditionally in Polars similar to .loc in Pandas?

I am trying to set multiple elements in a Polars DataFrame based on a condition, similar to how it is done in Pandas. Here’s an example in Pandas: import pandas as pd df = pd.DataFrame(dict( A=[1,...

HYRY

97.8k

asked Dec 19, 2024 at 3:44

2 votes

2 answers

112 views

Compute percentage of positive rows in a group_by polars DataFrame

I need to compute the percentage of positive values in the value column grouped by the group column. import polars as pl df = pl.DataFrame( { "group": ["A", "A&...

Andi

5,177

asked Dec 17, 2024 at 10:34

0 votes

0 answers

228 views

Python polars in JupyterLab leads to an error due to infer_schema_length

I often run into data fetching errors when I'm working in JupyterLab and trying to use polars instead of pandas as the dataframe library. I do this by running the statement %config SqlMagic.autopolars ...

N. Maks

686

asked Dec 17, 2024 at 8:57

3 votes

4 answers

913 views

Setting slice of column to list of values on polars dataframe

In the code below I'm creating a polars- and a pandas dataframe with identical data. I want to select a set of rows based on a condition on column A, then update the corresponding rows for column C. I'...

rindis

1,159

asked Dec 16, 2024 at 14:48

2 votes

0 answers

78 views

Casting string column to pl.Datetime does not keep timezone information while str.to_datetime does

Polars version 1.17.11 I have a json object with the following structure: json_obj = [ {"timestamp": "2024-10-01T21:23:23Z", "value": 31}, {"timestamp": ...

Danton Sá

33

asked Dec 15, 2024 at 23:41

6 votes

1 answer

614 views

How to conditionally format data in Great Tables? [duplicate]

I am trying to conditionally format table data using Great Tables, but I am not sure how to do it. I want to highlight (as a sort of heatmap) all cells whose values are higher than the Upper Range column. ...

ViSa

2,357

asked Dec 13, 2024 at 17:53

1 vote

2 answers

183 views

How to forward / backward fill null fields in a struct column using Polars?

This code does not fill the null values in the column. I want to forward and backward fill nulls in some fields. import polars as pl df1 = pl.LazyFrame({ "dt": [ "...

Jan

586

asked Dec 13, 2024 at 12:36

0 votes

1 answer

103 views

How to convert a Polars dataframe to a numpy array with specific dimensions?

I have a Polars DataFrame with 300 basins, each basin having 100,000 time records, and each time record consisting of 40 variables, totaling 30 million rows and 40 variables. How can I reconstruct it ...

forestbat

1,115

asked Dec 13, 2024 at 11:07

3 votes

1 answer

37 views

How to apply `numpy.finfo` to Polars types?

I sometimes apply numpy.finfo to a Pandas or a NumPy dtype – to determine the maximum support value (max) or the minimum meaningful increment (eps), say. Is there an equivalent for Polars dtypes? Or ...

Luca B.

151

asked Dec 12, 2024 at 16:47

1 vote

1 answer

48 views

Joining two dataframes that share "index columns" (id columns), but not data columns, so that the resulting dataframe has a full spine of ids?

I find myself doing this: import polars as pl import sys red_data = pl.DataFrame( [ pl.Series("id", [0, 1, 2], dtype=pl.UInt8()), pl.Series("red_data", [1, 0, ...

bzm3r

4,664

asked Dec 12, 2024 at 16:28

-1 votes

1 answer

129 views

Why is there an 'Unpickling Error' when using polars to read data for PyTorch?

I recently changed my data tool from xarray to polars, and I use pl.DataFrame.to_torch() to generate tensors for training my PyTorch model. The data source is a Parquet file. For avoiding fork ...

forestbat

1,115

asked Dec 12, 2024 at 15:34

0 votes

0 answers

64 views

Python-Polars: Performance of wide dataframe

We are currently implementing a calculation engine using Polars as backend. Given the characteristics of our data model, we chose to rely on a wide dataframe, where the variables contain the time ...

yz_jc

271

asked Dec 11, 2024 at 17:09
