2 votes
0 answers
189 views

I am writing Parquet files using two different frameworks—Apache Spark (Scala) and Polars (Python)—with the same schema and data. However, when I query the resulting Parquet files using Apache ...
user29976558
2 votes
1 answer
205 views

I'm slowly migrating to polars from pandas and I have found that in some cases the polars syntax is tricky. I'm seeking help to do a group_by followed by a describe using less (or more readable) code. ...
jcaliz • 4,073
5 votes
2 answers
227 views

The code below shows a solution I have found in order to expand a dataframe to include the cartesian product of columns A and B, filling in the other columns with null values. I'm wondering if there ...
rindis • 1,159
0 votes
3 answers
256 views

I have a dataset, part of which looks like this: customer product price quantity sale_time C060235 P0204 6.99 2 2024-03-11 08:24:11 C045298 P0167 14.99 1 2024-03-11 08:35:06 ... C039877 P0024 126.95 1 ...
Scott Deerwester
0 votes
0 answers
216 views

I read data from the same parquet files multiple times using polars (polars rust engine and pyarrow) and using pandas pyarrow backend (not fastparquet as it was very slow), see below code. All the ...
newandlost • 1,080
0 votes
1 answer
82 views

I am trying to use polars~=1.24.0 on Python 3.13 to process larger-than-memory sized datasets. Specifically, I am loading many (i.e., 35 of them) parquet files via the polars.scan_parquet('base-name-*....
Arda Aytekin • 1,303
1 vote
1 answer
105 views

Here is the data import polars as pl from datetime import datetime df = pl.DataFrame( { "time": [ datetime(2021, 2, 1), datetime(2021, 4, 2), ...
JohnRos • 1,257
1 vote
1 answer
588 views

I have a CSV (or rather TSV) I got from stripping the header off a gVCF with bcftools view foo.g.vcf -H > foo.g.vcf.csv A head gives me this, so everything looks as expected so far chr1H 1 ...
skranz • 65
0 votes
1 answer
81 views

I have a table "Data" containing arrays of FOOs, and a separate table "Lookup" where I can find the BAR for each FOO. I want to write a SQL query which returns the Data table, but ...
DarthVlader
2 votes
2 answers
155 views

I am working with Polars and need to ensure that my dataset contains all possible combinations of unique values in certain index columns. If a combination is missing in the original data, it should be ...
Olibarer • 423
2 votes
1 answer
100 views

I have some coordinate data; some of it high precision, some of it low precision thanks to multiple data sources and other operational realities. I want to have a column that indicates the relative ...
Kyle • 1,012
1 vote
1 answer
63 views

Both for polars and numpy, correlation functions seem to break down given very large changes to the location. I presume that has to do with precision issues, as e.g. a bazillion +1 is viewed as equal ...
Dontwannausemynormalnick
1 vote
2 answers
379 views

I have a dataframe that contains a product name, question, and answers. I would like to process the dataframe and transform it into a JSON format. Each product should have nested sections for ...
Simon • 1,209
3 votes
2 answers
98 views

I have a dataframe df. >>> import polars as pl >>> >>> >>> df = pl.DataFrame({"col": ["row1", "row2", "row3"]}) >>&...
user459872 • 25.9k
2 votes
1 answer
221 views

I want to vertically merge two polars.LazyFrames in order to avoid collecting both LazyFrames beforehand, which is computationally expensive. I have tried extend(), concat(), and vstack() but none of ...
realbitsurfer
1 vote
2 answers
201 views

I have a weight vector: weight_vec = pl.Series("weights", [0.125, 0.0625, 0.03125]) And also a DataFrame containing up to m variables. For simplicity, we will only have two variables: df = ...
Kevin Li • 649
2 votes
1 answer
202 views

I am in a situation where I have some time series data, potentially looking like this: { "t": [1, 2, 5, 6, 7], "y": [1, 1, 1, 1, 1], } As you can see, the time stamp jumps ...
Thomas • 1,351
1 vote
1 answer
147 views

I'm reading data from Google BigQuery into a polars dataframe. Using a string query succeeds. I'd prefer to use an alchemy statement. Using python-bigquery-sqlalchemy provided by Google and following ...
eldrly • 326
2 votes
2 answers
291 views

I have two large datasets stored in partitioned Parquet format on S3, partitioned by category_id. I need to join them on category_id and label_id using Polars and write the results to Postgres. The ...
Joost Döbken
2 votes
0 answers
221 views

This is for a POC to see if polars can do some things faster/better/cheaper than a current SQL solution. The first test case involves a count(*) over an eight-table join. The eight tables are ...
sicsmpr • 55
1 vote
1 answer
72 views

I'm using Polars to process a DataFrame so I can save it as JSON. I know I can use the method .write_json(), however, I would like to add a new level to the JSON. My current approach: import polars as ...
Simon • 1,209
5 votes
1 answer
354 views

I have a table representing a schedule, i.e. it contains day (monday-sunday), start_time and end_time fields df = pl.DataFrame({ "day": ["monday", "tuesday", "...
David Waterworth
0 votes
2 answers
113 views

I am trying to run a custom function on a lazy dataframe on a row-by-row basis. Function itself does not matter, so I'm using softmax as a stand-in. All that matters about it is that it is not ...
velochy • 443
1 vote
0 answers
144 views

UPDATE: See this SO post where the streaming engine is used: How do I ensure that a Polars expression plugin properly uses multiple CPUs? Original post: I want to write a custom Polars Expression ...
thoooooooomas
0 votes
2 answers
90 views

I am trying to import a list of lists into Polars and get the data in separate columns. Example. numbers = [['304-144635', 0], ['123-091523', 7], ['305-144931', 12], ['623-101523', 16], ['305-145001', ...
diogenes • 2,181
1 vote
2 answers
181 views

I have a Dataset containing GPS Coordinates of a few planes. I would like to calculate the bearing of each plane at every point in time. The Dataset has, among others, these columns: event_uid plane_no ...
jimfawkes • 385
-2 votes
2 answers
459 views

I'm trying to convert a Polars dataframe to a JSON object, but I can't seem to find a way to change the format of it between row/col orientation. In Pandas, by default, it creates a column-oriented ...
Ghost • 1,594
1 vote
1 answer
93 views

I need to backfill a column over one of three possible columns, based on which one matches the non-null cell in the column to be backfilled. My dataframe looks something like this: import polars as pl ...
epistemetrica
1 vote
1 answer
119 views

I am using polars to hash some columns in a data set. One column contains lists of strings and the other column strings. My approach is to cast each column as type string and then hash the columns....
MikeB2019x • 1,297
4 votes
1 answer
112 views

A similar question is asked here; however, it didn't seem to work in my case. I have a dataframe with 3 columns: date, groups, prob. What I want is to create a 3 day rolling mean of the prob column values ...
AColoredReptile
2 votes
2 answers
238 views

I am curious whether I am missing something in the Polars Expression library in how this could be done more efficiently. I have a dataframe of protein sequences, where I would like to create k-long ...
Olga Botvinnik
2 votes
1 answer
79 views

I'd like to use a function like cumsum, but that would create a set of all values contained in the column up to the point, and not to sum them df = pl.DataFrame({"a": [1, 2, 3, 4]}) df["...
ClementWalter
2 votes
1 answer
306 views

I am working in Polars and I have data set where one column is lists of strings. To see what it's like: import pandas as pd list_of_lists = [['base', 'base.current base', 'base.current base....
MikeB2019x • 1,297
3 votes
1 answer
383 views

So I got two csv which I load as polars frames: left: left_csv = b""" track_name,type,yield,group 8CEB45v1,corn,0.146957,A A188v2,corn,0.86308,A B73v6,corn,0.326076,A CI6621v1,sweetcorn,...
Pm740 • 423
3 votes
3 answers
526 views

I am trying to get the shrunk data type of a column using an expression, to be able to run validations against it. import polars as pl df = pl.DataFrame({"list_column": [[1, 2], [3, 4], [...
yz_jc • 271
1 vote
1 answer
214 views

I've noticed some unexpected behavior with the interpolate_by expression and I'm not sure what is going on. import polars as pl df = pl.DataFrame({ 'a': [1, 2, 3, 4, 5], 'b': [4, 5, None, 7, ...
nybhh • 101
0 votes
1 answer
137 views

The following pandas code removes all the .0 decimal precision if I have a float column with 1.0, 2.0, 3.0 values: import pandas as pd df = pd.DataFrame({ "date": ["2025-01-01"...
Nyssance • 401
4 votes
1 answer
101 views

I expected either a or b would be 0.0 (not NaN) and c would always be 0.0. The Polars documentation said to use | as "or" and & as "and". I believe I have the logic right: (((...
Steve Maguire
4 votes
1 answer
499 views

I've written a custom function in Polars to generate a horizontal forward/backward fill list of expressions. The function accepts an iterable of expressions (or column names) to determine the order of ...
Olibarer • 423
2 votes
2 answers
94 views

Here, column "AB" is just being created and at the same time is being used as input to create column "ABC". This fails. df = df.with_columns( (pl.col("A")+pl.col("...
Nip • 474
2 votes
1 answer
91 views

I have a data frame with 6 value columns and I want to sum the largest 3 of them. I also want to create an ID matrix to identify which columns were included in the sum. So the initial data frame may ...
marinerbeck
1 vote
1 answer
154 views

I need to convert each element in a polars df into the following structure: { "value": "A", "lineItemName": "value", "dimensions": [ ...
Vinz • 487
4 votes
3 answers
112 views

The breakpoints data is the following: breakpoints = pl.DataFrame( { "features": ["feature_0", "feature_0", "feature_1"], "breakpoints&...
Kevin Li • 649
2 votes
1 answer
75 views

How to extend this df = df.select( pl.col("x1").map_batches(custom_function).alias("new_x1") ) to something like df = df.select( pl.col("x1","x2")....
Nip • 474
2 votes
1 answer
335 views

I am dynamically generating Airflow DAGs based on data from a Polars DataFrame. The DAG definition includes filtering this DataFrame at DAG creation time and again inside a task when the DAG runs. ...
elvainch • 1,407
0 votes
1 answer
194 views

I'm working on an asynchronous FastAPI project that fetches large datasets from an API. Currently, I process the JSON response using a list comprehension and NumPy to extract device IDs and names. For ...
Foxbat • 364
2 votes
1 answer
61 views

I observed that the polars expression: pl.DataFrame(data={}).select(a=pl.lit(None) | pl.lit(True)) evaluates to True, but it should evaluate to None in my estimation, based on the concept of "...
Silverdust • 1,527
3 votes
3 answers
725 views

Sorry if the title is confusing. I'm pretty familiar with Pandas and think I have a solid idea of how I would do this there. Pretty much just brute-force iteration and index-based assignment for the ...
Sparky Parky
2 votes
2 answers
179 views

I encountered some confusing behavior with polars type-casting (silently truncating floats to ints without raising an error, even when explicitly specifying strict=True), so I headed over to the ...
Max Power • 9,146
0 votes
1 answer
310 views

I need to pass a variable number of columns to a user-defined function. The docs mention to first create a pl.struct and subsequently let the function extract it. Here's the example given on the ...
Andi • 5,177
