Newest 'python-polars' Questions

1 vote

0 answers

39 views

How to show the streaming parts of a polars query using explain()?

I am trying to explain() a Polars query to see which operations can be executed using the streaming engine. Currently, I am only able to do this using show_graph(). From sources on the web, I see that ...

gaut

6,038

asked yesterday

1 vote

1 answer

59 views

Polars parse multiple datetime format [duplicate]

I have a string column in a Polars DataFrame with multiple datetime formats, and I am using the following code to convert the column from string to datetime. import polars as pl df = pl.from_dict({'...

dikesh

3,135

asked Nov 26 at 12:27

0 votes

0 answers

70 views

polars.LazyFrame.sink_csv does not give CRLF line termination [duplicate]

I have a Python file import polars as pl import requests from pathlib import Path url = "https://raw.githubusercontent.com/leanhdung1994/files/main/processedStep1_enwiktionary_namespace_0_43....

Akira

2,820

asked Nov 25 at 19:19

1 vote

3 answers

162 views

Polars: how to write a column of strings into a txt file without escaping?

I have .ndjson files with millions of rows. Each row has a field html which contains HTML strings. I would like to write all such HTML into a .txt file, one HTML string per line. I ...

Akira

2,820

asked Nov 25 at 0:08

2 votes

1 answer

131 views

Why does a nearest join_asof() return exact matches despite allow_exact_matches=False?

I am looking for the nearest non exact match on the dates column: import polars as pl df = pl.from_repr(""" ┌─────┬────────────┐ │ uid ┆ dates │ │ --- ┆ --- │ │ i64 ┆ date ...

rainerpf

21

asked Nov 21 at 20:58

-2 votes

1 answer

80 views

polars.exceptions.DuplicateError: column with name 'name_ID' has more than one occurrence [closed]

I have a dictionary of polars.DataFrames called data_dict. All DataFrames in the dict values have an extra index column ''. I want to drop that column and set a new column named 'name_ID' ...

Tudi72

31

asked Nov 19 at 16:08

2 votes

1 answer

76 views

Change color of single line in altair line chart based on other indicator column

Imagine having the following polars dataframe "df" that contains the temperature of a machine that is either "active" or "inactive": import polars as pl from datetime ...

the_economist

579

asked Nov 17 at 9:32

1 vote

0 answers

76 views

Is it possible to drop/select columns where col.n_unique > 1 with native polars syntax [duplicate]

I have a table that looks like this import polars as pl df = pl.DataFrame( { "col1": [1, 2, 3, 4, 5], "col2": [10, 20, 30, 40, 50], "col3": [...

Lethnis

31

asked Nov 17 at 2:07

Advice

0 votes

7 replies

114 views

High volume URL parsing in Python

I use the polars, urllib and tldextract packages in Python to parse 2 columns of URL strings in zstd-compressed parquet files (averaging 8GB, 40 million rows). The parsed output includes the scheme, ...

norcalpedaler

132

asked Nov 16 at 18:34
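For the URL-parsing question above, one common way to reduce per-row cost is to parse each distinct URL only once. The following is a minimal sketch using only the standard library; the `parse_url` helper, the cache size, and the sample URLs are hypothetical, and it does not cover the tldextract step or reading the parquet files mentioned in the question.

```python
from functools import lru_cache
from urllib.parse import urlsplit


# Hypothetical helper: cache results so repeated URLs are parsed only once.
@lru_cache(maxsize=100_000)
def parse_url(url: str) -> tuple[str, str, str]:
    parts = urlsplit(url)
    # hostname is None when the URL has no network location
    return parts.scheme, parts.hostname or "", parts.path


# Illustrative data only; the first and last URL are identical on purpose.
urls = [
    "https://example.com/a/b?x=1",
    "http://sub.example.org:8080/",
    "https://example.com/a/b?x=1",
]
parsed = [parse_url(u) for u in urls]
```

Whether caching helps depends on how many URLs repeat; with mostly unique URLs the cache adds overhead rather than saving work.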

12 votes

0 answers

326 views

Not displaying DataFrame's name in Data Wrangler extension of VSCode, displaying "Data grid"

I have been using the Data Wrangler extension in VS Code for a while; it is very useful for analyzing datasets and filtering columns to inspect features. When I opened a dataframe in it, it used to ...

Javad Faraji

41

asked Nov 16 at 8:02

1 vote

1 answer

99 views

Altair stacked bar chart in custom order

I've built a dataset in Polars (Python) and am attempting to plot it as a stacked horizontal bar chart using Polars' built-in Altair plot function. However, trying to specify a custom sort order for the ...

ExactaBox

3,425

asked Nov 14 at 20:46

1 vote

1 answer

109 views

Polars print changed values between 2 dataframes

Given two polars dataframes of the same shape, I would like to print the number of values different between the two, including missing values that are not missing in the other dataframe. I came up ...

robertspierre

5,386

asked Nov 13 at 16:52

2 votes

2 answers

91 views

Seeking more efficient method in Python & Polars to perform monthly comparison within each year

I have a CSV of energy consumption data over time (each month for several years). I want to determine the percentage (decimal portion) for each month across that year; e.g., August was 12.3% of the ...

Buckley

151

asked Nov 13 at 16:26

1 vote

3 answers

100 views

Show matched rows in polars join

When you join two tables, STATA prints the number of rows merged and unmerged. For instance, take Example 1 at page 13 of the STATA merge doc: use https://www.stata-press.com/data/r19/autosize merge 1:...

robertspierre

5,386

asked Nov 11 at 15:20

3 votes

0 answers

146 views

Why polars join function performance deteriorates so much from version 1.30.0 to 1.31.0?

I noticed a significant performance deterioration when using polars dataframe join function after upgrading polars from 1.30.0 to 1.31.0. The code snippet is below: import polars as pl import time ...

Y. Gao

1,049

asked Nov 7 at 13:14

1 vote

3 answers

159 views

Replace value by condition across entire polars df

I'd like to replace any value greater than some condition with zero for any column except the date column in a df. The closest I've found is df.with_columns( pl.when(pl.any_horizontal(pl.col(pl....

thefrollickingnerd

400

asked Nov 5 at 0:26

2 votes

1 answer

129 views

Find differing rows between two Polars DataFrames based on ID and multiple columns

I have two Polars DataFrames (df1 and df2) with the same columns. I want to compare them by ID and Iname, and get the rows where any of the other columns (X, Y, Z) differ between the two. import ...

Simon

1,209

asked Nov 4 at 19:06

0 votes

0 answers

163 views

How to efficiently get the last row of a rolling aggregation group without .last()?

I'm working with a large Polars LazyFrame and computing rolling aggregations grouped by customer (Cusid). I need to find the "front" of the rolling window (last Tts_date) for each group to ...

Liisjak

37

asked Nov 4 at 16:13

6 votes

1 answer

106 views

Polars streaming: How to compute a nested window aggregation while avoiding in-memory-maps?

I want to calculate the mean over some group column 'a' but include only one value per second group column 'b'. Constraints: I want to preserve all original records in the result. (if possible) avoid ...

gogodigi

95

asked Oct 31 at 11:16

4 votes

3 answers

106 views

Extending polars DataFrame while maintaining variables between calls

I would like to code a logger for polars using the Custom Namespace API. For instance, starting from: import logging import polars as pl penguins_pl = pl.read_csv("https://raw.githubusercontent....

robertspierre

5,386

asked Oct 31 at 9:19

0 votes

1 answer

73 views

Python tempfile TemporaryDirectory path changes multiple times after initialization

I am using tempfile with Polars for the first time and getting some surprising behavior when running it in a serverless Cloud Function-like environment. Here is my simple test code: try: with ...

starmandeluxe

2,597

asked Oct 31 at 4:42

4 votes

4 answers

177 views

Reference column named "*" in Polars

I have a Polars DataFrame with a column named "*" and would like to reference just that column. When I try to use pl.col("*") it is interpreted as a wildcard for "all columns." ...

Sam

359

asked Oct 29 at 21:56

1 vote

2 answers

84 views

Adding an Object column to a polars DataFrame with broadcasting

If I have a DataFrame, I can create a column with a single value like this: df = pl.DataFrame([[1, 2, 3]]) df.with_columns(pl.lit("ok").alias("metadata")) shape: (3, 2) ┌──────────...

Ilya V. Schurov

8,197

asked Oct 28 at 13:07

1 vote

0 answers

75 views

Polars LazyFrame sink_parquet + PartitionByKey slower to S3 than local disk

I'm wondering why I'm seeing such poor performance when writing a LazyFrame using PartitionByKey to S3 when compared to other methods. Here is a simple test script that writes out some random data to ...

Stephen

276

asked Oct 24 at 22:21

1 vote

2 answers

113 views

python typing distinctions between inline created parameters and variables

Preamble I'm using polars's write_excel method which has a parameter column_formats which wants a ColumnFormatDict that is defined here and below ColumnFormatDict: TypeAlias = Mapping[ # dict of ...

Dean MacGregor

20k

asked Oct 24 at 15:52

2 votes

0 answers

180 views

Speeding up Polars rust plugin branching and aggregating

I'm following the polars plugins tutorial - branch mispredictions and it says that there's a faster way to implement the following code: #[polars_expr(output_type=Int64)] fn sum_i64(inputs: &[Series]) -...

Ariana

29

asked Oct 23 at 10:38

-1 votes

1 answer

123 views

Compare 2 columns in Polars and rearrange them when they match and unmatch?

A Polars DataFrame has 2 columns [Col01 & Col02]. They hold the same values, though not the same number of times [e.g. Col01 can have, say, 5 rows of '00000' while Col02 may have 20 rows of '00000' ...

Mohan Prasath

1

asked Oct 17 at 13:57

8 votes

1 answer

256 views

How to write a pandas-compatible, non-elementary expression in narwhals

I'm working with the narwhals package and I'm trying to write an expression that is: applied over groups using .over() Non-elementary/chained (longer than a single operation) Works when the native df ...

Slash

581

asked Oct 14 at 19:07

-2 votes

1 answer

127 views

Polars scan_ndjson Out of memory

Description Trying to read 32GB of data split across 16 .jsonl files. I use the function scan_ndjson of Polars but the execution stops with error 137 (Out of memory). Here is the code: # Count infobox ...

codug

27

asked Oct 13 at 11:08

3 votes

3 answers

159 views

Calculating monthly revenue given start and end date for each ID using Polars

I have a dataframe using this format import polars as pl df = pl.from_repr(""" ┌─────┬────────────┬────────────┬──────────┐ │ ID ┆ DATE_PREV ┆ DATE ┆ REV_DIFF │ │ --- ┆ --- ...

Philipp

65

asked Oct 8 at 14:48

2 votes

1 answer

87 views

polars-u64-idx not available for latest version

While the standard Polars package is available in version 1.34.0 the polars-u64-idx package is missing the latest versions. Does anyone know if this package is discontinued?

Stefan Herrmann

81

asked Oct 7 at 10:03

2 votes

2 answers

237 views

How do I get polars.Expr.str.json_decode to decode simple map to List(Struct({'key': String, 'value': Int32}))?

json_decode requires that we specify the dtype. Polars represents maps with arbitrary keys as a List<struct<2>> (see here). EDIT: Suppose I don't know the keys in my JSON ahead of time, ...

user31639176

23

asked Oct 6 at 18:10

2 votes

1 answer

123 views

How to perform sinking lazyframes with diverging queries to different partitions

I have a very big parquet file which I'm attempting to read from and split into partitioned folders on a column "token". Currently I'm using pl.scan_parquet on the big parquet file followed ...

WillowOfTheBorder

45

asked Oct 6 at 12:44

2 votes

3 answers

117 views

Forward fill using values from rows that match a condition in Polars

I have this dataframe: import polars as pl df = pl.DataFrame({'value': [1,2,3,4,5,None,None], 'flag': [0,1,1,1,0,0,0]}) ┌───────┬──────┐ │ value ┆ flag │ │ --- ┆ --- │ │ i64 ┆ i64 │ ╞═══════╪══...

Phil-ZXX

3,601

asked Oct 2 at 10:25

2 votes

1 answer

68 views

How to select joined columns with structure like namespaces (a.col1, b.col2)?

I am working to migrate from PySpark to Polars. In PySpark I often use aliases on dataframes so I can clearly see which columns come from which side of a join. I'd like to get similarly readable code ...

Arend-Jan Tissing

376

asked Oct 2 at 10:07

0 votes

0 answers

113 views

Enabling Delta Table checkpointing when using polars write_delta()

I am using polars.df.write_delta() to initially create, and subsequently append to, Delta Tables in Microsoft Fabric OneLake storage, via a Fabric python notebook. Having had a production process up ...

Stuart J Cuthbertson

438

asked Sep 30 at 14:21

1 vote

1 answer

97 views

Converting a Rust `futures::TryStream` to a `polars::LazyFrame`

I have an application where I have a futures::TryStream. Still in a streaming fashion, I want to convert this into a polars::LazyFrame. It is important to note that the TryStream comes from the ...

bmitc

908

asked Sep 30 at 4:00

0 votes

1 answer

117 views

PyCharm "view as DataFrame" shows nothing for polars DataFrames

Basically the title. Using PyCharm 2023.3.3 I'm not able to see the data of polars DataFrames. As an example, I've a simple DataFrame like this: print(ids_df) shape: (1, 4) ┌───────────────────────────...

Nauel

516

asked Sep 29 at 9:56

3 votes

3 answers

92 views

Dynamically index a column in Polars

I have a simple dataframe that looks like this: import polars as pl df = pl.DataFrame({ 'ref': ['a', 'b', 'c', 'd', 'e', 'f'], 'idx': [4, 3, 1, 6, 2, 5], }) How can I obtain the result as ...

Baffin Chu

217

asked Sep 27 at 22:07

2 votes

1 answer

104 views

Find nearest / closest value to subset of values in a Polars dataframe

I have this dataframe import polars as pl df = pl.from_repr(""" ┌────────────┬──────┐ │ date ┆ ME │ │ --- ┆ --- │ │ date ┆ i64 │ ╞════════════╪══════╡ │ 2027-11-...

Phil-ZXX

3,601

asked Sep 26 at 15:47

3 votes

0 answers

66 views

How to repeat List in Polars [duplicate]

I am trying to repeat the values of a List in polars. The equivalent operation in pure python would be: [1,2,3,4] * 3 -> [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]. So the content of the list is repeated ...

ADI

31

asked Sep 25 at 14:04

0 votes

1 answer

96 views

How to extract & coalesce deeply nested values that may not exist? [closed]

I'm trying to extract some data from deeply nested JSON - this works: lf.with_columns( [ pl.coalesce( [ pl.col("a"), pl.col("...

dsully

598

asked Sep 25 at 4:02

0 votes

1 answer

117 views

Show progress bar when reading files with globbing with polars

I have a folder with multiple Excel files. I'm reading all of them in a single polars DataFrame concatenated vertically using globbing: import polars as pl df = pl.read_excel("folder/*.xlsx")...

robertspierre

5,386

asked Sep 23 at 3:18

4 votes

2 answers

117 views

How to create a cross table with percentages in Polars?

I would like to create a cross table that shows, in each cell, the percentages of rows over the total number of rows. Inspired by this post I started with: df = pl.DataFrame({"a": [2, 0, 1, ...

robertspierre

5,386

asked Sep 17 at 3:36

3 votes

3 answers

183 views

Drop column by index in polars

I need to drop the first column in a polars DataFrame. I tried: result = df.select([col for idx, col in enumerate(df.columns) if idx != 0]) But it looks long and clumsy for such a simple task? I also ...

robertspierre

5,386

asked Sep 16 at 18:43

1 vote

1 answer

121 views

group_by with polars concatenating values

I have a Polars DataFrame that I want to group by, concatenating the unique values in each group into a single entry. In pandas, I do: def unique_colun_values(x): return('|'.join(set(x))) dd=pd.DataFrame({'...

frank

3,816

asked Sep 16 at 9:16

4 votes

3 answers

121 views

How can I efficiently get both a column and a scalar using Polars expressions?

Polars suggests the usage of Expressions to avoid eager execution and then execute all expressions together at the very end. I am unsure how this is possible if I want a column and a scalar. For ...

Felix Benning

1,383

asked Sep 15 at 14:32

0 votes

4 answers

206 views

Recursively rename all column names and nested struct fields to lowercase in a Polars DataFrame? [closed]

Is there a way for Polars to rename all columns, not just at the top level, but including multiple levels of nested structs? I need them to all be lowercase via str.lower

dsully

598

asked Sep 14 at 18:15

3 votes

1 answer

150 views

write_database(..., engine="adbc") with autocommit=False

In polars, I would like to use pl.write_database multiple times with engine="adbc" in the same session and then commit all at the end with conn.commit(), i.e. do a manual commit. import ...

mouwsy

2,127

asked Sep 10 at 21:04

2 votes

1 answer

176 views

Memory efficient sorting/removing duplicates of polars dataframes

I am trying to import very large csv files into parquet files using polars. I stream data, use lazy dataframes and sinks. No problem until... ...sorting the dataframe on a column and removing ...

Matt

7,316

asked Sep 10 at 13:18
