1 vote
0 answers
39 views

I am trying to explain() a Polars query to see which operations can be executed using the streaming engine. Currently, I am only able to do this using show_graph(). From sources on the web, I see that ...
gaut • 6,038
1 vote
1 answer
59 views

I have a string column in a polars dataframe with multiple datetime formats, and I am using the following code to convert the column from string to datetime. import polars as pl df = pl.from_dict({'...
dikesh • 3,135
0 votes
0 answers
70 views

I have a Python file import polars as pl import requests from pathlib import Path url = "https://raw.githubusercontent.com/leanhdung1994/files/main/processedStep1_enwiktionary_namespace_0_43....
Akira • 2,820
1 vote
3 answers
162 views

I have .ndjson files with millions of rows. Each row has a field html which contains HTML strings. I would like to write all such html into a .txt file, one HTML string per line of the .txt file. I ...
Akira • 2,820
2 votes
1 answer
131 views

I am looking for the nearest non-exact match on the dates column: import polars as pl df = pl.from_repr(""" ┌─────┬────────────┐ │ uid ┆ dates │ │ --- ┆ --- │ │ i64 ┆ date ...
rainerpf
-2 votes
1 answer
80 views

I have a dictionary of polars.DataFrames called data_dict. All dataframes inside the dict values have an extra index column ''. I want to drop that column and set a new column named 'name_ID' ...
Tudi72 • 31
2 votes
1 answer
76 views

Imagine having the following polars dataframe "df" that contains the temperature of a machine that is either "active" or "inactive": import polars as pl from datetime ...
the_economist
1 vote
0 answers
76 views

I have a table that looks like this import polars as pl df = pl.DataFrame( { "col1": [1, 2, 3, 4, 5], "col2": [10, 20, 30, 40, 50], "col3": [...
Lethnis • 31
Advice
0 votes
7 replies
114 views

I use the polars, urllib and tldextract packages in Python to parse 2 columns of URL strings in zstd-compressed parquet files (averaging 8GB, 40 million rows). The parsed output includes the scheme, ...
norcalpedaler
12 votes
0 answers
326 views

I have been using the Data Wrangler extension in VS Code for a while; it is very useful for analyzing datasets and filtering some columns to see the features. When I opened a dataframe in it, it used to ...
Javad Faraji
1 vote
1 answer
99 views

I've built a dataset in Polars (Python) and am attempting to plot it as a stacked horizontal bar chart using Polars' built-in Altair plot function; however, trying to specify a custom sort order for the ...
ExactaBox • 3,425
1 vote
1 answer
109 views

Given two polars dataframes of the same shape, I would like to print the number of values different between the two, including missing values that are not missing in the other dataframe. I came up ...
robertspierre
2 votes
2 answers
91 views

I have a CSV of energy consumption data over time (each month for several years). I want to determine the percentage (decimal portion) for each month across that year; e.g., August was 12.3% of the ...
Buckley • 151
1 vote
3 answers
100 views

When you join two tables, STATA prints the number of rows merged and unmerged. For instance, take Example 1 on page 13 of the STATA merge doc: use https://www.stata-press.com/data/r19/autosize merge 1:...
robertspierre
3 votes
0 answers
146 views

I noticed a significant performance deterioration when using polars dataframe join function after upgrading polars from 1.30.0 to 1.31.0. The code snippet is below: import polars as pl import time ...
Y. Gao • 1,049
1 vote
3 answers
159 views

I'd like to replace any value greater than some condition with zero for any column except the date column in a df. The closest I've found is df.with_columns( pl.when(pl.any_horizontal(pl.col(pl....
thefrollickingnerd
2 votes
1 answer
129 views

I have two Polars DataFrames (df1 and df2) with the same columns. I want to compare them by ID and Iname, and get the rows where any of the other columns (X, Y, Z) differ between the two. import ...
Simon • 1,209
0 votes
0 answers
163 views

I'm working with a large Polars LazyFrame and computing rolling aggregations grouped by customer (Cusid). I need to find the "front" of the rolling window (last Tts_date) for each group to ...
Liisjak • 37
6 votes
1 answer
106 views

I want to calculate the mean over some group column 'a' but include only one value per second group column 'b'. Constraints: I want to preserve all original records in the result. (if possible) avoid ...
gogodigi
4 votes
3 answers
106 views

I would like to code a logger for polars using the Custom Namespace API. For instance, starting from: import logging import polars as pl penguins_pl = pl.read_csv("https://raw.githubusercontent....
robertspierre
0 votes
1 answer
73 views

I am using tempfile with Polars for the first time and getting some surprising behavior when running it in a serverless Cloud Function-like environment. Here is my simple test code: try: with ...
starmandeluxe
4 votes
4 answers
177 views

I have a Polars DataFrame with a column named "*" and would like to reference just that column. When I try to use pl.col("*") it is interpreted as a wildcard for "all columns". ...
Sam • 359
1 vote
2 answers
84 views

If I have a DataFrame, I can create a column with a single value like this: df = pl.DataFrame([[1, 2, 3]]) df.with_columns(pl.lit("ok").alias("metadata")) shape: (3, 2) ┌──────────...
Ilya V. Schurov
1 vote
0 answers
75 views

I'm wondering why I'm seeing such poor performance when writing a LazyFrame using PartitionByKey to S3 when compared to other methods. Here is a simple test script that writes out some random data to ...
Stephen • 276
1 vote
2 answers
113 views

Preamble I'm using polars's write_excel method which has a parameter column_formats which wants a ColumnFormatDict that is defined here and below ColumnFormatDict: TypeAlias = Mapping[ # dict of ...
Dean MacGregor
2 votes
0 answers
180 views

I'm following the polars plugins tutorial - branch mispredictions and it says that there's a faster way to implement the following code: #[polars_expr(output_type=Int64)] fn sum_i64(inputs: &[Series]) -...
Ariana • 29
-1 votes
1 answer
123 views

A Polars DataFrame has 2 columns [Col01 & Col02]. They hold the same values, though not the same number of times [e.g. Col01 can have say 5 rows of '00000' while Col02 may have 20 rows of '00000' ...
Mohan Prasath
8 votes
1 answer
256 views

I'm working with the narwhals package and I'm trying to write an expression that is: applied over groups using .over() Non-elementary/chained (longer than a single operation) Works when the native df ...
Slash • 581
-2 votes
1 answer
127 views

Description Trying to read 32GB of data split across 16 .jsonl files. I use the scan_ndjson function of Polars, but the execution stops with error 137 (out of memory). Here is the code: # Count infobox ...
codug • 27
3 votes
3 answers
159 views

I have a dataframe using this format import polars as pl df = pl.from_repr(""" ┌─────┬────────────┬────────────┬──────────┐ │ ID ┆ DATE_PREV ┆ DATE ┆ REV_DIFF │ │ --- ┆ --- ...
Philipp • 65
2 votes
1 answer
87 views

While the standard Polars package is available in version 1.34.0, the polars-u64-idx package is missing the latest versions. Does anyone know if this package is discontinued?
Stefan Herrmann
2 votes
2 answers
237 views

json_decode requires that we specify the dtype. Polars represents maps with arbitrary keys as a List<struct<2>> (see here). EDIT: Suppose I don't know the keys in my JSON ahead of time, ...
user31639176
2 votes
1 answer
123 views

I have a very big parquet file which I'm attempting to read from and split into partitioned folders on a column "token". Currently I'm using pl.scan_parquet on the big parquet file followed ...
WillowOfTheBorder
2 votes
3 answers
117 views

I have this dataframe: import polars as pl df = pl.DataFrame({'value': [1,2,3,4,5,None,None], 'flag': [0,1,1,1,0,0,0]}) ┌───────┬──────┐ │ value ┆ flag │ │ --- ┆ --- │ │ i64 ┆ i64 │ ╞═══════╪══...
Phil-ZXX • 3,601
2 votes
1 answer
68 views

I am working to migrate from PySpark to Polars. In PySpark I often use aliases on dataframes so I can clearly see which columns come from which side of a join. I'd like to get similarly readable code ...
Arend-Jan Tissing
0 votes
0 answers
113 views

I am using polars.df.write_delta() to initially create, and subsequently append to, Delta Tables in Microsoft Fabric OneLake storage, via a Fabric python notebook. Having had a production process up ...
Stuart J Cuthbertson
1 vote
1 answer
97 views

I have an application where I have a futures::TryStream. Still in a streaming fashion, I want to convert this into a polars::LazyFrame. It is important to note that the TryStream comes from the ...
bmitc • 908
0 votes
1 answer
117 views

Basically the title. Using PyCharm 2023.3.3, I'm not able to see the data of polars DataFrames. As an example, I have a simple DataFrame like this: print(ids_df) shape: (1, 4) ┌───────────────────────────...
Nauel • 516
3 votes
3 answers
92 views

I have a simple dataframe that looks like this: import polars as pl df = pl.DataFrame({ 'ref': ['a', 'b', 'c', 'd', 'e', 'f'], 'idx': [4, 3, 1, 6, 2, 5], }) How can I obtain the result as ...
Baffin Chu
2 votes
1 answer
104 views

I have this dataframe import polars as pl df = pl.from_repr(""" ┌────────────┬──────┐ │ date ┆ ME │ │ --- ┆ --- │ │ date ┆ i64 │ ╞════════════╪══════╡ │ 2027-11-...
Phil-ZXX • 3,601
3 votes
0 answers
66 views

I am trying to repeat the values of a List in polars. The equivalent operation in pure python would be: [1,2,3,4] * 3 -> [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]. So the content of the list is repeated ...
ADI • 31
0 votes
1 answer
96 views

I'm trying to extract some data from deeply nested JSON - this works: lf.with_columns( [ pl.coalesce( [ pl.col("a"), pl.col("...
dsully • 598
0 votes
1 answer
117 views

I have a folder with multiple Excel files. I'm reading all of them in a single polars DataFrame concatenated vertically using globbing: import polars as pl df = pl.read_excel("folder/*.xlsx")...
robertspierre
4 votes
2 answers
117 views

I would like to create a cross table that shows, in each cell, the percentages of rows over the total number of rows. Inspired by this post I started with: df = pl.DataFrame({"a": [2, 0, 1, ...
robertspierre
3 votes
3 answers
183 views

I need to drop the first column in a polars DataFrame. I tried: result = df.select([col for idx, col in enumerate(df.columns) if idx != 0]) But it looks long and clumsy for such a simple task? I also ...
robertspierre
1 vote
1 answer
121 views

I have a polars dataframe that I want to group by and concatenate the unique values in as a single entry. in pandas, I go: def unique_colun_values(x): return('|'.join(set(x))) dd=pd.DataFrame({'...
frank • 3,816
4 votes
3 answers
121 views

Polars suggests the usage of Expressions to avoid eager execution and then execute all expressions together at the very end. I am unsure how this is possible if I want a column and a scalar. For ...
Felix Benning
0 votes
4 answers
206 views

Is there a way for Polars to rename all columns, not just at the top level, but including multiple levels of nested structs? I need them to all be lowercase via str.lower
dsully • 598
3 votes
1 answer
150 views

In polars, I would like to use pl.write_database multiple times with engine="adbc" in the same session and then commit all at the end with conn.commit(), i.e. do a manual commit. import ...
mouwsy • 2,127
2 votes
1 answer
176 views

I am trying to import very large csv files into parquet files using polars. I stream data, use lazy dataframes and sinks. No problem until... ...sorting the dataframe on a column and removing ...
Matt • 7,316
