Highest scored 'python-polars' questions

48 votes

2 answers

43k views

What is the equivalent of `DataFrame.drop_duplicates()` from pandas in polars?

What is the equivalent of drop_duplicates() from pandas in polars? import polars as pl df = pl.DataFrame({"a":[1,1,2], "b":[2,2,3], "c":[1,2,3]}) df Output: shape: (...

keiv.fly

4,240

asked Feb 20, 2022 at 16:57

30 votes

5 answers

30k views

Polars: How to reorder columns in a specific order?

I cannot find how to reorder columns in a polars dataframe in the polars DataFrame docs.

rchitect-of-info

1,652

asked Mar 4, 2022 at 14:48

28 votes

2 answers

28k views

How to add a column to a polars DataFrame using .with_columns() [duplicate]

I am currently creating a new column in a polars data frame using predictions = [10, 20, 30, 40, 50] df['predictions'] = predictions where predictions is a numpy array or list containing values I ...

Felix.B

736

asked Aug 1, 2022 at 11:33

26 votes

1 answer

51k views

Easily convert string column to pl.datetime in Polars

Consider a Polars data frame with a column of str type that indicates the date in the format '27 July 2020'. I would like to convert this column to the polars.datetime type, which is distinct from the ...

fabioklr

670

asked Apr 5, 2022 at 23:00

25 votes

2 answers

24k views

Polars: Create column with fixed value from variable [duplicate]

I have scrubbed the polars docs and cannot see an example of creating a column with a fixed value from a variable. Here is what works in pandas: df['VERSION'] = version Thx

rchitect-of-info

1,652

asked Mar 3, 2022 at 16:10

25 votes

2 answers

34k views

Polars: How to filter using 'in' and 'not in' like in SQL

How can I achieve the equivalents of SQL's IN and NOT IN? I have a list with the required values. Here's the scenario: import pandas as pd import polars as pl exclude_fruit = ["apple", "...

Daycent

615

asked Apr 12, 2022 at 23:01

25 votes

6 answers

29k views

In Polars how do I print all elements of a list column?

I have a Polars DataFrame with a list column. I want to control how many elements of a pl.List column are printed. I've tried pl.Config.set_fmt_str_lengths() but this only restricts the number of ...

braaannigan

934

asked Nov 3, 2022 at 9:55

23 votes

3 answers

30k views

Polars looping through the rows in a dataset

I am trying to loop through a Polars recordset using the following code: import polars as pl df = pl.DataFrame({ "start_date": ["2020-01-02", "2020-01-03", "...

John Smith

2,956

asked Feb 2, 2023 at 13:15

23 votes

1 answer

38k views

Python Polars: How to get the row count of a LazyFrame?

The CSV file I have is 70 Gb in size. I want to load the DF and count the number of rows, in lazy mode. What's the best way to do so? As far as I can tell, there is no function like shape in lazy mode ...

roei shlezinger

353

asked Feb 21, 2023 at 16:48

22 votes

1 answer

24k views

How to use group_by and apply a custom function with Polars?

I am breaking my head trying to figure out how to use group_by and apply a custom function using Polars. Coming from Pandas, I was using: import polars as pl import pandas as pd from scipy.stats ...

jbssm

7,181

asked Oct 14, 2021 at 18:12

20 votes

2 answers

55k views

Add a single string value as a new column to polars DataFrame [duplicate]

Being a new user to polars coming from pandas, I have searched polars GitHub pages, user guide, stackoverflow and discord channel on how to add a new column to a polars dataframe. I have only found ...

eoia

203

asked Mar 13, 2023 at 9:26

20 votes

1 answer

27k views

Extract value of Polars literal

If I have a Polars literal, how can I extract the value? import polars as pl expr = pl.lit(0.5) val = float(expr) # TypeError: float() argument must be a string or a real number, not 'Expr'

drhagen

9,852

asked Apr 2, 2022 at 22:03

19 votes

5 answers

19k views

How to transform Spark dataframe to Polars dataframe?

I wonder how I can transform a Spark dataframe into a Polars dataframe. Let's say I have this code in PySpark: df = spark.sql('''select * from tmp''') I can easily transform it to a pandas dataframe using ....

s1nbad

193

asked Aug 2, 2022 at 7:08

18 votes

6 answers

19k views

Pandas REPLACE equivalent in Polars

Is there an elegant way to recode values in a polars dataframe? For example 1->0, 2->0, 3->1... In pandas it is as simple as: df.replace([1,2,3,4,97,98,99],[0,0,1,1,2,2,2])

zenelb

201

asked Feb 3, 2022 at 9:29

18 votes

2 answers

30k views

Polars: Specify dtypes for all columns at once in read_csv

In Polars, how can one specify a single dtype for all columns in read_csv? According to the docs, the schema_overrides argument to read_csv can take either a mapping (dict) in the form of {'...

daviewales

2,889

asked Feb 14, 2022 at 2:55

18 votes

6 answers

2k views

pandas or Polars: find index of previous element larger than current one

Suppose my data looks like this: data = { 'value': [1,9,6,7,3, 2,4,5,1,9] } For each row, I would like to find the row number of the latest previous element larger than the current one. So, my ...

ignoring_gravity

11.1k

asked Feb 25, 2024 at 17:22

18 votes

1 answer

21k views

How to apply a custom function in Polars that does the processing row by row?

I want to pass each row of a Polars DataFrame into a custom function. def my_complicated_function(row): # ... return result df = pl.DataFrame({ "foo": [1, 2, 3], ...

Pradeepgb

181

asked Mar 29, 2022 at 8:42

18 votes

5 answers

9k views

Is there a json_normalize like feature in Polars?

I went through the entire documentation of Polars but couldn't find anything which could convert nested json into dataframe. test = { "name": "Ravi", "Subjects": { ...

Shikha Sheoran

181

asked Nov 21, 2021 at 4:10

17 votes

2 answers

33k views

How can I append or concatenate two dataframes in python polars?

I see it's possible to append using the series namespace (https://stackoverflow.com/a/70599059/5363883). What I'm wondering is if there is a similar method for appending or concatenating DataFrames. ...

cnpryer

405

asked Mar 28, 2022 at 23:57

17 votes

2 answers

19k views

Polars DataFrame memory size in Python

Was wondering about the size of particular polars DataFrames. I tried with: from sys import getsizeof getsizeof(df) Out[17]: 48 getsizeof(df.to_pandas()) Out[18]: 1602923950 It appears all polars df ...

fvg

273

asked Apr 7, 2022 at 21:02

17 votes

2 answers

21k views

Idiomatic replacement of empty string with null in Polars

I have a polars DataFrame with a number of Series that look like: pl.Series(['cow', 'cat', '', 'lobster', '']) # Series: '' [str] # [ # "cow" # "cat" # "" # ...

user6268172

asked May 18, 2022 at 15:32

16 votes

7 answers

20k views

Split a string column into many columns by delimiter in Polars

In pandas, the following code will split the string from col1 into many columns. Is there a way to do this in polars? data = {"col1": ["a/b/c/d", "a/b/c/d"]} df = pl....

user17260574

asked Sep 13, 2022 at 8:04

16 votes

3 answers

22k views

Polars: change a value in a dataframe if a condition is met in another column

I have this dataframe import polars as pl df = pl.from_repr(""" ┌─────┬───────┐ │ one ┆ two │ │ --- ┆ --- │ │ str ┆ str │ ╞═════╪═══════╡ │ a ┆ hola │ │ b ┆ world │ └─────┴──...

user18140022

405

asked Apr 11, 2023 at 10:43

16 votes

1 answer

11k views

Add a new Polars column from a single value?

In pandas, we can just assign directly: import pandas as pd import polars as pl df = pl.DataFrame({"a": [1, 2]}) df_pd = df.to_pandas() # add a single value df_pd["b"] = 3 # ...

lemmingxuan

729

asked May 15, 2022 at 2:53

16 votes

3 answers

14k views

How to use Polars with Plotly without converting to Pandas?

I would like to replace Pandas with Polars but I was not able to find out how to use Polars with Plotly without converting to Pandas. I wonder if there is a way to completely cut Pandas out of the ...

fabioklr

670

asked Apr 4, 2022 at 13:09

15 votes

5 answers

15k views

Mapping a Python dict to a Polars series

In Pandas we can use the map function to map a dict to a series to create another series with the mapped values. More generally speaking, I believe it invokes the index operator of the argument, i.e. [...

T.H Rice

317

asked Dec 13, 2022 at 3:24

15 votes

2 answers

12k views

Retrieve date from datetime column in polars

Currently when I try to retrieve date from a polars datetime column, I have to write something similar to: import polars as pl import datetime as dt df = pl.DataFrame({ 'time': [dt.datetime.now()]...

Alex

609

asked Aug 2, 2022 at 19:12

15 votes

4 answers

9k views

Sample from each group in polars dataframe?

I'm looking for a function along the lines of df.group_by('column').agg(sample(10)) so that I can take ten or so randomly-selected elements from each group. This is specifically so I can read in a ...

user6268172

asked Jun 15, 2022 at 14:45

15 votes

2 answers

10k views

How to convert time durations to numeric in polars?

Is there any built-in function in polars or a better way to convert time durations to numeric by defining the time resolution (e.g.: days, hours, minutes)? import polars as pl df = pl.DataFrame({ ...

Guz

477

asked Feb 13, 2023 at 15:46

14 votes

3 answers

30k views

Switching between dtypes within a DataFrame

I was trying to search whether there would be a way to change the dtypes for the strings with numbers easily. For example, the problem I face is as follows: df = pl.DataFrame({"foo": [...

momentlost

141

asked Apr 8, 2022 at 0:02

14 votes

2 answers

20k views

How to drop row in polars-python [closed]

How to add new feature like length of data frame & Drop rows value using indexing. I want to add a new column where I can count the number of rows available in a data frame, & using indexing ...

Hrushi

335

asked Mar 15, 2022 at 16:50

13 votes

4 answers

16k views

How to use polars dataframes with scikit-learn?

I'm unable to use polars dataframes with scikit-learn for ML training. Currently, I'm preprocessing all dataframes in polars and convert them to pandas for model training in order for it to work. Is ...

Regular Tech Guy

477

asked Nov 11, 2022 at 5:59

13 votes

4 answers

14k views

Make a constant column in Polars

In Polars 0.13.14, I could create a DataFrame with an all-constant column like this: import polars as pl pl.DataFrame(dict(x=pl.repeat(1, 3))) # shape: (3, 1) # ┌─────┐ # │ x │ # │ --- │ # │ i64 │ ...

drhagen

9,852

asked Mar 26, 2022 at 1:36

13 votes

3 answers

15k views

Compare two polars DataFrames for equality

How do I compare two polars DataFrames for value equality? It appears that == is only true if the two tables are the same object: import polars as pl pl.DataFrame({"x": [1,2,3]}) == pl....

drhagen

9,852

asked Feb 6, 2022 at 20:01

13 votes

2 answers

12k views

What's the polars equivalent to the pandas `.iloc` method?

I'm looking for the recommended way to select an individual row of a polars.DataFrame by row number: something largely equivalent to pandas.DataFrame's .iloc[[n]] method for a given integer n. For ...

montol

303

asked Jan 25, 2024 at 17:42

13 votes

3 answers

18k views

How to select columns by data type in Polars?

In pandas we have the pandas.DataFrame.select_dtypes method that selects certain columns depending on the dtype. Is there a similar way to do such a thing in Polars?

user554319

asked May 24, 2022 at 7:56

13 votes

2 answers

16k views

What is the recommended way for retrieving row numbers (index) for polars?

I know polars does not support index by design, so df.filter(expr).index isn't an option, another way I can think of is by adding a new column before applying any filters, not sure if this is an ...

xxx222

3,284

asked Jun 2, 2022 at 10:26

13 votes

1 answer

15k views

How can I concat polars dataframes that have different columns

In pandas it happens automatically, just by calling pd.concat([df1, df2, df3]) and the frame that didn't have the column previously just gets a column filled with NaNs. In polars I get a 'shape error' ...

zacko

417

asked Jun 16, 2022 at 8:15

13 votes

3 answers

7k views

How to do regression (simple linear for example) in polars select or groupby context?

I am using polars in place of pandas. I am quite amazed by the speed and lazy computation/evaluation. Right now, there are a lot of methods on lazy dataframe, but they can only drive me so far. So, I ...

lebesgue

1,163

asked Dec 23, 2022 at 2:39

13 votes

1 answer

20k views

Print all Columns in polars

I need to print all the columns in my file, but the result I get is this.... Do you know how I can show all the columns of my data frame? the code is this: file = pl.read_excel('1.xlsx') file = ...

V0N_fs

143

asked Aug 18, 2023 at 1:46

13 votes

3 answers

3k views

Polars for Python: How to get rid of "Ensure you pass a path to the file instead of a python file object" warning when reading to a dataframe?

The statement I'm reading data sets using Polars.read_csv() method via a Python file handler: with gzip.open(os.path.join(getParameters()['rdir'], dataset)) as compressed_file: df = pl.read_csv(...

Joris-Karl Huysmans

172

asked Mar 9, 2023 at 22:52

12 votes

3 answers

19k views

How to filter a polars dataframe by date?

df.filter(pl.col("MyDate") >= "2020-01-01") does not work like it does in pandas. I found a workaround df.filter(pl.col("MyDate") >= pl.datetime(2020,1,1)) but ...

keiv.fly

4,240

asked Feb 20, 2022 at 17:05

12 votes

3 answers

12k views

Select all columns where column name starts with string

Given the following dataframe, is there some way to select only columns starting with a given prefix? I know I could do e.g. pl.col(column) for column in df.columns if column.startswith("prefix_...

TomNorway

3,262

asked Jul 9, 2022 at 9:10

12 votes

1 answer

3k views

Polars table convert a list column to separate rows i.e. unnest a list column to multiple rows [duplicate]

I have a Polars dataframe in the form: df = pl.DataFrame({'a':[1,2,3], 'b':[['a','b'],['a'],['c','d']]}) ┌─────┬────────────┐ │ a ┆ b │ │ --- ┆ --- │ │ i64 ┆ list[str] │ ╞═════╪═══...

kristianp

6,015

asked Jan 24, 2023 at 3:08

12 votes

0 answers

326 views

Not displaying DataFrame's name in Data Wrangler extension of VSCode, displaying "Data grid"

I have been using the Data Wrangler extension in VS Code for a while; it is very useful for analyzing datasets and filtering some columns to see the features. When I opened a dataframe in it, it used to ...

Javad Faraji

41

asked Nov 16 at 8:02

11 votes

1 answer

19k views

Apply a function to 2 columns in Polars [duplicate]

I want to apply a custom function which takes 2 columns and outputs a value based on those (row-based). In Pandas there is a syntax to apply a function based on values in multiple columns: df['col_3'] = ...

Maiia Bocharova

387

asked Nov 14, 2022 at 15:16

11 votes

2 answers

7k views

convert 2 columns of polars dataframe to dictionary having its key as first column elements and second column elements as values

I am using the dataframe below to convert to a dictionary in a specific format. However, I am getting an error: TypeError: unhashable type: 'Series' import polars as pl #input (polars eager dataframe): ...

Rakesh Chaudhary

155

asked Apr 12, 2023 at 9:49

11 votes

1 answer

15k views

Apply function to all columns of a Polars-DataFrame

I know how to apply a function to all columns present in a Pandas-DataFrame. However, I have not figured out yet how to achieve this when using a Polars-DataFrame. I checked the section from the ...

Gian Arauz

456

asked Jun 4, 2021 at 9:33

11 votes

1 answer

5k views

How to properly display a Polars dataframe in VSCode Jupyter Notebook variables inspector

Edit 2 (01.08.2024): I believe VSCode has now moved onto the DataWrangler extension as their default data inspector and will deprecate the default one. https://marketplace.visualstudio.com/items?...

Raphael

1,215

asked Jan 10, 2023 at 13:16

10 votes

3 answers

4k views

Access newly created column in .with_columns() when using polars [duplicate]

I am new to Polars and I am not sure whether I am using .with_columns() correctly. Here's a situation I encounter frequently: There's a dataframe and in .with_columns(), I apply some operation to a ...

Thomas

1,351

asked Mar 1, 2023 at 9:08
