48 votes
2 answers
43k views

What is the equivalent of drop_duplicates() from pandas in polars? import polars as pl df = pl.DataFrame({"a":[1,1,2], "b":[2,2,3], "c":[1,2,3]}) df Output: shape: (...
keiv.fly's user avatar
  • 4,240
30 votes
5 answers
30k views

I cannot find how to reorder columns in a polars dataframe in the polars DataFrame docs.
rchitect-of-info's user avatar
28 votes
2 answers
28k views

I am currently creating a new column in a polars data frame using predictions = [10, 20, 30, 40, 50] df['predictions'] = predictions where predictions is a numpy array or list containing values I ...
Felix.B's user avatar
  • 736
26 votes
1 answer
51k views

Consider a Polars data frame with a column of str type that indicates the date in the format '27 July 2020'. I would like to convert this column to the polars.datetime type, which is distinct from the ...
fabioklr's user avatar
  • 670
25 votes
2 answers
24k views

I have scrubbed the polars docs and cannot see an example of creating a column with a fixed value from a variable. Here is what works in pandas: df['VERSION'] = version Thx
rchitect-of-info's user avatar
25 votes
2 answers
34k views

How can I achieve the equivalents of SQL's IN and NOT IN? I have a list with the required values. Here's the scenario: import pandas as pd import polars as pl exclude_fruit = ["apple", "...
Daycent's user avatar
  • 615
25 votes
6 answers
29k views

I have a Polars DataFrame with a list column. I want to control how many elements of a pl.List column are printed. I've tried pl.Config.set_fmt_str_lengths() but this only restricts the number of ...
braaannigan's user avatar
23 votes
3 answers
30k views

I am trying to loop through a Polars recordset using the following code: import polars as pl df = pl.DataFrame({ "start_date": ["2020-01-02", "2020-01-03", "...
John Smith's user avatar
  • 2,956
23 votes
1 answer
38k views

The CSV file I have is 70 Gb in size. I want to load the DF and count the number of rows, in lazy mode. What's the best way to do so? As far as I can tell, there is no function like shape in lazy mode ...
roei shlezinger's user avatar
22 votes
1 answer
24k views

I am breaking my head trying to figure out how to use group_by and apply a custom function using Polars. Coming from Pandas, I was using: import polars as pl import pandas as pd from scipy.stats ...
jbssm's user avatar
  • 7,181
20 votes
2 answers
55k views

Being a new user to polars coming from pandas, I have searched polars GitHub pages, user guide, stackoverflow and discord channel on how to add a new column to a polars dataframe. I have only found ...
eoia's user avatar
  • 203
20 votes
1 answer
27k views

If I have a Polars literal, how can I extract the value? import polars as pl expr = pl.lit(0.5) val = float(expr) # TypeError: float() argument must be a string or a real number, not 'Expr'
drhagen's user avatar
  • 9,852
19 votes
5 answers
19k views

I wonder how I can transform a Spark dataframe to a Polars dataframe. Let's say I have this code in PySpark: df = spark.sql('''select * from tmp''') I can easily transform it to a pandas dataframe using ....
s1nbad's user avatar
  • 193
18 votes
6 answers
19k views

Is there an elegant way to recode values in a polars dataframe? For example 1->0, 2->0, 3->1... In pandas it is as simple as: df.replace([1,2,3,4,97,98,99],[0,0,1,1,2,2,2])
zenelb's user avatar
  • 201
18 votes
2 answers
30k views

In Polars, how can one specify a single dtype for all columns in read_csv? According to the docs, the schema_overrides argument to read_csv can take either a mapping (dict) in the form of {'...
daviewales's user avatar
  • 2,889
18 votes
6 answers
2k views

Suppose my data looks like this: data = { 'value': [1,9,6,7,3, 2,4,5,1,9] } For each row, I would like to find the row number of the latest previous element larger than the current one. So, my ...
ignoring_gravity's user avatar
18 votes
1 answer
21k views

I want to pass each row of a Polars DataFrame into a custom function. def my_complicated_function(row): # ... return result df = pl.DataFrame({ "foo": [1, 2, 3], ...
Pradeepgb's user avatar
  • 181
18 votes
5 answers
9k views

I went through the entire documentation of Polars but couldn't find anything which could convert nested json into dataframe. test = { "name": "Ravi", "Subjects": { ...
Shikha Sheoran's user avatar
17 votes
2 answers
33k views

I see it's possible to append using the series namespace (https://stackoverflow.com/a/70599059/5363883). What I'm wondering is if there is a similar method for appending or concatenating DataFrames. ...
cnpryer's user avatar
  • 405
17 votes
2 answers
19k views

Was wondering about the size of particular polars DataFrames. I tried with: from sys import getsizeof getsizeof(df) Out[17]: 48 getsizeof(df.to_pandas()) Out[18]: 1602923950 It appears all polars df ...
fvg's user avatar
  • 273
17 votes
2 answers
21k views

I have a polars DataFrame with a number of Series that look like: pl.Series(['cow', 'cat', '', 'lobster', '']) # Series: '' [str] # [ # "cow" # "cat" # "" # ...
16 votes
7 answers
20k views

In pandas, the following code will split the string from col1 into many columns. Is there a way to do this in polars? data = {"col1": ["a/b/c/d", "a/b/c/d"]} df = pl....
16 votes
3 answers
22k views

I have this dataframe import polars as pl df = pl.from_repr(""" ┌─────┬───────┐ │ one ┆ two │ │ --- ┆ --- │ │ str ┆ str │ ╞═════╪═══════╡ │ a ┆ hola │ │ b ┆ world │ └─────┴──...
user18140022's user avatar
16 votes
1 answer
11k views

In pandas, we can just assign directly: import pandas as pd import polars as pl df = pl.DataFrame({"a": [1, 2]}) df_pd = df.to_pandas() # add a single value df_pd["b"] = 3 # ...
lemmingxuan's user avatar
16 votes
3 answers
14k views

I would like to replace Pandas with Polars but I was not able to find out how to use Polars with Plotly without converting to Pandas. I wonder if there is a way to completely cut Pandas out of the ...
fabioklr's user avatar
  • 670
15 votes
5 answers
15k views

In Pandas we can use the map function to map a dict to a series to create another series with the mapped values. More generally speaking, I believe it invokes the index operator of the argument, i.e. [...
T.H Rice's user avatar
  • 317
15 votes
2 answers
12k views

Currently when I try to retrieve date from a polars datetime column, I have to write something similar to: import polars as pl import datetime as dt df = pl.DataFrame({ 'time': [dt.datetime.now()]...
Alex's user avatar
  • 609
15 votes
4 answers
9k views

I'm looking for a function along the lines of df.group_by('column').agg(sample(10)) so that I can take ten or so randomly-selected elements from each group. This is specifically so I can read in a ...
15 votes
2 answers
10k views

Is there any built-in function in polars or a better way to convert time durations to numeric by defining the time resolution (e.g.: days, hours, minutes)? import polars as pl df = pl.DataFrame({ ...
Guz's user avatar
  • 477
14 votes
3 answers
30k views

I was trying to search whether there would be a way to change the dtypes for the strings with numbers easily. For example, the problem I face is as follows: df = pl.DataFrame({"foo": [...
momentlost's user avatar
14 votes
2 answers
20k views

How do I add a new feature, such as the length of the data frame, and drop rows using indexing? I want to add a new column that counts the number of rows available in a data frame, and using indexing ...
Hrushi's user avatar
  • 335
13 votes
4 answers
16k views

I'm unable to use polars dataframes with scikit-learn for ML training. Currently, I'm preprocessing all dataframes in polars and convert them to pandas for model training in order for it to work. Is ...
Regular Tech Guy's user avatar
13 votes
4 answers
14k views

In Polars 0.13.14, I could create a DataFrame with an all-constant column like this: import polars as pl pl.DataFrame(dict(x=pl.repeat(1, 3))) # shape: (3, 1) # ┌─────┐ # │ x │ # │ --- │ # │ i64 │ ...
drhagen's user avatar
  • 9,852
13 votes
3 answers
15k views

How do I compare two polars DataFrames for value equality? It appears that == is only true if the two tables are the same object: import polars as pl pl.DataFrame({"x": [1,2,3]}) == pl....
drhagen's user avatar
  • 9,852
13 votes
2 answers
12k views

I'm looking for the recommended way to select an individual row of a polars.DataFrame by row number: something largely equivalent to pandas.DataFrame's .iloc[[n]] method for a given integer n. For ...
montol's user avatar
  • 303
13 votes
3 answers
18k views

In pandas we have the pandas.DataFrame.select_dtypes method that selects certain columns depending on the dtype. Is there a similar way to do such a thing in Polars?
13 votes
2 answers
16k views

I know polars does not support index by design, so df.filter(expr).index isn't an option, another way I can think of is by adding a new column before applying any filters, not sure if this is an ...
xxx222's user avatar
  • 3,284
13 votes
1 answer
15k views

In pandas it happens automatically, just by calling pd.concat([df1, df2, df3]) and the frame that didn't have the column previously just gets a column filled with NaNs. In polars I get a 'shape error' ...
zacko's user avatar
  • 417
13 votes
3 answers
7k views

I am using polars in place of pandas. I am quite amazed by the speed and lazy computation/evaluation. Right now, there are a lot of methods on lazy dataframe, but they can only drive me so far. So, I ...
lebesgue's user avatar
  • 1,163
13 votes
1 answer
20k views

I need to print all the columns in my file, but the result I get is this.... Do you know how I can show all the columns of my data frame? the code is this: file = pl.read_excel('1.xlsx') file = ...
V0N_fs's user avatar
  • 143
13 votes
3 answers
3k views

The statement: I'm reading data sets using the Polars read_csv() method via a Python file handler: with gzip.open(os.path.join(getParameters()['rdir'], dataset)) as compressed_file: df = pl.read_csv(...
Joris-Karl Huysmans's user avatar
12 votes
3 answers
19k views

df.filter(pl.col("MyDate") >= "2020-01-01") does not work like it does in pandas. I found a workaround df.filter(pl.col("MyDate") >= pl.datetime(2020,1,1)) but ...
keiv.fly's user avatar
  • 4,240
12 votes
3 answers
12k views

Given the following dataframe, is there some way to select only columns starting with a given prefix? I know I could do e.g. pl.col(column) for column in df.columns if column.startswith("prefix_...
TomNorway's user avatar
  • 3,262
12 votes
1 answer
3k views

I have a Polars dataframe in the form: df = pl.DataFrame({'a':[1,2,3], 'b':[['a','b'],['a'],['c','d']]}) ┌─────┬────────────┐ │ a ┆ b │ │ --- ┆ --- │ │ i64 ┆ list[str] │ ╞═════╪═══...
kristianp's user avatar
  • 6,015
12 votes
0 answers
326 views

I have been using the Data Wrangler extension in VS Code for a while; it is very useful for analyzing datasets and filtering some columns to see the features. When I opened a dataframe in it, it used to ...
Javad Faraji's user avatar
11 votes
1 answer
19k views

I want to apply a custom function which takes 2 columns and outputs a value based on those (row-based) In Pandas there is a syntax to apply a function based on values in multiple columns df['col_3'] = ...
Maiia Bocharova's user avatar
11 votes
2 answers
7k views

I am using below dataframe to convert to dictionary in specific format. However, I am getting an error TypeError: unhashable type: 'Series' import polars as pl #input (polars eager dataframe): ...
Rakesh Chaudhary's user avatar
11 votes
1 answer
15k views

I know how to apply a function to all columns present in a Pandas-DataFrame. However, I have not figured out yet how to achieve this when using a Polars-DataFrame. I checked the section from the ...
Gian Arauz's user avatar
11 votes
1 answer
5k views

Edit 2 (01.08.2024): I believe VSCode has now moved onto the DataWrangler extension as their default data inspector and will deprecate the default one. https://marketplace.visualstudio.com/items?...
Raphael's user avatar
  • 1,215
10 votes
3 answers
4k views

I am new to Polars and I am not sure whether I am using .with_columns() correctly. Here's a situation I encounter frequently: There's a dataframe and in .with_columns(), I apply some operation to a ...
Thomas's user avatar
  • 1,351
