2,817 questions
48
votes
2
answers
43k
views
What is the equivalent of `DataFrame.drop_duplicates()` from pandas in polars?
What is the equivalent of drop_duplicates() from pandas in polars?
import polars as pl
df = pl.DataFrame({"a":[1,1,2], "b":[2,2,3], "c":[1,2,3]})
df
Output:
shape: (...
30
votes
5
answers
30k
views
Polars: How to reorder columns in a specific order?
I cannot find how to reorder columns in a polars dataframe in the polars DataFrame docs.
28
votes
2
answers
28k
views
How to add a column to a polars DataFrame using .with_columns() [duplicate]
I am currently creating a new column in a polars data frame using
predictions = [10, 20, 30, 40, 50]
df['predictions'] = predictions
where predictions is a numpy array or list containing values I ...
26
votes
1
answer
51k
views
Easily convert string column to pl.datetime in Polars
Consider a Polars data frame with a column of str type that indicates the date in the format '27 July 2020'.
I would like to convert this column to the polars.datetime type, which is distinct from the ...
25
votes
2
answers
24k
views
Polars: Create column with fixed value from variable [duplicate]
I have scoured the polars docs and cannot see an example of creating a column with a fixed value from a variable. Here is what works in pandas:
df['VERSION'] = version
Thx
25
votes
2
answers
34k
views
Polars: How to filter using 'in' and 'not in' like in SQL
How can I achieve the equivalents of SQL's IN and NOT IN?
I have a list with the required values. Here's the scenario:
import pandas as pd
import polars as pl
exclude_fruit = ["apple", "...
25
votes
6
answers
29k
views
In Polars how do I print all elements of a list column?
I have a Polars DataFrame with a list column. I want to control how many elements of a pl.List column are printed.
I've tried pl.Config.set_fmt_str_lengths() but this only restricts the number of ...
23
votes
3
answers
30k
views
Polars looping through the rows in a dataset
I am trying to loop through a Polars recordset using the following code:
import polars as pl
df = pl.DataFrame({
"start_date": ["2020-01-02", "2020-01-03", "...
23
votes
1
answer
38k
views
Python Polars: How to get the row count of a LazyFrame?
The CSV file I have is 70 Gb in size. I want to load the DF and count the number of rows, in lazy mode. What's the best way to do so?
As far as I can tell, there is no function like shape in lazy mode ...
22
votes
1
answer
24k
views
How to use group_by and apply a custom function with Polars?
I am breaking my head trying to figure out how to use group_by and apply a custom function using Polars.
Coming from Pandas, I was using:
import polars as pl
import pandas as pd
from scipy.stats ...
20
votes
2
answers
55k
views
Add a single string value as a new column to polars DataFrame [duplicate]
Being a new user to polars coming from pandas, I have searched polars GitHub pages, user guide, stackoverflow and discord channel on how to add a new column to a polars dataframe.
I have only found ...
20
votes
1
answer
27k
views
Extract value of Polars literal
If I have a Polars literal, how can I extract the value?
import polars as pl
expr = pl.lit(0.5)
val = float(expr)
# TypeError: float() argument must be a string or a real number, not 'Expr'
19
votes
5
answers
19k
views
How to transform Spark dataframe to Polars dataframe?
I wonder how I can transform a Spark dataframe to a Polars dataframe.
Let's say I have this code in PySpark:
df = spark.sql('''select * from tmp''')
I can easily transform it to pandas dataframe using ....
18
votes
6
answers
19k
views
Pandas REPLACE equivalent in Polars
Is there an elegant way to recode values in a polars dataframe?
For example
1->0,
2->0,
3->1...
In pandas it is as simple as:
df.replace([1,2,3,4,97,98,99],[0,0,1,1,2,2,2])
18
votes
2
answers
30k
views
Polars: Specify dtypes for all columns at once in read_csv
In Polars, how can one specify a single dtype for all columns in read_csv?
According to the docs, the schema_overrides argument to read_csv can take either a mapping (dict) in the form of {'...
18
votes
6
answers
2k
views
pandas or Polars: find index of previous element larger than current one
Suppose my data looks like this:
data = {
'value': [1,9,6,7,3, 2,4,5,1,9]
}
For each row, I would like to find the row number of the latest previous element larger than the current one.
So, my ...
18
votes
1
answer
21k
views
How to apply a custom function in Polars that does the processing row by row?
I want to pass each row of a Polars DataFrame into a custom function.
def my_complicated_function(row):
# ...
return result
df = pl.DataFrame({
"foo": [1, 2, 3],
"...
18
votes
5
answers
9k
views
Is there a json_normalize like feature in Polars?
I went through the entire documentation of Polars but couldn't find anything which could convert nested json into dataframe.
test = {
"name": "Ravi",
"Subjects": {
...
17
votes
2
answers
33k
views
How can I append or concatenate two dataframes in python polars?
I see it's possible to append using the series namespace (https://stackoverflow.com/a/70599059/5363883). What I'm wondering is if there is a similar method for appending or concatenating DataFrames.
...
17
votes
2
answers
19k
views
Polars DataFrame memory size in Python
Was wondering about the size of particular polars DataFrames.
I tried with:
from sys import getsizeof
getsizeof(df)
Out[17]: 48
getsizeof(df.to_pandas())
Out[18]: 1602923950
It appears all polars df ...
17
votes
2
answers
21k
views
Idiomatic replacement of empty string with null in Polars
I have a polars DataFrame with a number of Series that look like:
pl.Series(['cow', 'cat', '', 'lobster', ''])
# Series: '' [str]
# [
# "cow"
# "cat"
# ""
# "...
16
votes
7
answers
20k
views
Split a string column into many columns by delimiter in Polars
In pandas, the following code will split the string from col1 into many columns. Is there a way to do this in polars?
data = {"col1": ["a/b/c/d", "a/b/c/d"]}
df = pl....
16
votes
3
answers
22k
views
Polars: change a value in a dataframe if a condition is met in another column
I have this dataframe
import polars as pl
df = pl.from_repr("""
┌─────┬───────┐
│ one ┆ two │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═══════╡
│ a ┆ hola │
│ b ┆ world │
└─────┴──...
16
votes
1
answer
11k
views
Add a new Polars column from a single value?
In pandas, we can just assign directly:
import pandas as pd
import polars as pl
df = pl.DataFrame({"a": [1, 2]})
df_pd = df.to_pandas()
# add a single value
df_pd["b"] = 3
# ...
16
votes
3
answers
14k
views
How to use Polars with Plotly without converting to Pandas?
I would like to replace Pandas with Polars but I was not able to find out how to use Polars with Plotly without converting to Pandas. I wonder if there is a way to completely cut Pandas out of the ...
15
votes
5
answers
15k
views
Mapping a Python dict to a Polars series
In Pandas we can use the map function to map a dict to a series to create another series with the mapped values. More generally speaking, I believe it invokes the index operator of the argument, i.e. [...
15
votes
2
answers
12k
views
Retrieve date from datetime column in polars
Currently when I try to retrieve date from a polars datetime column, I have to write something similar to:
import polars as pl
import datetime as dt
df = pl.DataFrame({
'time': [dt.datetime.now()]...
15
votes
4
answers
9k
views
Sample from each group in polars dataframe?
I'm looking for a function along the lines of
df.group_by('column').agg(sample(10))
so that I can take ten or so randomly-selected elements from each group.
This is specifically so I can read in a ...
15
votes
2
answers
10k
views
How to convert time durations to numeric in polars?
Is there any built-in function in polars or a better way to convert time durations to numeric by defining the time resolution (e.g.: days, hours, minutes)?
import polars as pl
df = pl.DataFrame({
...
14
votes
3
answers
30k
views
Switching between dtypes within a DataFrame
I was trying to search whether there would be a way to change the dtypes for the strings with numbers easily. For example, the problem I face is as follows:
df = pl.DataFrame({"foo":
["...
14
votes
2
answers
20k
views
How to drop row in polars-python [closed]
How do I add a new feature, such as the length of the data frame, and drop rows by index?
I want to add a new column where I can count the number of rows available in a data frame,
and, using indexing ...
13
votes
4
answers
16k
views
How to use polars dataframes with scikit-learn?
I'm unable to use polars dataframes with scikit-learn for ML training.
Currently, I'm preprocessing all dataframes in polars and convert them to pandas for model training in order for it to work.
Is ...
13
votes
4
answers
14k
views
Make a constant column in Polars
In Polars 0.13.14, I could create a DataFrame with an all-constant column like this:
import polars as pl
pl.DataFrame(dict(x=pl.repeat(1, 3)))
# shape: (3, 1)
# ┌─────┐
# │ x │
# │ --- │
# │ i64 │
...
13
votes
3
answers
15k
views
Compare two polars DataFrames for equality
How do I compare two polars DataFrames for value equality? It appears that == is only true if the two tables are the same object:
import polars as pl
pl.DataFrame({"x": [1,2,3]}) == pl....
13
votes
2
answers
12k
views
What's the polars equivalent to the pandas `.iloc` method?
I'm looking for the recommended way to select an individual row of a polars.DataFrame by row number: something largely equivalent to pandas.DataFrame's .iloc[[n]] method for a given integer n.
For ...
13
votes
3
answers
18k
views
How to select columns by data type in Polars?
In pandas we have the pandas.DataFrame.select_dtypes method that selects certain columns depending on the dtype. Is there a similar way to do such a thing in Polars?
13
votes
2
answers
16k
views
What is the recommended way for retrieving row numbers (index) for polars?
I know polars does not support an index by design, so df.filter(expr).index isn't an option. Another way I can think of is adding a new column before applying any filters, but I'm not sure if this is an ...
13
votes
1
answer
15k
views
How can I concat polars dataframes that have different columns
In pandas it happens automatically, just by calling pd.concat([df1, df2, df3]) and the frame that didn't have the column previously just gets a column filled with NaNs.
In polars I get a 'shape error' ...
13
votes
3
answers
7k
views
How to do regression (simple linear for example) in polars select or groupby context?
I am using polars in place of pandas. I am quite amazed by the speed and lazy computation/evaluation. Right now, there are a lot of methods on lazy dataframe, but they can only drive me so far.
So, I ...
13
votes
1
answer
20k
views
Print all Columns in polars
I need to print all the columns in my file, but the result I get is this....
Do you know how I can show all the columns of my data frame?
The code is:
file = pl.read_excel('1.xlsx')
file = ...
13
votes
3
answers
3k
views
Polars for Python: How to get rid of "Ensure you pass a path to the file instead of a python file object" warning when reading to a dataframe?
The statement
I'm reading data sets using Polars.read_csv() method via a Python file handler:
with gzip.open(os.path.join(getParameters()['rdir'], dataset)) as compressed_file:
df = pl.read_csv(...
12
votes
3
answers
19k
views
How to filter a polars dataframe by date?
df.filter(pl.col("MyDate") >= "2020-01-01")
does not work like it does in pandas.
I found a workaround
df.filter(pl.col("MyDate") >= pl.datetime(2020,1,1))
but ...
12
votes
3
answers
12k
views
Select all columns where column name starts with string
Given the following dataframe, is there some way to select only columns starting with a given prefix? I know I could do e.g. pl.col(column) for column in df.columns if column.startswith("prefix_"...
12
votes
1
answer
3k
views
Polars table convert a list column to separate rows i.e. unnest a list column to multiple rows [duplicate]
I have a Polars dataframe in the form:
df = pl.DataFrame({'a':[1,2,3], 'b':[['a','b'],['a'],['c','d']]})
┌─────┬────────────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ list[str] │
╞═════╪═══...
12
votes
0
answers
326
views
Not displaying DataFrame's name in Data Wrangler extension of VSCode, displaying "Data grid"
I have been using the Data Wrangler extension in VS Code for a while; it is very useful for analyzing datasets and filtering some columns to see the features. When I opened a dataframe in it, it used to ...
11
votes
1
answer
19k
views
Apply a function to 2 columns in Polars [duplicate]
I want to apply a custom function which takes 2 columns and outputs a value based on those (row-based)
In Pandas there is a syntax to apply a function based on values in multiple columns
df['col_3'] = ...
11
votes
2
answers
7k
views
convert 2 columns of polars dataframe to dictionary having its key as first column elements and second column elements as values
I am using below dataframe to convert to dictionary in specific format.
However, I am getting an error TypeError: unhashable type: 'Series'
import polars as pl
#input (polars eager dataframe):
...
11
votes
1
answer
15k
views
Apply function to all columns of a Polars-DataFrame
I know how to apply a function to all columns present in a Pandas-DataFrame. However, I have not figured out yet how to achieve this when using a Polars-DataFrame.
I checked the section from the ...
11
votes
1
answer
5k
views
How to properly display a Polars dataframe in VSCode Jupyter Notebook variables inspector
Edit 2 (01.08.2024):
I believe VSCode has now moved onto the DataWrangler extension as their default data inspector and will deprecate the default one.
https://marketplace.visualstudio.com/items?...
10
votes
3
answers
4k
views
Access newly created column in .with_columns() when using polars [duplicate]
I am new to Polars and I am not sure whether I am using .with_columns() correctly.
Here's a situation I encounter frequently:
There's a dataframe and in .with_columns(), I apply some operation to a ...