Newest 'dataframe' Questions

1 vote

3 answers

63 views

How to modify multiple columns applying if else to multiple pandas dataframe columns

I have a DataFrame with columns Age, Salary and others. If I use df['Age'] = df['Age'].apply(lambda x : x+100 if x>30 else 0), then I can modify the Age column with the if-else condition. Also, if ...
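One possible way to apply the same if-else rule to several columns at once is `DataFrame.where`. This is only a sketch: the sample values are made up, and it assumes the same threshold (30) and the same offset (100) are wanted for every listed column, which the truncated question does not confirm.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"Age": [25, 35, 45], "Salary": [20, 40, 10]})

cols = ["Age", "Salary"]
# Keep x + 100 where x > 30 and write 0 everywhere else, for all listed columns.
df[cols] = (df[cols] + 100).where(df[cols] > 30, 0)
```

This avoids a Python-level lambda per cell, which tends to matter on large frames.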

Edy

23

asked 22 hours ago

0 votes

1 answer

82 views

How to create 2 new columns in R (date difference + convert Seasons to minutes)? [duplicate]

I am new to R and trying to create two new variables from my dataset. My data frame is called netflix and it contains these relevant columns: date_added and duration Example values: date_added: "...

user31963479

1

asked yesterday

1 vote

0 answers

41 views

How to show the streaming parts of a polars query using explain()?

I am trying to explain() a Polars query to see which operations can be executed using the streaming engine. Currently, I am only able to do this using show_graph(). From sources on the web, I see that ...

gaut

6,038

asked yesterday

Advice

1 vote

2 replies

77 views

Show the timezone when printing a data frame with a `POSIXct` column

Is there a simple way (i.e., not involving writing a print() method) to show the timezone when printing a data frame with a POSIXct column? as.data.frame(Sys.time()) # Sys.time() # 1 2025-...

Thomas

553

asked Nov 26 at 17:39

0 votes

1 answer

83 views

Why does groupby().apply() produce inconsistent results on identical groups when the DataFrame has overlapping indices?

I noticed that groupby().apply() produces different results for two groups that look identical, except that the overall DataFrame has duplicate index values. Here is a minimal reproducible example: ...

Bhumika Aggarwal

1

asked Nov 26 at 14:31

1 vote

1 answer

59 views

Polars parse multiple datetime format [duplicate]

I have a string column in a polars dataframe with multiple datetime formats, and I am using the following code to convert the datatype of the column from string into datetime. import polars as pl df = pl.from_dict({'...

dikesh

3,135

asked Nov 26 at 12:27

0 votes

0 answers

70 views

polars.LazyFrame.sink_csv does not give CRLF line termination [duplicate]

I have a Python file import polars as pl import requests from pathlib import Path url = "https://raw.githubusercontent.com/leanhdung1994/files/main/processedStep1_enwiktionary_namespace_0_43....

Akira

2,820

asked Nov 25 at 19:19

0 votes

1 answer

151 views

Construct a simple loop to create new data frames in R [closed]

I have a number of data.frames, with names apple, banana, and coffee. I want to create, and then export, new dataframes in a for-loop corresponding to each one, call them apple_new, banana_new, and ...

s84492025

11

asked Nov 25 at 16:47

2 votes

4 answers

149 views

How to split dataframe into multiple sub-dataframes based on column value

I got a dataframe df1 which looks like this: Column1 Column2 13 1 12 1 15 0 16 0 15 1 14 1 12 1 11 0 21 1 45 1 44 0 The 1s indicate that a measurement started, I don't know how many 1s will be in one ...
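A rough sketch of one reading of this question, assuming each consecutive run of 1s in Column2 is one measurement and each run should become its own sub-dataframe. The data are the values shown in the excerpt; the names `run_id`, `is_measurement` and `parts` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Column1": [13, 12, 15, 16, 15, 14, 12, 11, 21, 45, 44],
    "Column2": [1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0],
})

# A new run starts whenever Column2 differs from the previous row.
run_id = (df1["Column2"] != df1["Column2"].shift()).cumsum()

# Keep only the rows inside a measurement and split them by run.
is_measurement = df1["Column2"] == 1
parts = [group for _, group in df1[is_measurement].groupby(run_id[is_measurement])]
```

With the excerpt's values this gives three sub-dataframes, one per block of 1s.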

toben aus

43

asked Nov 25 at 15:33

1 vote

3 answers

162 views

Polars: how to write a column of strings into a txt file without escaping?

I have a .ndjson file with millions of rows. Each row has a field html which contains an html string. I would like to write all such html into a .txt file, with each html string on one line of the .txt file. I ...

Akira

2,820

asked Nov 25 at 0:08

5 votes

2 answers

106 views

How to resample timeseries with origin aligned to start of year

Consider the following pandas Series with a DatetimeIndex of daily values (using day-of-year as an example): import pandas as pd dti = pd.date_range("2017-11-02", "2019-05-21", ...

Mike T

44.4k

asked Nov 24 at 1:18

2 votes

1 answer

131 views

Why does a nearest join_asof() return exact matches despite allow_exact_matches=False?

I am looking for the nearest non exact match on the dates column: import polars as pl df = pl.from_repr(""" ┌─────┬────────────┐ │ uid ┆ dates │ │ --- ┆ --- │ │ i64 ┆ date ...

rainerpf

21

asked Nov 21 at 20:58

Tooling

0 votes

2 replies

67 views

How to export or import TOON in pandas?

I would like to know how to export or import TOON (Token object oriented notation) in pandas.

Rainb

2,567

asked Nov 20 at 10:33

-2 votes

1 answer

80 views

polars.exceptions.DuplicateError: column with name 'name_ID' has more than one occurrence [closed]

I have a dictionary of polars.DataFrames called data_dict. All dataframes in the dict values have an extra index column ''. I want to drop that column and set a new column named 'name_ID' ...

Tudi72

31

asked Nov 19 at 16:08

3 votes

2 answers

211 views

Efficiently get first indices of consecutive identical digits in big pandas DataFrames

I have a DataFrame with a column Digit of digits at base 10. For example import numpy as np import pandas as pd df = pd.DataFrame({ "Digit": [ 1, 3, 5, 7, 0, 0, 0, 4, 8, ...
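A vectorised sketch of the idea in the title, assuming "first index" means the position where a run of two or more identical digits begins. The series below is hypothetical and only loosely follows the truncated example.

```python
import pandas as pd

# Hypothetical digits; the question's full data is not shown.
digits = pd.Series([1, 3, 5, 7, 0, 0, 0, 4, 8, 8, 2])

# True at the first element of every run (value differs from the previous one).
starts_run = digits.ne(digits.shift())
# True where the next element repeats the current one.
repeats_next = digits.eq(digits.shift(-1))

# Index labels where a run of length >= 2 begins.
first_indices = digits.index[starts_run & repeats_next].tolist()
```

Both comparisons are single passes over the column, so no Python loop is needed.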

Max Pierini

2,323

asked Nov 17 at 21:31

2 votes

1 answer

73 views

problem on the x-axis of the graph, doesn't render the time

I am working on a dashboard using Shiny for Python and Plotly Express. I am trying to create a Gantt chart (using px.timeline) to visualize the operating periods of different boilers (ON/OFF states). ...

Juan Siécola

97

asked Nov 17 at 17:18

2 votes

1 answer

76 views

Change color of single line in altair line chart based on other indicator column

Imagine having the following polars dataframe "df" that contains the temperature of a machine that is either "active" or "inactive": import polars as pl from datetime ...

the_economist

579

asked Nov 17 at 9:32

1 vote

0 answers

76 views

Is it possible to drop/select columns where col.n_unique > 1 with native polars syntax [duplicate]

I have a table that looks like this import polars as pl df = pl.DataFrame( { "col1": [1, 2, 3, 4, 5], "col2": [10, 20, 30, 40, 50], "col3": [...

Lethnis

31

asked Nov 17 at 2:07

-4 votes

0 answers

75 views

How to combine two pandas DataFrames [duplicate]

I am trying to add one pandas DataFrame to another DataFrame. How can I do this in the style of list.append? usernames = {"anvar":"anvar123", "behruz":"Bex124", "...
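For reference, `DataFrame.append` was removed in pandas 2.0, and `pd.concat` is the usual replacement. A minimal sketch, with the column names `user` and `password` assumed for illustration:

```python
import pandas as pd

# Column names are assumptions; the values echo the excerpt.
first = pd.DataFrame({"user": ["anvar"], "password": ["anvar123"]})
second = pd.DataFrame({"user": ["behruz"], "password": ["Bex124"]})

# Stack the rows of both frames and renumber the index from 0.
combined = pd.concat([first, second], ignore_index=True)
```

Unlike `list.append`, `pd.concat` returns a new frame instead of modifying one in place, so collecting frames in a list and concatenating once is usually cheaper than concatenating inside a loop.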

Ravshanjon Ahmadjonov

1

asked Nov 16 at 16:55

1 vote

2 answers

139 views

How to get a true/false without duplicates when comparing two Pandas dataframes?

I have one dataframe with sessions - one session, one row, so SID is unique. The session has a doctor name. SID Doctor Patient 1 robby david 2 langdon sara 3 langdon michael I have another dataframe ...

Semyaz

11

asked Nov 16 at 9:29

12 votes

0 answers

326 views

Not displaying DataFrame's name in Data Wrangler extension of VSCode, displaying "Data grid"

I have been using the Data Wrangler extension in VS Code for a while; it is very useful for analyzing datasets and filtering some columns to see the features. When I opened a dataframe in it, it used to ...

Javad Faraji

41

asked Nov 16 at 8:02

0 votes

0 answers

39 views

Pandas merge on one of two criteria [duplicate]

I have a table/df that holds a set of code and value pairs. The codes are a mix of old (legacy) and new codes due to process changes. I have a second table/df that holds the old codes, new codes, ...

MikeB2019x

1,297

asked Nov 14 at 16:28

1 vote

1 answer

109 views

Polars print changed values between 2 dataframes

Given two polars dataframes of the same shape, I would like to print the number of values different between the two, including missing values that are not missing in the other dataframe. I came up ...

robertspierre

5,386

asked Nov 13 at 16:52

2 votes

2 answers

91 views

Seeking more efficient method in Python & Polars to perform monthly comparison within each year

I have a CSV of energy consumption data over time (each month for several years). I want to determine the percentage (decimal portion) for each month across that year; e.g., August was 12.3% of the ...

Buckley

151

asked Nov 13 at 16:26

-3 votes

1 answer

97 views

create dataframe from csv in PythonAnywhere [closed]

I am trying to display the headers of a data frame I created based on a csv file using the PythonAnywhere free version. I keep getting a huge error message and I don't understand what I did wrong. ...

user31868639

13

asked Nov 13 at 3:45

Best practices

0 votes

3 replies

99 views

How to access specific, indexed elements of a Pandas Dataframe for math?

What is the right/pythonic way to do math on a few indexed elements in a Pandas Dataframe? I tried a few ways but they seem awkward and confusing: df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6, 7, 9, ]}) ...

Dave X

5,247

asked Nov 12 at 15:00

1 vote

2 answers

107 views

Excel adding unicode symbol from r csv output

Excel is adding a unicode character to a summary file I save from R as a .csv, where it adds "¬" in front of "±". Is there a way to edit the R script to prevent this? cola <- c("...

Mulligan

97

asked Nov 11 at 23:30

1 vote

3 answers

100 views

Show matched rows in polars join

When you join two tables, STATA prints the number of rows merged and unmerged. For instance, take Example 1 on page 13 of the STATA merge doc: use https://www.stata-press.com/data/r19/autosize merge 1:...

robertspierre

5,386

asked Nov 11 at 15:20

1 vote

1 answer

65 views

How to assign int input value as column name in pandas [duplicate]

I need to take an integer input value, assign it to a variable, and then use that variable as a column name to get data from a pandas DataFrame. Data: 1 10 20 30 2 40 50 60 Steps: Assign input to ...
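A small sketch of the likely sticking point: `input()` returns a string, while the column labels here are integers, so the value has to be converted before it is used as a key. The frame layout is a guess, and `raw` stands in for the value that would come from `input()`.

```python
import pandas as pd

# Hypothetical frame with integer column labels.
df = pd.DataFrame({1: [10, 40], 2: [20, 50], 3: [30, 60]})

raw = "2"        # stand-in for input(); always a string
col = int(raw)   # convert so it matches the integer column label
values = df[col]
```

Looking up `df["2"]` would raise a `KeyError` in this frame because the label is the integer 2, not the string "2".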

madhu chatim

11

asked Nov 10 at 17:51

1 vote

0 answers

64 views

Why does DataFrame.apply() use axis=1 for rows instead of axis=0 in pandas? [duplicate]

I was reading the pandas documentation: pandas.DataFrame.apply documentation In this definition, it seems to me like the use of axis is the exact opposite of what it means in other functions? To apply ...
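A short illustration of the convention in question, using a made-up two-column frame. The `axis` argument names the axis the function is applied along: `axis=0` passes each column to the function, while `axis=1` passes each row.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# axis=0: the function receives each column, so the result is indexed by column name.
per_column = df.apply(lambda s: s.sum(), axis=0)

# axis=1: the function receives each row, so the result is indexed by row label.
per_row = df.apply(lambda s: s.sum(), axis=1)
```

This matches `df.sum(axis=0)` and `df.sum(axis=1)`, which collapse the same axes.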

Jacoberu

11

asked Nov 10 at 16:57

0 votes

0 answers

118 views

How to get value of cell in dataframe

I'm new to Python, so please be lenient. I want to read the value of a single cell from a dataframe based on the selected row. I do this as below, but I get the value not when I click on a record (...

Szymon Tomtała

1

asked Nov 8 at 21:34

1 vote

2 answers

153 views

After encoding my categorical columns in a pandas dataframe, I was left with too many columns. How can I drop some?

I am using Python with a pandas dataframe, it is a CSV of Steam games, and I have the categorical columns of publishers, developers, categories, genres, and tags, but categories, genres, and tags are ...

Luciano Elish

33

asked Nov 7 at 18:23

0 votes

2 answers

126 views

Replace values in multiple columns in a data frame based on conditions in another data frame? [duplicate]

I'm working with a data frame where the color of an object (red or green) was recorded in ordinal classes corresponding to percent coverage. I am looking to replace all the classes with their ...

Wren

41

asked Nov 7 at 17:53

3 votes

0 answers

146 views

Why polars join function performance deteriorates so much from version 1.30.0 to 1.31.0?

I noticed a significant performance deterioration when using polars dataframe join function after upgrading polars from 1.30.0 to 1.31.0. The code snippet is below: import polars as pl import time ...

Y. Gao

1,049

asked Nov 7 at 13:14

3 votes

2 answers

134 views

Calculate cumulative value based on another column [duplicate]

Having this kind of pandas dataframe df = pd.DataFrame({ 'ts_diff':[0, 0, 738, 20, 29, 61, 42, 18, 62, 41, 42, 0, 0, 729, 43, 59, 42, 61, 44, 36, 61, 61, 42, 18, 62, 41, 42, 0, 0] }) ts_diff - is ...

ihtus

2,863

asked Nov 6 at 14:56

0 votes

0 answers

70 views

Reading in values from CSV and making sure they are non-scientific format in R? [duplicate]

Assume I have the following Excel Sheet: Location Mar2000 London 1234567891011 Tokyo 12345667897 These are the raw values saved in a CSV format e.g. my_data.csv (assume it is CSV not UTF-8 format). ...

Beans On Toast

1,131

asked Nov 6 at 12:52

3 votes

3 answers

193 views

Filter a pandas df: per group, keep only non-null rows if we have them, else keep a single null row

Hopefully the title is reasonably intuitive, edits welcome. Say I have this dataframe: df = pd.DataFrame({'x': ['A', 'B', 'B', 'C', 'C', 'C', 'D', 'D'], 'y': [None, None, 1, 2, 3, 4,...
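One way to express the rule in the title with boolean masks, sketched on a smaller hypothetical frame than the truncated one above. It assumes "a single null row" means the first row of a group that has no non-null values.

```python
import pandas as pd

# Hypothetical data: A and D have only nulls, B has a mix, C has only values.
df = pd.DataFrame({
    "x": ["A", "B", "B", "C", "D", "D"],
    "y": [None, None, 1, 2, None, None],
})

has_value = df["y"].notna()
# True for every row whose group contains at least one non-null y.
group_has_value = has_value.groupby(df["x"]).transform("any")
# True for the first row of each group.
first_in_group = ~df["x"].duplicated()

# Keep non-null rows, plus the first row of groups that are entirely null.
result = df[has_value | (~group_has_value & first_in_group)]
```

Here the rows at index 0, 2, 3 and 4 survive: one null row each for A and D, and the non-null rows for B and C.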

Hendy

10.7k

asked Nov 5 at 21:20

0 votes

0 answers

41 views

Extracting numeric component of some of the column entries [duplicate]

I am reading in some .xpt data using the haven package. When I view the dataframe in RStudio, it appears as in the snapshot below (showing only a small number of the columns): There are actually 16 ...

please help

245

asked Nov 5 at 18:05

1 vote

3 answers

159 views

Replace value by condition across entire polars df

I'd like to replace any value greater than some condition with zero for any column except the date column in a df. The closest I've found is df.with_columns( pl.when(pl.any_horizontal(pl.col(pl....

thefrollickingnerd

400

asked Nov 5 at 0:26

2 votes

1 answer

129 views

Find differing rows between two Polars DataFrames based on ID and multiple columns

I have two Polars DataFrames (df1 and df2) with the same columns. I want to compare them by ID and Iname, and get the rows where any of the other columns (X, Y, Z) differ between the two. import ...

Simon

1,209

asked Nov 4 at 19:06

2 votes

1 answer

121 views

Concatenate Tables Based on Column Information in Python [duplicate]

I have dataframes pulled from a file. The variable with all these dataframe names is: Data_Tables. These dataframes all have the same columns, and I want to concatenate the dataframes based on the ...

Jon S

55

asked Nov 4 at 16:49

0 votes

0 answers

163 views

How to efficiently get the last row of a rolling aggregation group without .last()?

I'm working with a large Polars LazyFrame and computing rolling aggregations grouped by customer (Cusid). I need to find the "front" of the rolling window (last Tts_date) for each group to ...

Liisjak

37

asked Nov 4 at 16:13

2 votes

4 answers

128 views

How to find a common value using if statement

I am still a beginner in Python. I am trying to find a common value with an if statement: import pandas as pd df = pd.read_csv("data.csv") for n in range(2, len(df)): if df.loc[n].isin([2]...

Kan

41

asked Nov 4 at 14:55

0 votes

1 answer

105 views

Slicing pandas dataframe with a value from a Jupyter widget raises an error

In a Jupyter Notebook, I make use of a Jupyter Widget to interact with a function. The widget gives me a dropdown that can cycle through some plots, and its options are retrieved from a dataframe. ...

sybren osinga

9

asked Nov 3 at 23:17

1 vote

0 answers

78 views

Why does the pivoted dataframe contain information about columns that weren't included in the pivot? [duplicate]

There is a dataframe with multiindex columns: import pandas as pd df = pd.DataFrame({ "A": ["foo", "foo", "bar", "bar"], "B": ["...

VictorS

11

asked Nov 3 at 9:55

5 votes

3 answers

168 views

Slice a pandas dataframe at specific index points

I have the below pandas dataframe: import pandas as pd data = pd.DataFrame({'x1':range(10, 18), # Create pandas DataFrame 'x2':['a', 'b', 'b', 'c', 'd', 'a', 'b', 'd'], ...

Bogaso

3,896

asked Nov 3 at 8:08

6 votes

1 answer

106 views

Polars streaming: How to compute a nested window aggregation while avoiding in-memory-maps?

I want to calculate the mean over some group column 'a' but include only one value per second group column 'b'. Constraints: I want to preserve all original records in the result. (if possible) avoid ...

gogodigi

95

asked Oct 31 at 11:16

4 votes

3 answers

106 views

Extending polars DataFrame while maintaining variables between calls

I would like to code a logger for polars using the Custom Namespace API. For instance, starting from: import logging import polars as pl penguins_pl = pl.read_csv("https://raw.githubusercontent....

robertspierre

5,386

asked Oct 31 at 9:19

0 votes

1 answer

125 views

How to add a new column in a dataframe matching the matrix or multiple matrices by date variables

I want to add a seasonal_factor column to my D1 data frame. The seasonal factor is from another source, in matrix format, with 4 matrices per year from 2021 to 2024. I get errors on matching the same ...

Amy Z

11

asked Oct 30 at 14:06

4 votes

4 answers

178 views

Reference column named "*" in Polars

I have a Polars DataFrame with a column named "*" and would like to reference just that column. When I try to use pl.col("*") it is interpreted as a wildcard for "all columns."...

Sam

359

asked Oct 29 at 21:56
