1 vote
3 answers
63 views

I have a DataFrame with columns Age, Salary and others. If I use df['Age'] = df['Age'].apply(lambda x : x+100 if x>30 else 0), I can modify the Age column with an if/else condition. Also, if ...
Edy's user avatar
  • 23
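For the conditional update described in this excerpt, a minimal sketch of a vectorized alternative using numpy.where, assuming a small hypothetical DataFrame (the sample values are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"Age": [25, 31, 40], "Salary": [100, 200, 300]})

# Element-wise version, as written in the question.
via_apply = df["Age"].apply(lambda x: x + 100 if x > 30 else 0)

# Vectorized version: the condition is evaluated on the whole column at once.
via_where = pd.Series(
    np.where(df["Age"] > 30, df["Age"] + 100, 0), index=df.index
)

print(via_apply.tolist())
print(via_where.tolist())
```

Both forms should give the same values; the vectorized one avoids calling a Python function per row.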
0 votes
1 answer
82 views

I am new to R and trying to create two new variables from my dataset. My data frame is called netflix and it contains these relevant columns: date_added and duration Example values: date_added: "...
user31963479's user avatar
1 vote
0 answers
41 views

I am trying to explain() a Polars query to see which operations can be executed using the streaming engine. Currently, I am only able to do this using show_graph(). From sources on the web, I see that ...
gaut's user avatar
  • 6,038
Advice
1 vote
2 replies
77 views

Is there a simple way (i.e., not involving writing a print() method) to show the timezone when printing a data frame with a POSIXct column? as.data.frame(Sys.time()) # Sys.time() # 1 2025-...
Thomas's user avatar
  • 553
0 votes
1 answer
83 views

I noticed that groupby().apply() produces different results for two groups that look identical, except that the overall DataFrame has duplicate index values. Here is a minimal reproducible example: ...
Bhumika Aggarwal's user avatar
1 vote
1 answer
59 views

I have a string column in a polars dataframe with multiple datetime formats, and I am using the following code to convert the column's datatype from string to datetime. import polars as pl df = pl.from_dict({'...
dikesh's user avatar
  • 3,135
0 votes
0 answers
70 views

I have a Python file import polars as pl import requests from pathlib import Path url = "https://raw.githubusercontent.com/leanhdung1994/files/main/processedStep1_enwiktionary_namespace_0_43....
Akira's user avatar
  • 2,820
0 votes
1 answer
151 views

I have a number of data.frames, with names apple, banana, and coffee. I want to create, and then export, new dataframes in a for-loop corresponding to each one, call them apple_new, banana_new, and ...
s84492025's user avatar
2 votes
4 answers
149 views

I got a dataframe df1 which looks like this: Column1 Column2 13 1 12 1 15 0 16 0 15 1 14 1 12 1 11 0 21 1 45 1 44 0 The 1s indicate that a measurement started, I don't know how many 1s will be in one ...
toben aus's user avatar
1 vote
3 answers
162 views

I have an .ndjson file with millions of rows. Each row has a field html which contains an HTML string. I would like to write all of these strings into a .txt file, one HTML string per line. I ...
Akira's user avatar
  • 2,820
5 votes
2 answers
106 views

Consider the following pandas Series with a DatetimeIndex of daily values (using day-of-year as an example): import pandas as pd dti = pd.date_range("2017-11-02", "2019-05-21", ...
Mike T's user avatar
  • 44.4k
2 votes
1 answer
131 views

I am looking for the nearest non exact match on the dates column: import polars as pl df = pl.from_repr(""" ┌─────┬────────────┐ │ uid ┆ dates │ │ --- ┆ --- │ │ i64 ┆ date ...
rainerpf's user avatar
Tooling
0 votes
2 replies
67 views

I would like to know how to export or import TOON (Token object oriented notation) in pandas.
Rainb's user avatar
  • 2,567
-2 votes
1 answer
80 views

I have a dictionary of polars.DataFrames called data_dict. All the dataframes in the dict values have an extra index column ''. I want to drop that column and set a new column named 'name_ID' ...
Tudi72's user avatar
  • 31
3 votes
2 answers
211 views

I have a DataFrame with a column Digit of digits at base 10. For example import numpy as np import pandas as pd df = pd.DataFrame({ "Digit": [ 1, 3, 5, 7, 0, 0, 0, 4, 8, ...
Max Pierini's user avatar
  • 2,323
2 votes
1 answer
73 views

I am working on a dashboard using Shiny for Python and Plotly Express. I am trying to create a Gantt chart (using px.timeline) to visualize the operating periods of different boilers (ON/OFF states). ...
Juan Siécola's user avatar
2 votes
1 answer
76 views

Imagine having the following polars dataframe "df" that contains the temperature of a machine that is either "active" or "inactive": import polars as pl from datetime ...
the_economist's user avatar
1 vote
0 answers
76 views

I have a table that looks like this import polars as pl df = pl.DataFrame( { "col1": [1, 2, 3, 4, 5], "col2": [10, 20, 30, 40, 50], "col3": [...
Lethnis's user avatar
  • 31
-4 votes
0 answers
75 views

I am trying to add one pandas DataFrame to another DataFrame. How can I do this in the style of list.append? usernames = {"anvar":"anvar123", "behruz":"Bex124", ...
Ravshanjon Ahmadjonov's user avatar
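A minimal sketch of appending one DataFrame to another with pd.concat, which is the usual replacement now that DataFrame.append has been removed in pandas 2.0. The column names username and password are assumptions; the values are taken from the excerpt's dictionary:

```python
import pandas as pd

# Column names are hypothetical; values come from the excerpt.
users = pd.DataFrame({"username": ["anvar"], "password": ["anvar123"]})
new_rows = pd.DataFrame({"username": ["behruz"], "password": ["Bex124"]})

# concat returns a new DataFrame; ignore_index renumbers the rows 0..n-1.
combined = pd.concat([users, new_rows], ignore_index=True)
print(combined)
```

Unlike list.append, this does not modify users in place, so the result has to be assigned to a name.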
1 vote
2 answers
139 views

I have one dataframe with sessions - one session, one row, so SID is unique. The session has a doctor name. SID Doctor Patient 1 robby david 2 langdon sara 3 langdon michael I have another dataframe ...
Semyaz's user avatar
  • 11
12 votes
0 answers
326 views

I have been using the Data Wrangler extension in VS Code for a while; it is very useful for analyzing datasets and filtering some columns to see the features. When I opened a dataframe in it, it used to ...
Javad Faraji's user avatar
0 votes
0 answers
39 views

I have a table/df that holds a set of code and value pairs. The codes are a mix of old (legacy) and new codes due to process changes. I have a second table/df that holds the old codes, new codes, ...
MikeB2019x's user avatar
  • 1,297
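One possible way to translate legacy codes to new ones in pandas, sketched under assumptions: the excerpt is truncated, so the table layouts, column names (code, value, old_code, new_code), and sample codes below are all hypothetical.

```python
import pandas as pd

# Hypothetical code/value pairs mixing legacy and current codes.
values = pd.DataFrame({"code": ["A1", "B2", "N3"], "value": [10, 20, 30]})

# Hypothetical lookup table from old codes to new codes.
lookup = pd.DataFrame({"old_code": ["A1", "B2"], "new_code": ["N1", "N2"]})

# Build an old -> new mapping, then keep the original code where no mapping exists.
mapping = lookup.set_index("old_code")["new_code"]
values["code_current"] = values["code"].map(mapping).fillna(values["code"])
print(values)
```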
1 vote
1 answer
109 views

Given two polars dataframes of the same shape, I would like to print the number of values different between the two, including missing values that are not missing in the other dataframe. I came up ...
robertspierre's user avatar
2 votes
2 answers
91 views

I have a CSV of energy consumption data over time (each month for several years). I want to determine the percentage (decimal portion) for each month across that year; e.g., August was 12.3% of the ...
Buckley's user avatar
  • 151
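A minimal sketch of computing each month's share of its year's total with groupby and transform. The column names date and kwh and the sample numbers are assumptions, since the CSV layout is not shown in the excerpt:

```python
import pandas as pd

# Hypothetical monthly consumption data (two months per year for brevity).
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2023-01-01", "2023-02-01", "2024-01-01", "2024-02-01"]
        ),
        "kwh": [30.0, 70.0, 50.0, 150.0],
    }
)

# transform("sum") returns the yearly total aligned to every original row.
year_total = df.groupby(df["date"].dt.year)["kwh"].transform("sum")
df["share"] = df["kwh"] / year_total
print(df)
```

Because transform keeps the original row order and length, the division lines up row by row without a merge.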
-3 votes
1 answer
97 views

I am trying to display the headers of a data frame I created based on a csv file using the PythonAnywhere free version. I keep getting a huge error message and I don't understand what I did wrong. ...
user31868639's user avatar
Best practices
0 votes
3 replies
99 views

What is the right/pythonic way to do math on a few indexed elements in a Pandas Dataframe? I tried a few ways but they seem awkward and confusing: df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6, 7, 9, ]}) ...
Dave X's user avatar
  • 5,247
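One common pattern for arithmetic on a few indexed elements is label-based selection with .loc. This is a sketch, not necessarily what the asker settled on; the chosen row labels and the added constant are illustrative:

```python
import pandas as pd

# Same data as in the excerpt.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 9]})

# Select rows by index label and update them in place.
rows = [0, 2]  # illustrative choice of rows
df.loc[rows, "x"] += 100
print(df["x"].tolist())
```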
1 vote
2 answers
107 views

Excel is adding a unicode character to a summary file I save from R as a .csv, where it adds "¬" in front of "±". Is there a way to edit the R script to prevent this? cola <- c(...
Mulligan's user avatar
1 vote
3 answers
100 views

When you join two tables, STATA prints the number of rows merged and unmerged. For instance, take Example 1 at page 13 of the STATA merge doc: use https://www.stata-press.com/data/r19/autosize merge 1:...
robertspierre's user avatar
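The excerpt does not say which library the asker is targeting. In pandas, one way to get counts similar to Stata's merge report is the indicator option of merge; the frames and column names below are hypothetical:

```python
import pandas as pd

# Hypothetical tables sharing a key column.
left = pd.DataFrame({"key": [1, 2, 3], "l": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "r": ["x", "y", "z"]})

# indicator=True adds a _merge column: left_only, right_only, or both.
merged = left.merge(right, on="key", how="outer", indicator=True)
counts = merged["_merge"].value_counts()
print(counts)
```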
1 vote
1 answer
65 views

I need to take an integer input value, assign it to a variable, and then use that variable as a column name to get data from a pandas DataFrame. Data: 1 10 20 30 2 40 50 60 Steps: Assign input to ...
madhu chatim's user avatar
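A minimal sketch of using an integer input as a column label. The excerpt's data layout is ambiguous, so the frame below (row labels 1 and 2, integer column labels 0 to 2) is an assumption, and a fixed string stands in for input():

```python
import pandas as pd

# Assumed layout: rows labelled 1 and 2, columns labelled with integers.
df = pd.DataFrame([[10, 20, 30], [40, 50, 60]], index=[1, 2], columns=[0, 1, 2])

raw = "1"       # stand-in for input(); user input always arrives as a string
col = int(raw)  # convert so it matches the integer column label

values = df[col]
print(values.tolist())
```

The conversion matters: df["1"] would raise a KeyError here because the column labels are integers, not strings.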
1 vote
0 answers
64 views

I was reading the pandas documentation: pandas.DataFrame.apply documentation In this definition, it seems to me like the use of axis is the exact opposite of what it means in other functions? To apply ...
Jacoberu's user avatar
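A small sketch that may help with the axis question: in DataFrame.apply, axis=0 passes each column to the function and axis=1 passes each row, which gives the same results as the axis argument of DataFrame.sum. The sample frame is illustrative:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# axis=0: the function receives each column (one result per column).
per_column = df.apply(lambda s: s.sum(), axis=0)

# axis=1: the function receives each row (one result per row).
per_row = df.apply(lambda s: s.sum(), axis=1)

print(per_column.tolist())
print(per_row.tolist())
```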
0 votes
0 answers
118 views

I'm new to Python, so please be lenient. I want to read the value of a single cell from a dataframe based on the selected row. I do this as below, but I get the value not when I click on a record (...
Szymon Tomtała's user avatar
1 vote
2 answers
153 views

I am using Python with a pandas dataframe, it is a CSV of Steam games, and I have the categorical columns of publishers, developers, categories, genres, and tags, but categories, genres, and tags are ...
Luciano Elish's user avatar
0 votes
2 answers
126 views

I'm working with a data frame where the color of an object (red or green) was recorded in ordinal classes corresponding to percent coverage. I am looking to replace all the classes with their ...
Wren's user avatar
  • 41
3 votes
0 answers
146 views

I noticed a significant performance deterioration when using polars dataframe join function after upgrading polars from 1.30.0 to 1.31.0. The code snippet is below: import polars as pl import time ...
Y. Gao's user avatar
  • 1,049
3 votes
2 answers
134 views

Having this kind of pandas dataframe df = pd.DataFrame({ 'ts_diff':[0, 0, 738, 20, 29, 61, 42, 18, 62, 41, 42, 0, 0, 729, 43, 59, 42, 61, 44, 36, 61, 61, 42, 18, 62, 41, 42, 0, 0] }) ts_diff - is ...
ihtus's user avatar
  • 2,863
0 votes
0 answers
70 views

Assume I have the following Excel Sheet: Location Mar2000 London 1234567891011 Tokyo 12345667897 These are the raw values saved in a CSV format e.g. my_data.csv (assume it is CSV not UTF-8 format). ...
Beans On Toast's user avatar
3 votes
3 answers
193 views

Hopefully the title is reasonably intuitive, edits welcome. Say I have this dataframe: df = pd.DataFrame({'x': ['A', 'B', 'B', 'C', 'C', 'C', 'D', 'D'], 'y': [None, None, 1, 2, 3, 4,...
Hendy's user avatar
  • 10.7k
0 votes
0 answers
41 views

I am reading in some .xpt data using the haven package. When I view the dataframe in RStudio, it appears as in the snapshot below (showing only a small number of the columns): There are actually 16 ...
please help's user avatar
1 vote
3 answers
159 views

I'd like to replace any value greater than some condition with zero for any column except the date column in a df. The closest I've found is df.with_columns( pl.when(pl.any_horizontal(pl.col(pl....
thefrollickingnerd's user avatar
2 votes
1 answer
129 views

I have two Polars DataFrames (df1 and df2) with the same columns. I want to compare them by ID and Iname, and get the rows where any of the other columns (X, Y, Z) differ between the two. import ...
Simon's user avatar
  • 1,209
2 votes
1 answer
121 views

I have several dataframes pulled from a file. The variable with all these dataframe names is: Data_Tables. These dataframes all have the same columns, and I want to concatenate the dataframes based on the ...
Jon S's user avatar
  • 55
0 votes
0 answers
163 views

I'm working with a large Polars LazyFrame and computing rolling aggregations grouped by customer (Cusid). I need to find the "front" of the rolling window (last Tts_date) for each group to ...
Liisjak's user avatar
  • 37
2 votes
4 answers
128 views

I am still a beginner in Python. I am trying to find a common value with an if statement: import pandas as pd df = pd.read_csv("data.csv") for n in range(2, len(df)): if df.loc[n].isin([2]...
Kan's user avatar
  • 41
0 votes
1 answer
105 views

In a Jupyter Notebook, I make use of a Jupyter Widget to interact with a function. The widget gives me a dropdown that can cycle through some plots, and its options are retrieved from a dataframe. ...
sybren osinga's user avatar
1 vote
0 answers
78 views

There is a dataframe with a multiindex columns: import pandas as pd df = pd.DataFrame({ "A": ["foo", "foo", "bar", "bar"], "B": [&...
VictorS's user avatar
  • 11
5 votes
3 answers
168 views

I have the pandas dataframe below: import pandas as pd data = pd.DataFrame({'x1':range(10, 18), # Create pandas DataFrame 'x2':['a', 'b', 'b', 'c', 'd', 'a', 'b', 'd'], ...
Bogaso's user avatar
  • 3,896
6 votes
1 answer
106 views

I want to calculate the mean over some group column 'a' but include only one value per second group column 'b'. Constraints: I want to preserve all original records in the result. (if possible) avoid ...
gogodigi's user avatar
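One possible pandas approach to the group mean described here, sketched under assumptions: the value column v and the sample data are hypothetical, the value is assumed constant within each (a, b) pair, and the truncated constraint in the excerpt is not addressed.

```python
import pandas as pd

# Hypothetical data: the value repeats within each (a, b) pair.
df = pd.DataFrame(
    {
        "a": ["x", "x", "x", "y"],
        "b": [1, 1, 2, 3],
        "v": [10.0, 10.0, 40.0, 5.0],
    }
)

# Keep one row per (a, b), average per a, then map back onto every original row.
one_per_b = df.drop_duplicates(["a", "b"])
means = one_per_b.groupby("a")["v"].mean()
df["mean_v"] = df["a"].map(means)
print(df)
```

All original records are preserved; only the helper frame one_per_b is deduplicated.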
4 votes
3 answers
106 views

I would like to code a logger for polars using the Custom Namespace API. For instance, starting from: import logging import polars as pl penguins_pl = pl.read_csv("https://raw.githubusercontent....
robertspierre's user avatar
0 votes
1 answer
125 views

I want to add a seasonal_factor column to my D1 data frame. The seasonal factor is from another source, in matrix format, with 4 matrices per year from 2021 to 2024. I get errors on matching the same ...
Amy Z's user avatar
  • 11
4 votes
4 answers
178 views

I have a Polars DataFrame with a column named "*" and would like to reference just that column. When I try to use pl.col("*") it is interpreted as a wildcard for "all columns.&...
Sam's user avatar
  • 359
