
Questions tagged [pandas]

Pandas is a Python data analysis library.

4 votes
3 answers
84 views

I have a custom object which stores dataframes in memory given a certain hierarchy, and I want to store this data in a file while maintaining the hierarchy. This hierarchy involves parents, children, ...
Marcus Carpenter's user avatar
8 votes
3 answers
491 views

I have a comma-separated value (CSV) file as input, and I am supposed to interpolate all missing (nan) values based on neighboring non-diagonal values. The CSV ...
con's user avatar
  • 361
4 votes
2 answers
353 views

I want to calculate the quarterly average of a time-indexed dataframe column in a rolling fashion. The mean at any timestamp should not contain information about future timestamps. This is the code to ...
shamalaia's user avatar
  • 316
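
The requirement in the excerpt above (a rolling mean that never sees future timestamps) can be sketched as follows. This is a minimal sketch, not the asker's code: the function name `trailing_mean` and the sample data are hypothetical, and treating a quarter as a 90-day window is an assumption. A time-based `rolling` window in pandas looks backward from each timestamp, so later rows never enter the mean.

```python
import pandas as pd


def trailing_mean(series: pd.Series, window: str = "90D") -> pd.Series:
    """Mean over the trailing time window ending at (and including) each timestamp.

    Assumes `series` has a DatetimeIndex; "90D" as a stand-in for a quarter
    is an assumption, not something stated in the question.
    """
    ordered = series.sort_index()  # time-based windows need a monotonic index
    return ordered.rolling(window).mean()


# Hypothetical sample data: five consecutive days.
idx = pd.date_range("2024-01-01", periods=5, freq="D")
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

# A short 3-day window keeps the example easy to check by hand.
result = trailing_mean(prices, window="3D")
```

If the current observation should also be excluded from its own mean, passing `closed="left"` to `rolling` is one option.
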
3 votes
3 answers
140 views

I have the following code to amend two rows of "test_base.csv" with the entries of the arrays "a_temp" and "b_temp", saving the result into "result.csv". .csv ...
Zachary's user avatar
  • 33
4 votes
1 answer
214 views

I'm trying to build a function that identifies those who are promoted into a list of jobcodes, or are promoted within that list of jobcodes. Initially I was using ...
Gage's user avatar
  • 41
0 votes
1 answer
123 views

I have a dataset that contains 750,000 rows. I want to query each row and get the postcodes using the latitudes and longitudes. Problem: The code is executing very fast when I query like 100 rows, and ...
Buchi's user avatar
  • 1
1 vote
1 answer
105 views

This post is modified from this one: https://codereview.stackexchange.com/posts/292885/edit (Alternatives to iterrow loops in python pandas dataframes). I have a piece of code to calculate price ...
Laura's user avatar
  • 81
6 votes
2 answers
748 views

I have a piece of code to calculate price sensitivity based on the product and its rating. Below is the original data set with product type, reported year, customer’s rating, price per unit, and ...
Laura's user avatar
  • 81
2 votes
1 answer
58 views

I'm trying to capture profits and set a stop loss in my trading strategy. I want the stop loss to be set daily based on the past data and if the current price, i.e., price for the date falls below the ...
driver's user avatar
  • 222
2 votes
1 answer
253 views

I'm looking to understand if my code has an obvious blockage or performance pain point that will cause it to operate slower or use more memory than it should. The current Excel file I am processing ...
sayth's user avatar
  • 131
3 votes
1 answer
296 views

I have the following data: ...
mahmoud988's user avatar
1 vote
1 answer
149 views

I am performing a sports prediction multi-class classification problem, and wanted to compare the differences in model performance between normalised and non-normalised data. You can see the 2 ...
pastybake2002's user avatar
3 votes
1 answer
245 views

I am trying to solve a multi-class classification problem that involves predicting the outcome of a football match (target variable = Win, Lose or Draw). With a dataset of 2280 rows, which is 6 seasons of ...
pastybake2002's user avatar
3 votes
1 answer
89 views

Looking for a better approach to write the transformation below using Python. Is it possible to avoid the loop and still achieve the desired output? It is too slow for 10 million rows. ...
user278818's user avatar
6 votes
2 answers
131 views

I am trying to build a useable NLP corpus but getting bottlenecked by how long the program takes (200 hours). With so much data I know that optimizing my code even a little bit will net me huge time ...
evader110's user avatar
  • 163
2 votes
1 answer
90 views

I've developed a Python script that simulates die rolls and analyses the results. I'm now looking to extend and modify this code for more complex data science tasks and simulations. Is this code ...
Attila Vajda's user avatar
3 votes
3 answers
203 views

Update: Okay, after trying to use this for a while, I think it's probably a bad idea. Please use (lambda x: x["a"] + x["b"])(df) if really ...
user1537366's user avatar
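
The expression quoted in the excerpt above, `(lambda x: x["a"] + x["b"])(df)`, can be tried on a small frame. This is only an illustrative sketch: the sample data is hypothetical, and the comparison with `DataFrame.pipe` and plain column arithmetic is added here for context rather than taken from the question.

```python
import pandas as pd

# Hypothetical frame with the two columns named in the excerpt.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# The immediately-invoked lambda from the excerpt.
total_lambda = (lambda x: x["a"] + x["b"])(df)

# The same computation through DataFrame.pipe, which passes the frame
# as the first argument of the callable.
total_pipe = df.pipe(lambda x: x["a"] + x["b"])

# Plain column arithmetic, for comparison.
total_direct = df["a"] + df["b"]
```

All three produce the same Series; the lambda and `pipe` forms are mainly useful in a method chain where the intermediate frame has no name.
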
0 votes
2 answers
161 views

I wrote this code to indicate duplicated values. It actually works but I hope to know if there's another possible solution to optimize this process. Thanks. ...
peternish's user avatar
2 votes
1 answer
101 views

I am a junior data engineer who has 3 years of experience with Python. I write a lot of Python code for my job and I came up with this question I can't solve on my own. I don't have the chance to ...
Izem's user avatar
  • 21
2 votes
1 answer
75 views

I have a DataFrame (database_df) that contains the general record with the IDs that belong to the same team on each line. Using these values, I need to find ...
Digital Farmer's user avatar
1 vote
2 answers
97 views

My question got rejected the last time so I am trying a better approach to getting a solution: ...
PyNoob's user avatar
  • 21
2 votes
1 answer
210 views

I have created an LP function to help maximize a set of features. This is my first time playing with this library and also conducting LP. Variables: number of features => X, number of categories => Y ...
Kale 's user avatar
  • 23
2 votes
1 answer
74 views

I currently have the following Python code that adds a few calculated columns to my consol file. Essentially it combines all the sales files into one combined DF and then adds 4 new sales columns with ...
Neo's user avatar
  • 21
1 vote
1 answer
159 views

As you'll see from the below code, I'm creating separate data frames of a much larger data frame, then updating a column for each one. What I'm doing is looking at the second column and checking to ...
jp207's user avatar
  • 173
1 vote
2 answers
106 views

In Python using Pandas, I am splitting a dataset column into 4 lists based on the suffix of the values. For the 3 suffixes I am using a list comprehension then for the 4th one, a set operation that ...
evilmandarine's user avatar
2 votes
1 answer
92 views

Background: I'm a BI developer building a new dashboard for a client. They want to track performance for the week/month/year to date against the prior period. Unfortunately, I don't have direct access ...
DixieFlatline's user avatar
2 votes
1 answer
177 views

I'm new to Python and pandas. I would like to use pandas groupby() to flag values in a df that are outliers. I think I've got it working, but as I'm new to Python, ...
Quentin's user avatar
  • 123
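
One common way to flag per-group outliers with `groupby()`, as the excerpt above describes, is a z-score computed with `transform`. This is a sketch under stated assumptions, not the asker's method: the function name `flag_outliers`, the column names `site` and `reading`, the sample data, and the z-score threshold are all hypothetical.

```python
import pandas as pd


def flag_outliers(df: pd.DataFrame, group_col: str, value_col: str,
                  threshold: float = 3.0) -> pd.Series:
    """Return a boolean Series marking values far from their group mean.

    A value is flagged when its absolute z-score within its group exceeds
    `threshold`. Groups with zero or undefined spread yield NaN z-scores,
    which compare as False and are therefore never flagged.
    """
    grouped = df.groupby(group_col)[value_col]
    mean = grouped.transform("mean")  # group mean, aligned to the original rows
    std = grouped.transform("std")    # sample standard deviation per group
    z_score = (df[value_col] - mean) / std
    return z_score.abs() > threshold


# Hypothetical data: group "a" contains one value far from the rest.
data = pd.DataFrame({
    "site": ["a"] * 5 + ["b"] * 5,
    "reading": [10, 11, 9, 10, 50, 1, 2, 3, 2, 2],
})

# With only five rows per group a z-score cannot get very large,
# so a low threshold is used for this toy example.
flags = flag_outliers(data, "site", "reading", threshold=1.5)
```

Because `transform` returns results aligned to the original index, the flag can be assigned straight back as a new column, for example `data["is_outlier"] = flags`.
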
2 votes
1 answer
173 views

I need to apply the coint function from the statsmodels library to 207 time series with 1397 points each, two by two. Currently, it takes between 35 and 40 minutes on my computer with an Intel 24 Cores ...
Begoodpy's user avatar
  • 135
1 vote
1 answer
405 views

I read data from Excel into a Pandas DataFrame, so that every column represents a different variable, and every row represents a different sample. I made the function below to identify potential ...
Pimsel's user avatar
  • 25
7 votes
2 answers
632 views

I have a csv file that looks like this: ...
Fang's user avatar
  • 617
1 vote
1 answer
420 views

Pandas likes to throw cryptic errors when you feed its functions with empty DataFrames saying nothing that would help you to identify the root cause. In order to ...
t3chb0t's user avatar
  • 44.7k
2 votes
2 answers
2k views

I've got something really simple this time where I'm mapping pandas' Series to dataclasses with a one-liner helper function (as ...
t3chb0t's user avatar
  • 44.7k
1 vote
2 answers
221 views

The problem: I am given a data frame. Somewhere in that data frame there are 3*N columns that I need to modify based on a condition. The columns of interest look like this: names_1 address_1 ...
Glue's user avatar
  • 129
5 votes
1 answer
219 views

Not sure if this project fits on Code Review, but my code is getting extremely messy, and I would love some tips to clean it up! Overview: The project is designed to take in an HTML file (a degree audit),...
retep's user avatar
  • 189
1 vote
1 answer
100 views

I am trying to help students visualize the central limit theorem and wanted to do this with simulated data. I created a population dataset with three variables: ...
Damon C. Roberts's user avatar
1 vote
1 answer
389 views

I built my code by studying this question: "Divide total sum equally to higher sampled time periods when upsampling with pandas". I am wondering if the code can be improved and if it is correct. ...
Andrea Ciufo's user avatar
1 vote
1 answer
111 views

I found myself many times in the past trying to generate fake DataFrames in pandas. I decided just for fun, to write a script that I can specify some inputs and ...
Tasos's user avatar
  • 159
4 votes
2 answers
239 views

The following code tries to convert an unstructured TOC with bounding box layout data given by the output of pdftotext -bbox-layout -f 11 -l 13 new_book.pdf toc.html...
Sati's user avatar
  • 427
4 votes
1 answer
1k views

I'm using BeautifulSoup to parse a bunch of combined tables' rows, row by row, column by column to prepare it for import into Pandas. I can't use to_html() because ...
Meghan M.'s user avatar
  • 141
1 vote
1 answer
200 views

I am looking for an efficient way to read and append texts of .txt files to a dataframe. I currently have 10 folders with 100k documents each. What I specifically need to do is: getting the names of ...
Piergiorgio Di Pasquale's user avatar
1 vote
1 answer
121 views

I have the following dataframe: ...
illuminato's user avatar
1 vote
1 answer
100 views

Each line in my CSV is an investment opportunity that I record in my history, but I would only make the investment if, in the existing history (previous lines), the sum of the results is above ...
Digital Farmer's user avatar
2 votes
1 answer
84 views

I would like a review regarding the method I use to create the new columns and then reposition them in the correct place where they should be. The new column called ...
Digital Farmer's user avatar
-4 votes
1 answer
49 views

The problem is: find the names of all characters who are from the same homeworld as Chewbacca. My code is ...
su sahin's user avatar
5 votes
1 answer
203 views

I've written a parser to scrape data from the Canadian Statistics Bureau. ...
alphamu's user avatar
  • 153
2 votes
1 answer
194 views

I am new to Python. I am trying to get the total number of failures by first checking how the Failure Sensor column transitions, then creating the Start column from devicetimestamp if the ...
Noobie1997's user avatar
3 votes
1 answer
83 views

I am cleaning a dataset where the columns lat and long contain some values multiplied by a power of 10. The factor is not always 10 but varies as 10^n. I wrote the code below. I am not sure if it is the best way, but is ...
GregOliveira's user avatar
1 vote
0 answers
57 views

The dataset I'm working with is rather large so I've been experimenting with cudf and cupy. Here you can find instructions for ...
Jason Leaver's user avatar
2 votes
1 answer
335 views

I wrote a function to download a large zip file (5-7 GB) from the Iowa State MRMS data archive. The zip files appear to be malformed, which results in a BadZipFileError, hence ...
Jason Leaver's user avatar
