1,342 questions
-2
votes
0
answers
108
views
R dplyr: How to filter murders dataset by region and homicide rate and select only specific columns without triggering validation errors [closed]
I need some help in my coding related to Basic Data Wrangling. The instructions for coding in R are as follows:
Let's say you want to live in the Northeast or West in the US and you want the homicide rate ...
12
votes
0
answers
326
views
Not displaying DataFrame's name in Data Wrangler extension of VSCode, displaying "Data grid"
I have been using the Data Wrangler extension in VS Code for a while; it is very useful for analyzing datasets and filtering columns to see the features. When I opened a dataframe in it, it used to ...
1
vote
1
answer
109
views
Polars print changed values between 2 dataframes
Given two polars dataframes of the same shape, I would like to print the number of values that differ between the two, including values that are missing in one dataframe but not in the other.
I came up ...
1
vote
3
answers
100
views
Show matched rows in polars join
When you join two tables, STATA prints the number of rows merged and unmerged.
For instance, take Example 1 on page 13 of the STATA merge doc:
use https://www.stata-press.com/data/r19/autosize
merge 1:...
-3
votes
0
answers
45
views
How can I use a named vector with recode? [duplicate]
Suppose I have
df <- data.frame(name=c("Hello", "Hi", "GoodMorning"))
I would like to convert "GoodMorning" into "GoodEvening" (of course this is ...
0
votes
2
answers
119
views
Fill but with a conditional
I have the following dataframe:
df <- tribble(
~nuts_code, ~value,
"AT", 1,
"AT1", NA,
"AT2", NA,
"BG", NA,
"BG1", 10,
"BG2"...
1
vote
4
answers
190
views
How to coalesce using a function of other rows?
I have the following tibble:
eu_df <- structure( list( nuts_code = c( "PT17", "PT17", "PT17", "PT17", "PT17", "PT17", "PT17", ...
0
votes
1
answer
82
views
Creating a column based on values of several other column in R [duplicate]
I have a dataset with 5 variables. Each variable is a name of a fruit, with 0 (don't like), and 1 (like). The data frame is like this:
set.seed(225)
fruits<-data.frame(id= seq(1:10),Apple=...
4
votes
2
answers
175
views
How to tidy messy data [closed]
I have a messy data set, which generally resembles the output of the following
schools_messy <- tibble::tribble(
~data,
"state:maryland",
"location:bowie||name:bowie state ...
1
vote
0
answers
115
views
Display Data wrangler inline in Visual code
I've been using Data Wrangler in VS Code to visualize dataframes. Normally, Open df in Data Wrangler opens a separate tab.
One day Data Wrangler view happened to be displayed inline like this (...
1
vote
2
answers
70
views
Flattening lists within a dataframe within a dataframe, whilst preserving names
I present here an input data frame that contains lists of data frames that contains lists.
Some of the bottom level lists are empty and some lists have length greater than one.
I am looking for some R ...
-1
votes
1
answer
128
views
In a time series, how to list all the events that have taken place since a certain latency for each focal event?
Let's say I have a time series of behaviours. It contains the timing and identity of people who have performed a particular behaviour. I want to list all the people who performed the behaviour within ...
1
vote
2
answers
94
views
Create a Grouping Column/Variable from other Columns in R
I'm trying to group data into a grouping variable based on whether or not there is data in specific columns. In other words, if there is data in the same row for V1 & V2 below, then I want to put ...
0
votes
1
answer
62
views
Expanding dataframe to include non existing values
I have a dataframe that looks like this:
Family     Order    Class    Presence  Year  Site   Location  Lat    Long
Aeshnidae  Odonata  Insecta  0         2021  KAV01  NASS      -17.4  18.5
Aeshnidae  Odonata  Insecta  0         2023  KAV01  NASS      -17....
1
vote
1
answer
42
views
Filling in a column in a dataframe using another dataframe that partially matches [duplicate]
Using R, I am trying to partially fill in a dataframe (~200 rows) using another (~170 rows) by matching on an ID variable. Roughly 50% of the IDs match, and I'd like to just leave the other values ...
0
votes
0
answers
53
views
Subset dataset keeping rows with highest values in specific column [duplicate]
I have a dataframe that looks something like this:
Name   age  score  year  state
Tim    65   123    2016  KS
Tom    72   476    2016  OH
Larry  58   354    2016  NS
Dave   81   878    2017  KS
Rob    66   1123   2017  OH
Sam    32   45     2017  OH
Jeff   ...
1
vote
1
answer
53
views
Running parallel models and compiling results
I would like to run iterations of a single model, substituting one of a set of 34 different response variables in each iteration, and organize the results (from summary()) of all of those models into ...
1
vote
2
answers
77
views
Reformat data to summarize and collapse rows into simple table
I have a dataset that looks something like this:
name   party  count  year  likes  retweet
Tom    R      1      2016  1357   23
Dave   R      1      2016  1881   34
Larry  D      1      2016  324    45
Tim    D      1      2016  5587   56
Rob    R      1      2016  9847   67
Sam    D      1      ...
3
votes
2
answers
180
views
How to properly load .dat file in r without variable labels (NBER CPS data)
I am trying to load into R and wrangle data from the CPS (Current Population Survey), which can be downloaded at this link. There is an ostensible codebook for the information on variables and the ...
0
votes
1
answer
99
views
Not getting decimals when extracting values [duplicate]
So I am practicing data wrangling and I have encountered an issue.
food['GPA'].unique()
And the output is
array(['2.4', '3.654', '3.3', '3.2', '3.5', '2.25', '3.8', '3.904', '3.4',
'3.6', '3.1'...
0
votes
1
answer
33
views
pivot_longer() with parallel (unlinked) sets of columns [duplicate]
I'm trying to use pivot_longer() to rearrange a dataset I was given, which looks like the result of a database join operation. Here's an example of what it looks like:
dat <- tibble('Plant_Name'=c('...
1
vote
2
answers
64
views
(ERROR) Select one object and all float & int in pandas groupby
I have this dataframe.
import pandas as pd
x = {
"year": ["2012", "2012", "2013", "2014", "2012", "2014", "2013", ...
1
vote
2
answers
64
views
Regex to extract a part of URL using stringr r package
I have the following URLS:
www.google.com?utm_source=site_corriere&utm_medium=video&utm_content=box
www.google.com?utm_source=site_rep&utm_medium=display&utm_content=box
www.google.com?...
0
votes
4
answers
98
views
Fill in column based on condition with another column in R [closed]
I have the following input table:
input <- structure(
list(individual = c(1, 2, 3, 4),
age = c(20, 34, 29, 30),
earnings_2020 = c(0, 0, 1, 0),...
4
votes
3
answers
123
views
Having trouble with which.min inside dplyr pipe
I have some trouble with the which.min function inside a dplyr pipe.
I have a cumbersome solution (*) and I'm looking for a more compact and elegant way to do this.
reproducible example
library(dplyr)
...
2
votes
2
answers
74
views
Is there a R function that detects a specific string and replaces it by the value of another observation based on a number within the string?
So, I am using constituency data of the German Election 1994 and some observations contain strings that indicate that the value is given in a different row (based on the Scheme "siehe Wkr xxx...
0
votes
1
answer
39
views
Advanced pivot_longer transformation sequentially on a group of columns
I'm a little perplexed about the exact way to proceed with this wrangling procedure.
I have a dataset which consists of raters assessing lung sounds (S1,...,S40). For each sound the assessed ...
3
votes
1
answer
86
views
Behavior of %>% when piping values to functions containing pipes
The below examples demonstrate that passing an object to deparse() and substitute() produces different output depending on whether the object is passed to the function with %>% and whether the ...
0
votes
2
answers
86
views
Reformatting pdf text into dataframe to remove extra information [closed]
I am trying to load the text from a pdf into R for text analysis. The pdf is formatted so that the text has columns for extra information. Please see the screen shot below.
I'd like to load the main ...
0
votes
1
answer
104
views
How to Rearrange Values in Each Row to Avoid Duplicates Across Columns in R?
Question
I have a data frame in R where each row contains multiple columns with categorical values. My goal is to rearrange the values within each row so that no value is repeated across columns in ...
1
vote
1
answer
142
views
restrict to those with data at specific age ranges in R
I have the following long format data frame with columns, id, age, and BMI. I have restricted the dataset such that only people (id) with at least 3 repeated measurements between age 2 weeks and 10 ...
-3
votes
1
answer
46
views
More elegant solution for conditional filtering? [closed]
The code below works perfectly fine and outputs the data of interest. However, I am wondering if there is a better solution or a different way to think about the logic.
Essentially, I need to filter for the ...
0
votes
1
answer
41
views
Can't Open .xlsx Document
I tried to download a .xlsx file from my course. But when I opened the .xlsx file, it turned into something like this.
UEsDBBQABgAIAAAAIQBBN4LPbgEAAAQFAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAAC
...
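One way to check what that text actually is: the leading characters look like base64, so a rough sketch (assuming the download is a base64-encoded copy of the workbook, which the excerpt does not confirm) is to decode the first line and look for the ZIP signature that .xlsx packages start with. The variable names here are illustrative only.

```python
import base64

# First line of the text shown in the question, copied verbatim.
first_line = "UEsDBBQABgAIAAAAIQBBN4LPbgEAAAQFAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAAC"

decoded = base64.b64decode(first_line)

# ZIP local file headers begin with the bytes "PK\x03\x04".
looks_like_zip = decoded[:4] == b"PK\x03\x04"

# An .xlsx package conventionally contains a [Content_Types].xml entry.
mentions_content_types = b"[Content_Types].xml" in decoded

print(looks_like_zip, mentions_content_types)
```

If both checks hold, decoding the full text with base64.b64decode and writing the resulting bytes to a file ending in .xlsx may recover the workbook, assuming the rest of the download is intact.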
0
votes
1
answer
59
views
How to Add New Column in Dictionary?
Based on the data below, I want to calculate the BMI Index for each row and the average for the total row. The BMI Index formula is 'berat' / 'tinggi'.
data = [{'nama': '...
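A minimal sketch of adding a computed key to each dictionary in the list. The sample rows below are hypothetical because the original data is truncated, and the ratio follows the formula exactly as stated in the question ('berat' / 'tinggi'); the conventional BMI definition divides weight by height squared instead.

```python
from statistics import mean

# Hypothetical sample rows; the real values are cut off in the excerpt.
data = [
    {"nama": "A", "berat": 60, "tinggi": 150},
    {"nama": "B", "berat": 80, "tinggi": 160},
]


def add_bmi(rows):
    """Return new dicts with a 'bmi' key computed as berat / tinggi."""
    return [{**row, "bmi": row["berat"] / row["tinggi"]} for row in rows]


rows_with_bmi = add_bmi(data)

# Average of the new column across all rows.
average_bmi = mean(row["bmi"] for row in rows_with_bmi)

for row in rows_with_bmi:
    print(row["nama"], row["bmi"])
print("average:", average_bmi)
```

Building new dictionaries with `{**row, ...}` leaves the original list untouched; assigning `row["bmi"] = ...` in a loop would work as well if in-place modification is acceptable.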
0
votes
1
answer
47
views
How to lengthen data in one column separated by semicolons, and repeat elements from the other column?
I have received a dataset in a .csv table. The first three lines of the table look like this:
Species,Methods
Chlamydomonas pisiformis; Stichococcus bacillaris; Stichococcus subtilis; Pleurococcus ...
0
votes
0
answers
39
views
for deep learning: save each sample individually or keep blocks? data doesnt fit memory
I am training a classifier. My data comes from multiple datasets, each dataset contains multiple subjects, each subject has performed multiple trials. Currently my data structure on disk looks like ...
1
vote
1
answer
61
views
Creating a large number of columns in R tidyverse based on a comparison with a specific column
I have a dataset in R tidyverse and I want to create 192 columns based on comparison with the sp column, just like the mp_comp_1 column. How can I do this for 192 columns in tidyverse?
library(...
1
vote
2
answers
79
views
Pattern matching in a dataframe
I am having some trouble conducting pattern matching within a data frame. I am working with grepl function in R.
I have a data frame of 5 local districts in two years (2001 and 2002). I want to check ...
2
votes
3
answers
101
views
Complete and fill missing rows with groups of uneven length
I have a dataframe of county executives and the year they were inaugurated. I am running a panel study with county-year as the unit of analysis. The date range is 2000 to 2004.
I would like to expand ...
-1
votes
3
answers
253
views
Remove duplicate rows, keep first row [duplicate]
I am working with a dataframe on county executives. I want to run a panel study where the unit of analysis is the county-year.
The problem is that sometimes two or more county executives serve during ...
-1
votes
1
answer
42
views
Fill in missing rows
I have a data frame of county executives and the year they were inaugurated.
I am running a panel study with county-year as the unit of analysis. The date range is 2000 to 2004.
I would like to expand ...
1
vote
2
answers
67
views
dataframe breakdown by year
I have a dataset on county executives and their year of inauguration. I need to break down which year each executive was inaugurated.
The problem is that the notation under the "year" variable ...
1
vote
3
answers
151
views
Add values across dataframe columns
I have a dataframe where missingness is indicated by "Z" (there may also be some "z" and NA entries present in the data), and values are entered as characters ("0", "...
1
vote
3
answers
50
views
Drop columns that are replicated in a data frame
I have a large data frame with repeated variables. This is just a sample of my data to illustrate the question:
df <- data.frame(
ID = rep(1:4, each = 1),
CMW = rep(c(10, 20, 30, 30), each = 1),...
-1
votes
1
answer
55
views
I need some help creating a loop/automatic way of cleaning my data [duplicate]
I'm quite new to programming languages and I am starting with R in my research predicting dengue disease cases with climatic data.
I'm still cleaning my data to work with and this particular one has ...
0
votes
1
answer
50
views
Add Column to R Data Frame from Another Data Frame with Matching Index Column, Only When Values are in A Certain Range
I am trying to add a column to a data frame (df1) from another data frame (df2), but only when the "depth range" from df1 lies within the "depth range" from df2. I'll explain below ...
0
votes
1
answer
66
views
SQL data wrangling help using the Having statement
The below code (Databricks SQL) produces the table following it. I am trying to adjust this code so that the output only includes zip5 records that have only 1 (or less) of each facility_type ...
1
vote
1
answer
51
views
Join tables based on a range instead of exact match [duplicate]
I have two datasets as the ones described below:
dfA <- tibble(
name = c("John", "Michael", "Brian", "Thomas", "Peter"),
expected = c(128.34, ...
0
votes
0
answers
76
views
How can I load data in Rstudio but making it accessible in other computers when opening the file?
I'm working on an assignment and we were asked to load the data and make the file run without errors when opening from the teacher's computer. He said: "When writing your code, keep the data ...
0
votes
2
answers
115
views
R: Alternatives/approaches to read_html() + html_text() that also work on strings without HTML/XML tags
In this solution to removing HTML tags from a string, the string is passed to rvest::read_html() to create an html_document object and then the object is passed to rvest::html_text() to return "...