2

I need to recode values over multiple columns of a data frame based on another table.

I need to recode the values in several columns of a data frame using a lookup table. The values are geographic identifiers that I must replace with place names. I tried a for loop, but code that works outside the loop no longer works inside it: I can't get mutate to work in a for loop.

My real data contains 274 columns, 38 of which need recoding. These columns have many different names (they aren't all called "place").

My main dataset:

id <- c(1, 2, 3)
departure <- c(1, 2, NA)
arrival <- c(3, 1, 2)
transit <- c(NA, NA, 1)
dataset <- data.frame(id, departure, arrival, transit)

The other table:

geo_id <- c(1, 2, 3)
place_name <- c("Paris", "Nantes", "London")
geocode <- data.frame(geo_id, place_name)

My loop:

var <- c("departure", "arrival", "transit") # the columns that should be recoded (must be a vector with my real data)

for (i in var) {
  print(i)
  dataset <- dataset %>%
    mutate(i = geocode$place_name[match(i, geocode$geo_id)])
}

mutate creates a new column called i! How can I avoid this?

4
  • Have you tried mutate_at? This looks like what it's designed for. Commented Jan 5, 2020 at 17:00
  • @camille Not working! Commented Jan 5, 2020 at 17:03
  • Not working how? Commented Jan 5, 2020 at 17:04
  • @camille Error in check_dot_cols(.vars, .cols) : argument ".vars" is missing, with no default Commented Jan 5, 2020 at 17:04

4 Answers

4

With dplyr, you can do:

dataset %>%
 mutate_at(vars(one_of(var)), ~ geocode$place_name[match(., geocode$geo_id)])

  id departure arrival transit
1  1     Paris  London    <NA>
2  2    Nantes   Paris    <NA>
3  3      <NA>  Nantes   Paris
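
As a hedged aside: in dplyr 1.0.0 and later, mutate_at is superseded by across(), and all_of() is the stricter selector for a character vector of column names. The sketch below assumes dplyr >= 1.0.0 is installed and rebuilds the question's example data so it runs on its own; the name recoded is only illustrative.

```r
library(dplyr)

# Example data from the question
dataset <- data.frame(
  id = c(1, 2, 3),
  departure = c(1, 2, NA),
  arrival = c(3, 1, 2),
  transit = c(NA, NA, 1)
)
geocode <- data.frame(
  geo_id = c(1, 2, 3),
  place_name = c("Paris", "Nantes", "London"),
  stringsAsFactors = FALSE
)

# Columns to recode, given as a character vector
var <- c("departure", "arrival", "transit")

# For each selected column, look up its ids in geocode$geo_id
# and replace them with the matching place_name
recoded <- dataset %>%
  mutate(across(all_of(var), ~ geocode$place_name[match(.x, geocode$geo_id)]))

print(recoded)
```

Only the columns named in var are touched, so the position or number of the other columns should not matter.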

Or with the addition of tidyr:

dataset %>%
 pivot_longer(one_of(var)) %>%
 left_join(geocode, by = c("value" = "geo_id")) %>%
 select(-value) %>%
 pivot_wider(names_from = name, values_from = place_name)

6 Comments

In my real dataset, the variable names don't begin with "place". I have a vector of variable names...
Updated it accordingly :)
In my real dataset, I have 274 columns; the columns to recode and the other columns are not in any particular order.
It is applied only to the columns in your vector. Isn't this what you are interested in?
Then both possibilities should work. What is the issue?
1

I think you want to join the datasets. You can use this dplyr function and drop any unneeded columns.

comb <- dplyr::left_join(dataset, geocode, by = (c("id" = "geo_id")))
comb

  id departure arrival transit place_name
1  1         1       3      NA      Paris
2  2         2       1      NA     Nantes
3  3        NA       2       1     London

1 Comment

In reality, I need to replace ID values with names across 38 columns.
0

Here's one way to do it:

# select cols to recode
cols <- c('departure', 'arrival', 'transit')

# get other columns
other_cols <- setdiff(colnames(dataset), cols)

# recode df: look up each column's ids in geocode$geo_id
recode_df <- sapply(cols, function(x) as.character(geocode$place_name[match(dataset[[x]], geocode$geo_id)]))

# get all columns together
df <- cbind(recode_df, dataset[other_cols])
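
If a for loop is preferred, as in the question, the following is a minimal sketch of two ways it could be written. Inside mutate(i = ...), the i on the left is taken as a literal column name and match(i, ...) matches the column name string rather than the column's values, which is why a column called i appears. The names by_base and by_dplyr are hypothetical, and the second variant assumes dplyr with rlang's `!!` and `:=` operators and the `.data` pronoun.

```r
library(dplyr)

# Example data from the question
dataset <- data.frame(
  id = c(1, 2, 3),
  departure = c(1, 2, NA),
  arrival = c(3, 1, 2),
  transit = c(NA, NA, 1)
)
geocode <- data.frame(
  geo_id = c(1, 2, 3),
  place_name = c("Paris", "Nantes", "London"),
  stringsAsFactors = FALSE
)
var <- c("departure", "arrival", "transit")

# Variant 1: base R, using [[ ]] so the loop variable names the column
by_base <- dataset
for (i in var) {
  by_base[[i]] <- geocode$place_name[match(by_base[[i]], geocode$geo_id)]
}

# Variant 2: dplyr, unquoting the name with !! and := and
# reading the column's values through the .data pronoun
by_dplyr <- dataset
for (i in var) {
  by_dplyr <- by_dplyr %>%
    mutate(!!i := geocode$place_name[match(.data[[i]], geocode$geo_id)])
}

print(by_base)
```

Both variants overwrite the existing columns in place instead of adding a new one.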


0

There may be simpler ways, but the code below works. If the var vector of columns to change is first collapsed into a single regex pattern, the code is general: it does not depend on the number or names of the columns.

Part of it is inspired by this answer to another question; the auxiliary function f is taken from there.

library(dplyr)
library(tidyr)

var_pattern <- paste(var, collapse = "|")

f <- function(.) if(length(unique(.[!is.na(.)])) > 1L) . else first(.[!is.na(.)]) 

dataset %>%
  gather(place, value, -id) %>%
  mutate(place_name = geocode$place_name[value]) %>%
  spread(place, place_name) %>%
  select(-value) %>%
  group_by(id) %>%
  mutate_at(vars(matches(var_pattern)), f) %>%
  ungroup() %>%
  distinct() %>% 
  filter(rowSums(is.na(.)) < 2L) 
## A tibble: 3 x 4
#     id place1 place2 place3
#  <dbl> <fct>  <fct>  <fct> 
#1     1 Paris  London NA    
#2     2 Nantes Paris  NA    
#3     3 NA     Nantes Paris 
