
I'm currently merging 12 different data frames, each 480,000 obs, by an id and adding the columns, so the result is a 480k obs x 14 variable data frame. However, this is taking too long to process and I'm looking for a faster way to do it.

Example

dput:

# January data
jan <- structure(list(gridNumber = c("17578", "18982", "18983", "18984", 
"18985"), PRISM_ppt_stable_4kmM2_193301_bil = c(35.7099990844727, 
36, 35.4199981689453, 33.7299995422363, 33.2799987792969)), .Names = c("gridNumber", 
"PRISM_ppt_stable_4kmM2_193301_bil"), row.names = c("17578", 
"18982", "18983", "18984", "18985"), class = "data.frame")

# February data 
feb <- structure(list(gridNumber = c("17578", "18982", "18983", "18984", 
"18985"), PRISM_ppt_stable_4kmM2_193302_bil = c(14.6199998855591, 
14.5600004196167, 14.9899997711182, 15.4700002670288, 15.5799999237061
)), .Names = c("gridNumber", "PRISM_ppt_stable_4kmM2_193302_bil"
), row.names = c("17578", "18982", "18983", "18984", "18985"), class = "data.frame")

# March Data 
mar <- structure(list(gridNumber = c("17578", "18982", "18983", "18984", 
"18985"), PRISM_ppt_stable_4kmM2_193303_bil = c(23.8400001525879, 
23.9200000762939, 24.3400001525879, 25.7900009155273, 26.5900001525879
)), .Names = c("gridNumber", "PRISM_ppt_stable_4kmM2_193303_bil"
), row.names = c("17578", "18982", "18983", "18984", "18985"), class = "data.frame")

dplyr Code:

  library(dplyr)
  datalist <- list(jan, feb, mar)
  full <- Reduce(function(x, y) full_join(x, y, by = "gridNumber"), datalist)

This example obviously runs quickly because of the small number of observations, but is there a faster way to do this at full scale?

2 Answers


Here is an approach using data.table and reshape2:

library(data.table)
library(reshape2)
# create a list of data frames, and coerce to data.tables
month_list <- lapply(list(jan,feb,mar),setDT)


# add an ID column holding the old variable name, then rename the value column
for (i in seq_along(month_list)) {
  set(month_list[[i]], j = "ID", value = names(month_list[[i]])[2])
  setnames(month_list[[i]], names(month_list[[i]])[2], "value")
}
# put in long form
long_data <- rbindlist(month_list)

# then use `dcast.data.table` to make wide

wide <- dcast.data.table(long_data, gridNumber ~ ID, value.var = "value")

5 Comments

This is really fast; although, I don't understand the middle for loop. Why is this a necessary step?
@Amstell, the for loop adds a column identifying the source data set (by the name of its 2nd column), then renames that second column so the data can be stored in 3 columns in long form
is loading reshape2 still necessary in 1.9.6?
@jangorecki perhaps not.
Maybe worth noting that stacking in long form depends on the columns' having the same class (as is the case for the OP)... In that case, I'd argue for sticking to long form.
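Following up on that last comment, here is a minimal self-contained sketch of staying in long form rather than casting wide. The toy data and the shortened column names (`ppt_193301`, `ppt_193302`) are hypothetical stand-ins for the OP's PRISM columns; the loop is the same trick as in the answer above:

```r
library(data.table)

# hypothetical minimal data mirroring the OP's structure
jan <- data.table(gridNumber = c("17578", "18982"), ppt_193301 = c(35.71, 36.00))
feb <- data.table(gridNumber = c("17578", "18982"), ppt_193302 = c(14.62, 14.56))

month_list <- list(jan, feb)
for (i in seq_along(month_list)) {
  set(month_list[[i]], j = "ID", value = names(month_list[[i]])[2])
  setnames(month_list[[i]], names(month_list[[i]])[2], "value")
}

# stacked long table: one row per gridNumber per month
long_data <- rbindlist(month_list)

# many analyses work directly on this form, e.g. mean precipitation per month
monthly_means <- long_data[, .(mean_ppt = mean(value)), by = ID]
```

If the wide table is only an intermediate step toward per-month summaries, skipping the `dcast` entirely avoids that reshape cost.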

Dunno if this will be faster, but:

library(dplyr)
library(tidyr)  # for spread()

list(jan = jan %>% rename(PRISM = PRISM_ppt_stable_4kmM2_193301_bil), 
     feb = feb %>% rename(PRISM = PRISM_ppt_stable_4kmM2_193302_bil), 
     mar = mar %>% rename(PRISM = PRISM_ppt_stable_4kmM2_193303_bil)) %>%
  bind_rows(.id = "month") %>%
  spread(month, PRISM)

2 Comments

I take care of the rename after the data has been merged with colnames()
In this case, it is necessary for all of the PRISM columns to have the same name for bind_rows to work. The names of the list (jan = ) will end up as the new names of the PRISM columns after the reshape
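To make that mechanism concrete, here is a small self-contained sketch on hypothetical toy data (the short column names `ppt_jan` and `ppt_feb` are stand-ins for the long PRISM names): the list names supplied to `bind_rows(.id = "month")` become the values of the `month` column, and `spread` then turns those values into the new column names.

```r
library(dplyr)
library(tidyr)

# toy data with hypothetical column names
jan <- data.frame(gridNumber = c("17578", "18982"), ppt_jan = c(35.71, 36.00))
feb <- data.frame(gridNumber = c("17578", "18982"), ppt_feb = c(14.62, 14.56))

wide <- list(jan = jan %>% rename(PRISM = ppt_jan),
             feb = feb %>% rename(PRISM = ppt_feb)) %>%
  bind_rows(.id = "month") %>%  # list names ("jan", "feb") fill the month column
  spread(month, PRISM)          # month values become the new column names
```

Note that `spread` has since been superseded by `tidyr::pivot_wider`, which accepts the same long data via `pivot_wider(names_from = month, values_from = PRISM)`.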
