Adding and merging two dataframes in R

Question

I have two data frames:

> df1
       Long Short
EURUSD 47295 16057
GBPUSD 17385  6861
USDJPY  7146  9369
USDCHF  2704  5162
USDCAD  4705 11947
AUDUSD 13041  6654
NZDUSD  7184  4000

> df2
       Long Short
EURUSD  318    408
GBPUSD  181    276
USDJPY  217    203
USDCHF   97     57
USDCAD  178    121
AUDUSD  142    202
NZDUSD   95    138

I need the final data frame to be like this:

> Final
       Long   Short
EURUSD 47613   16465

...    ...     ...

NZDUSD 7279    4138

The merge/concatenate approach isn't working. I appreciate any help.

If your first column is a factor variable, it will output NA when trying simple addition as @Vandenman suggested. In that case, use cbind(df1[,1], df1[, 2:3] + df2[, 2:3]).
– LAP, Commented Aug 2, 2017 at 13:21
How is it that your first column (the factors) has no column name? It looks like row names, which should not affect df1 + df2. If Leo's suggestion doesn't work for you, can you make this more reproducible by including the output from dput(head(x)) and explaining what "isn't working" means (warnings, errors, etc.)?
– r2evans, Commented Aug 2, 2017 at 13:27
Yes @r2evans, they are row names, which I set manually because the data is scraped. Would giving the row names a column name help? Leo's solution gives me the error "Error in '[.data.frame'(df1, , 2:3) : undefined columns selected".
– Rubicon, Commented Aug 2, 2017 at 13:35
Though they look fine aesthetically, I'm not a fan of using row names in general: they can be fragile, and some utilities do not retain them (so you need to work to keep them and keep them in order, which is not always obvious).
– r2evans, Commented Aug 2, 2017 at 13:36

r2evans · Accepted Answer · 2017-08-03 19:54:26Z

If the data does not have row names (my personal preference, not always controllable), here are three methods.

Your data:

df1 <- read.table(text = "Symbol Long Short
EURUSD 47295 16057
GBPUSD 17385  6861
USDJPY  7146  9369
USDCHF  2704  5162
USDCAD  4705 11947
AUDUSD 13041  6654
NZDUSD  7184  4000", header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = "Symbol Long Short
EURUSD  318    408
GBPUSD  181    276
USDJPY  217    203
USDCHF   97     57
USDCAD  178    121
AUDUSD  142    202
NZDUSD   95    138", header = TRUE, stringsAsFactors = FALSE)

A single helper-function that is used by methods 2 and 3:

psum <- function(..., na.rm = FALSE) rowSums(sapply(list(...), c), na.rm = na.rm)

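(This is analogous to pmin and pmax: it sums its arguments element-wise. It is needed so that an NA on one side, such as a row missing from one of the frames, does not turn the whole sum into NA.)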
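
As a quick illustration of the difference from plain addition, here is a minimal sketch. The vectors a and b are made up for the example, and psum is copied from above so the snippet runs on its own.

```r
# psum as defined above, repeated so the snippet is self-contained
psum <- function(..., na.rm = FALSE) rowSums(sapply(list(...), c), na.rm = na.rm)

# made-up vectors with a missing value on each side
a <- c(1, NA, 3)
b <- c(10, 20, NA)

plain  <- a + b                      # 11 NA NA: any NA poisons the sum
summed <- psum(a, b, na.rm = TRUE)   # 11 20  3: NAs are skipped
```

Note that with na.rm = TRUE, a position that is NA in every input comes back as 0 rather than NA, because that is how rowSums treats an all-NA row.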

Method 1: `cbind`

This is @Leo P.'s comment, and relies on the two data.frames always having the exact same order of rows:

cbind(df1[,1,drop=FALSE], df1[,2:3] + df2[,2:3])
#   Symbol  Long Short
# 1 EURUSD 47613 16465
# 2 GBPUSD 17566  7137
# 3 USDJPY  7363  9572
# 4 USDCHF  2801  5219
# 5 USDCAD  4883 12068
# 6 AUDUSD 13183  6856
# 7 NZDUSD  7279  4138
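
If the row order cannot be guaranteed but both frames hold the same symbols, one possible safeguard is to align the second frame to the first with match() before adding. This is only a sketch under that assumption: the shortened data, the deliberately shuffled df2, and the names idx and df2_aligned are made up for illustration.

```r
# shortened copies of the data; df2 is deliberately in a different order
df1 <- data.frame(Symbol = c("EURUSD", "GBPUSD", "USDJPY"),
                  Long   = c(47295, 17385, 7146),
                  Short  = c(16057, 6861, 9369),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Symbol = c("USDJPY", "EURUSD", "GBPUSD"),
                  Long   = c(217, 318, 181),
                  Short  = c(203, 408, 276),
                  stringsAsFactors = FALSE)

# position of each df1 symbol within df2
idx <- match(df1$Symbol, df2$Symbol)
# stop early if a symbol is missing from df2 (use method 2 or 3 in that case)
stopifnot(!anyNA(idx))

# reorder df2 to df1's row order, then add as in method 1
df2_aligned <- df2[idx, ]
res <- cbind(df1[, 1, drop = FALSE], df1[, 2:3] + df2_aligned[, 2:3])
```

The stopifnot() call makes the assumption explicit: if a symbol is absent from df2, the code fails loudly instead of silently adding mismatched rows.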

Method 2: base R merging

This method does not rely on the rows being in the same order, or even on every row being present in both frames. To demonstrate that this works, I'll remove one row from one of the data frames:

df2 <- df2[-3,]

Rename the second frame's columns so that we can merge the two frames and retain both sets of values:

colnames(df2) <- c("Symbol", "Long2", "Short2")

And the actual work:

within(merge(df1, df2, by = "Symbol", all = TRUE), {
  Long <- psum(Long, Long2, na.rm = TRUE)
  Short <- psum(Short, Short2, na.rm = TRUE)
  # cleanup, remove unneeded columns
  Long2 <- Short2 <- NULL
})
#   Symbol  Long Short
# 1 AUDUSD 13183  6856
# 2 EURUSD 47613 16465
# 3 GBPUSD 17566  7137
# 4 NZDUSD  7279  4138
# 5 USDCAD  4883 12068
# 6 USDCHF  2801  5219
# 7 USDJPY  7146  9369
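
A different base R route, not part of the method above, is to stack the two frames with rbind() and sum per symbol with aggregate(). This is a sketch that assumes both frames still carry the original column names (Symbol, Long, Short) and contain no NA values; the shortened data and the names stacked and totals are made up for illustration.

```r
# shortened copies of the data; USDJPY is missing from df2
df1 <- data.frame(Symbol = c("EURUSD", "GBPUSD", "USDJPY"),
                  Long   = c(47295, 17385, 7146),
                  Short  = c(16057, 6861, 9369),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Symbol = c("EURUSD", "GBPUSD"),
                  Long   = c(318, 181),
                  Short  = c(408, 276),
                  stringsAsFactors = FALSE)

# stack the rows, then sum Long and Short within each Symbol
stacked <- rbind(df1, df2)
totals  <- aggregate(cbind(Long, Short) ~ Symbol, data = stacked, FUN = sum)
```

No renaming or helper function is needed here, since a symbol that appears in only one frame simply contributes a single row to its group. Be aware that the formula interface of aggregate() drops rows containing NA by default.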

Method 3: `dplyr` joining

Starting with fresh copies of df1 and df2 (all rows, original column names), I again remove a row:

df2 <- df2[-3,]

And the work:

library(dplyr)
full_join(df1, rename(df2, Long2 = Long, Short2 = Short), by = "Symbol") %>%
  mutate(
    Long = psum(Long, Long2, na.rm = TRUE),
    Short = psum(Short, Short2, na.rm = TRUE)
  ) %>%
  select(-Long2, -Short2)
#   Symbol  Long Short
# 1 EURUSD 47613 16465
# 2 GBPUSD 17566  7137
# 3 USDJPY  7146  9369
# 4 USDCHF  2801  5219
# 5 USDCAD  4883 12068
# 6 AUDUSD 13183  6856
# 7 NZDUSD  7279  4138

Edit

The data in your question is under-representative. Based on your comments, it appears that what you really have is something like:

str(df1)
# 'data.frame': 7 obs. of  2 variables:
#  $ Long : Factor w/ 7 levels "2704","4705",..: 7 6 3 1 2 5 4
#  $ Short: Factor w/ 7 levels "4000","5162",..: 7 4 5 2 6 3 1

For future reference, this would have been clearer had you provided the data in an unambiguous, consumable form, such as the output of dput:

# dput(df1) ... possibly with options(deparse.max.lines=NULL) beforehand
structure(list(
  Long = structure(c(7L, 6L, 3L, 1L, 2L, 5L, 4L), .Label = c("2704", "4705", "7146", "7184", "13041", "17385", "47295"), class = "factor"),
  Short = structure(c(7L, 4L, 5L, 2L, 6L, 3L, 1L), .Label = c("4000", "5162", "6654", "6861", "9369", "11947", "16057"), class = "factor")),
  .Names = c("Long", "Short"),
  row.names = c("EURUSD", "GBPUSD", "USDJPY", "USDCHF", "USDCAD", "AUDUSD", "NZDUSD"),
  class = "data.frame")

To get from your df1 to what I read in above, just do:

# convert factors to numbers (via character, so the values are kept, not the level codes)
df1[] <- lapply(df1[], function(a) as.numeric(as.character(a)))
# bring the row names into a column
df1$Symbol <- rownames(df1)

The columns will be in a different order, but that's cosmetic and easily addressed if important enough. You can optionally remove the row names with rownames(df1) <- NULL. This needs to be done to df2 as well.
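
To see why the detour through as.character() matters, and how the steps fit together, here is a minimal sketch. The toy frame raw only mimics the scraped data, and tidy_frame is a hypothetical helper name, not something from a package.

```r
# toy frame mimicking the scraped data: factor columns, symbols as row names
raw <- data.frame(Long  = factor(c("47295", "17385", "7146")),
                  Short = factor(c("16057", "6861", "9369")),
                  row.names = c("EURUSD", "GBPUSD", "USDJPY"))

codes  <- as.numeric(raw$Long)                # 2 1 3: level codes, not the values
values <- as.numeric(as.character(raw$Long))  # 47295 17385 7146

# hypothetical helper that applies the conversion steps above to one frame
tidy_frame <- function(df) {
  # factors -> character -> numeric, column by column
  df[] <- lapply(df, function(a) as.numeric(as.character(a)))
  # move the row names into a regular column and drop them
  df$Symbol <- rownames(df)
  rownames(df) <- NULL
  # put Symbol first
  df[, c("Symbol", setdiff(names(df), "Symbol"))]
}

clean <- tidy_frame(raw)
```

Calling the same helper on both frames (for example df1 <- tidy_frame(df1) and df2 <- tidy_frame(df2)) should leave them in the shape that the three methods above expect.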

edited Aug 3, 2017 at 19:54

answered Aug 2, 2017 at 14:19

4 Comments

r2evans Over a year ago

How goes it, Andrew.G, is this addressing your question?

Rubicon Over a year ago

I'm trying to make the options work. I +1 because of all the effort you went to, but I can't tick answer yet because I can't get it to work. Specifically, my data is scraped from a dynamic web page, so I can't do that first step (as in, just "type the data"). I tried stripping the row names using <- NULL and following method 1. For method 1, I get "Error in [.data.frame(df1, , 2:3) : undefined columns selected". For method 3, I consistently get the error "full_join not found..." even though I've installed dplyr. @r2evans

Rubicon Over a year ago

The issue is the numbers are being treated as factors. When I try and convert them using df1[, c(1,2)] <- sapply(df1[, c(1,2)], as.numeric) it just changes the values of Long and Short columns to 1,2,3...7 instead of the actual values like 47613

r2evans Over a year ago

Read your comments, @LeoP. gave you this part. (The reason it was not addressed in my answer is that nothing in your question initially demonstrated that they were not numbers. Had your sample data been given with something like dput, it would have been much clearer.)
