
I have two data frames:

> df1
       Long Short
EURUSD 47295 16057
GBPUSD 17385  6861
USDJPY  7146  9369
USDCHF  2704  5162
USDCAD  4705 11947
AUDUSD 13041  6654
NZDUSD  7184  4000

> df2
       Long Short
EURUSD  318    408
GBPUSD  181    276
USDJPY  217    203
USDCHF   97     57
USDCAD  178    121
AUDUSD  142    202
NZDUSD   95    138

I need the final data frame to be like this:

> Final
       Long   Short
EURUSD 47613   16465

...    ...     ...

NZDUSD 7279    4138

The merge/concatenate approach isn't working. I appreciate any help.

  • df1 + df2 doesn't do it? Commented Aug 2, 2017 at 13:15
  • If your first column is a factor variable, it will output NA when trying simple addition as @Vandenman suggested. In that case, use cbind(df1[,1], df1[, 2:3] + df2[, 2:3]). Commented Aug 2, 2017 at 13:21
  • How is it that your first column (the factors) has no column name? It looks like row names, which should not be impacting the df1+df2 thing. If Leo's doesn't do it for you, can you make this a bit more reproducible by including the output from dput(head(x)) and explaining what "isn't working" means (warnings, errors, etc.)? Commented Aug 2, 2017 at 13:27
  • Yes @r2evans, they are row names, which I set manually because the data is scraped. Would giving the row names a column name help? Leo's solution gives me the error "Error in '[.data.frame'(df1, , 2:3) : undefined columns selected" Commented Aug 2, 2017 at 13:35
  • Though they look fine aesthetically, I'm not a fan of using row names in general: they can be fragile, and some utilities do not retain them (so you need to work to keep them, and keep them in order, which is not always obvious). Commented Aug 2, 2017 at 13:36
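A minimal sketch of the df1 + df2 suggestion from the comments, assuming both frames hold numeric (not factor) columns and list the same symbols as row names in the same order. The values are a two-row excerpt of the question's data.

```r
# Two-row excerpts of the question's data, with the symbols as row names
df1 <- data.frame(Long = c(47295, 17385), Short = c(16057, 6861),
                  row.names = c("EURUSD", "GBPUSD"))
df2 <- data.frame(Long = c(318, 181), Short = c(408, 276),
                  row.names = c("EURUSD", "GBPUSD"))

# Element-wise addition: cells are matched by position, not by row name,
# so both frames must have the same dimensions and the same row order
total <- df1 + df2
print(total)
```

The row names of df1 are carried over to the result. If Long or Short were factors instead, + would give NA with a "not meaningful for factors" warning, which is the failure Leo P.'s comment describes.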

1 Answer


Assuming the data does not use row names (my personal preference, though not always controllable), here are three methods.

Your data:

df1 <- read.table(text = "Symbol Long Short
EURUSD 47295 16057
GBPUSD 17385  6861
USDJPY  7146  9369
USDCHF  2704  5162
USDCAD  4705 11947
AUDUSD 13041  6654
NZDUSD  7184  4000", header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = "Symbol Long Short
EURUSD  318    408
GBPUSD  181    276
USDJPY  217    203
USDCHF   97     57
USDCAD  178    121
AUDUSD  142    202
NZDUSD   95    138", header = TRUE, stringsAsFactors = FALSE)

A small helper function, used by methods 2 and 3:

psum <- function(..., na.rm = FALSE) rowSums(sapply(list(...), c), na.rm = na.rm)

(This is analogous to pmin/pmax: it sums element-wise across vectors, and with na.rm = TRUE a missing value in one vector does not turn that row's total into NA.)
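A quick sketch of what psum() does with missing values, using two made-up vectors (x and y are illustrative only):

```r
# The answer's helper: element-wise (parallel) sum across vectors
psum <- function(..., na.rm = FALSE) rowSums(sapply(list(...), c), na.rm = na.rm)

x <- c(1, 2, NA)
y <- c(10, NA, 30)

plain   <- x + y                    # NA wherever either input is NA
strict  <- psum(x, y)               # behaves like x + y
lenient <- psum(x, y, na.rm = TRUE) # NAs are skipped instead of propagated
print(rbind(plain, strict, lenient))
```

One caveat: with length-1 inputs, sapply() simplifies to a plain vector instead of a matrix and rowSums() stops with an error. That does not affect the seven-row data here; a variant built on do.call(cbind, list(...)) avoids it.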

Method 1: cbind

This is @Leo P.'s comment, and it relies on the two data.frames always having exactly the same rows in the same order:

cbind(df1[,1,drop=FALSE], df1[,2:3] + df2[,2:3])
#   Symbol  Long Short
# 1 EURUSD 47613 16465
# 2 GBPUSD 17566  7137
# 3 USDJPY  7363  9572
# 4 USDCHF  2801  5219
# 5 USDCAD  4883 12068
# 6 AUDUSD 13183  6856
# 7 NZDUSD  7279  4138
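Method 1 silently produces wrong sums if the rows are in a different order. A minimal sketch of one safeguard (my addition, not part of Leo P.'s comment): reorder df2 to df1's Symbol order with match(), and stop if any symbol is missing. It assumes the Symbol/Long/Short layout read in above; three rows are used for brevity.

```r
df1 <- data.frame(Symbol = c("EURUSD", "GBPUSD", "USDJPY"),
                  Long = c(47295, 17385, 7146), Short = c(16057, 6861, 9369),
                  stringsAsFactors = FALSE)
# df2 deliberately in a different row order
df2 <- data.frame(Symbol = c("USDJPY", "EURUSD", "GBPUSD"),
                  Long = c(217, 318, 181), Short = c(203, 408, 276),
                  stringsAsFactors = FALSE)

# Position of each df1 symbol within df2; NA means the symbol is missing
idx <- match(df1$Symbol, df2$Symbol)
stopifnot(!anyNA(idx))
df2_aligned <- df2[idx, ]

res <- cbind(df1[, 1, drop = FALSE], df1[, 2:3] + df2_aligned[, 2:3])
print(res)
```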

Method 2: base R merging

This method relies neither on the rows being in the same order nor on every row being present in both frames. To demonstrate, I'll remove one row from one of the data frames:

df2 <- df2[-3,]

Rename the second frame's value columns so that, after merging, both sets of values are kept side by side:

colnames(df2) <- c("Symbol", "Long2", "Short2")

And the actual work:

within(merge(df1, df2, by = "Symbol", all = TRUE), {
  Long <- psum(Long, Long2, na.rm = TRUE)
  Short <- psum(Short, Short2, na.rm = TRUE)
  # cleanup, remove unneeded columns
  Long2 <- Short2 <- NULL
})
#   Symbol  Long Short
# 1 AUDUSD 13183  6856
# 2 EURUSD 47613 16465
# 3 GBPUSD 17566  7137
# 4 NZDUSD  7279  4138
# 5 USDCAD  4883 12068
# 6 USDCHF  2801  5219
# 7 USDJPY  7146  9369
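The manual renaming step can also be handed to merge() through its suffixes argument: suffixes = c("", ".2") leaves df1's column names unchanged and appends ".2" to df2's overlapping ones. A minimal sketch with a three-row excerpt, where USDJPY is deliberately missing from df2:

```r
psum <- function(..., na.rm = FALSE) rowSums(sapply(list(...), c), na.rm = na.rm)

df1 <- data.frame(Symbol = c("EURUSD", "GBPUSD", "USDJPY"),
                  Long = c(47295, 17385, 7146), Short = c(16057, 6861, 9369),
                  stringsAsFactors = FALSE)
# USDJPY is intentionally absent from df2
df2 <- data.frame(Symbol = c("EURUSD", "GBPUSD"),
                  Long = c(318, 181), Short = c(408, 276),
                  stringsAsFactors = FALSE)

# Full outer join; df2's overlapping columns become Long.2 and Short.2
m <- merge(df1, df2, by = "Symbol", all = TRUE, suffixes = c("", ".2"))
m$Long  <- psum(m$Long,  m$Long.2,  na.rm = TRUE)
m$Short <- psum(m$Short, m$Short.2, na.rm = TRUE)
m[c("Long.2", "Short.2")] <- NULL  # drop the helper columns
print(m)
```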

Method 3: dplyr joining

Starting again from the original df1 and df2 (with the original column names), I again remove a row:

df2 <- df2[-3,]

And the work:

library(dplyr)
full_join(df1, rename(df2, Long2 = Long, Short2 = Short), by = "Symbol") %>%
  mutate(
    Long = psum(Long, Long2, na.rm = TRUE),
    Short = psum(Short, Short2, na.rm = TRUE)
  ) %>%
  select(-Long2, -Short2)
#   Symbol  Long Short
# 1 EURUSD 47613 16465
# 2 GBPUSD 17566  7137
# 3 USDJPY  7146  9369
# 4 USDCHF  2801  5219
# 5 USDCAD  4883 12068
# 6 AUDUSD 13183  6856
# 7 NZDUSD  7279  4138
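Another base R option, not among the three methods above, is to stack the two frames with rbind() and sum each column within Symbol using aggregate(). Like methods 2 and 3 it tolerates symbols that appear in only one frame, and it needs no renaming or NA handling. A minimal sketch, assuming both frames have the Symbol/Long/Short layout and numeric value columns:

```r
df1 <- read.table(text = "Symbol Long Short
EURUSD 47295 16057
GBPUSD 17385  6861
USDJPY  7146  9369
USDCHF  2704  5162
USDCAD  4705 11947
AUDUSD 13041  6654
NZDUSD  7184  4000", header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = "Symbol Long Short
EURUSD  318    408
GBPUSD  181    276
USDJPY  217    203
USDCHF   97     57
USDCAD  178    121
AUDUSD  142    202
NZDUSD   95    138", header = TRUE, stringsAsFactors = FALSE)
df2 <- df2[-3, ]  # drop USDJPY, as in the demos above

# Stack both frames, then sum Long and Short within each Symbol
totals <- aggregate(cbind(Long, Short) ~ Symbol, data = rbind(df1, df2), FUN = sum)
print(totals)
```

The result comes back sorted by Symbol rather than in df1's original order.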

Edit

The data in your question does not reflect its actual structure. Based on your comments, what you really have appears to be something like:

str(df1)
# 'data.frame': 7 obs. of  2 variables:
#  $ Long : Factor w/ 7 levels "2704","4705",..: 7 6 3 1 2 5 4
#  $ Short: Factor w/ 7 levels "4000","5162",..: 7 4 5 2 6 3 1
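The str() output above shows Long and Short as factors. Calling as.numeric() directly on a factor returns its internal level codes, not the numbers written on the labels; going through as.character() (or indexing the numeric levels, as described in R FAQ 7.10) recovers the actual values. A small illustration with a three-value factor:

```r
# A factor built from number-like strings, as in the str() output above
f <- factor(c("47295", "17385", "7146"))

print(levels(f))                         # stored as text, sorted alphabetically
codes   <- as.numeric(f)                 # internal level codes, not the numbers
values  <- as.numeric(as.character(f))   # the actual numbers
values2 <- as.numeric(levels(f))[f]      # same result, converting each level once
print(codes); print(values); print(values2)
```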

For future reference, this would have been clearer had you provided the data in an unambiguous, consumable form, such as the output of dput():

# dput(df1) ... possibly with options(deparse.max.lines=NULL) beforehand
structure(list(
  Long = structure(c(7L, 6L, 3L, 1L, 2L, 5L, 4L), .Label = c("2704", "4705", "7146", "7184", "13041", "17385", "47295"), class = "factor"),
  Short = structure(c(7L, 4L, 5L, 2L, 6L, 3L, 1L), .Label = c("4000", "5162", "6654", "6861", "9369", "11947", "16057"), class = "factor")),
  .Names = c("Long", "Short"),
  row.names = c("EURUSD", "GBPUSD", "USDJPY", "USDCHF", "USDCAD", "AUDUSD", "NZDUSD"),
  class = "data.frame")

To get from your df1 to what I read in above, just do:

# convert the factor columns to numbers (via character, so the level labels are used rather than the internal codes)
df1[] <- lapply(df1, function(a) as.numeric(as.character(a)))
# bring the row names into a column
df1$Symbol <- rownames(df1)

Symbol ends up as the last column instead of the first. That is cosmetic, but because Method 1 selects columns by position, reorder first, e.g. df1 <- df1[, c("Symbol", "Long", "Short")]. You can optionally remove the row names with rownames(df1) <- NULL. Apply the same steps to df2 as well.
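Because the same steps must be applied to both df1 and df2, they can be wrapped in a small function. The name to_symbol_frame() is hypothetical (not from the answer). The sketch rebuilds two-row excerpts in the question's actual shape (factor columns, symbols as row names), converts both, and then adds them as in Method 1, assuming matching row order.

```r
# Hypothetical helper: factor columns -> numeric, row names -> leading Symbol column
to_symbol_frame <- function(df) {
  df[] <- lapply(df, function(a) as.numeric(as.character(a)))
  out <- data.frame(Symbol = rownames(df), df, stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

# Two-row excerpts shaped like the question's data: factors plus row names
df1 <- data.frame(Long = factor(c("47295", "17385")),
                  Short = factor(c("16057", "6861")),
                  row.names = c("EURUSD", "GBPUSD"))
df2 <- data.frame(Long = factor(c("318", "181")),
                  Short = factor(c("408", "276")),
                  row.names = c("EURUSD", "GBPUSD"))

a <- to_symbol_frame(df1)
b <- to_symbol_frame(df2)
# Method 1 now works: Symbol is column 1, numeric Long/Short are columns 2:3
total <- cbind(a[, 1, drop = FALSE], a[, 2:3] + b[, 2:3])
print(total)
```

Unlike the two lines above, this puts Symbol first, so no column reordering is needed before Method 1.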


4 Comments

How goes it, Andrew.G? Is this addressing your question?
I'm trying to make the options work. I upvoted because of all the effort you went to, but I can't accept the answer yet because I can't get it to work. Specifically, my data is scraped from a dynamic web page, so I can't do the first step of just typing the data in. I tried stripping the row names using <- NULL and following method 1, but I get "Error in [.data.frame(df1, , 2:3) : undefined columns selected". For method 3, I consistently get the error "full_join not found..." even though I've installed dplyr. @r2evans
The issue is that the numbers are being treated as factors. When I try to convert them using df1[, c(1,2)] <- sapply(df1[, c(1,2)], as.numeric), it just changes the values of the Long and Short columns to 1,2,3...7 instead of the actual values like 47613.
Read the comments: @LeoP. gave you this part. (The reason it was not addressed in my answer is that nothing in your question initially showed that they were not numbers. Had your sample data been given with something like dput, it would have been much clearer.)
