1

I have a data frame that contains historical price returns. The data is organized with date columns (Year and Month) and many asset columns (denoted A1, A2, ...). Each asset column holds the price return for each historical date. I would like to turn this into a data frame with many asset columns and only one row, where that row contains the average of the original rows for each new column. The new columns need headers made of the original asset name concatenated with the date information. A simplified example of the original data follows:

> df <- read.csv("data.csv", header=T)
> df
  Year Month A1 A2 A3
1 2015   Jan  1  1  1
2 2015   Feb  2  2  2
3 2015   Mar  3  3  3
4 2016   Jan  1  1  1
5 2016   Feb  2  2  2
6 2016   Mar  3  3  3

I used simple repeating numbers for the returns here. I am using a function that requires the data to be organized as follows:

> df2 <- read.csv("data2.csv", header=T)
> df2

  Returns A1.Jan A1.Feb A1.Mar A2.Jan A2.Feb A2.Mar A3.Jan A3.Feb A3.Mar
1 Average      1      2      3      1      2      3      1      2      3

For clarity, A1.Jan contains the average of the Jan returns across all years. Thanks in advance for any insight or solution.

1
  • This worked well for the groupings. Super efficient and much appreciated Commented Oct 1, 2018 at 21:19

3 Answers

1

Take a look at the base function reshape. This is basically the same task as is solved by the last example on its help page:

reshape(df, idvar="Year", direction="wide", timevar="Month")
  Year A1.Jan A2.Jan A3.Jan A1.Feb A2.Feb A3.Feb A1.Mar A2.Mar A3.Mar
1 2015      1      1      1      2      2      2      3      3      3
4 2016      1      1      1      2      2      2      3      3      3

You wanted the Year variable to remain as a column identifier but wanted the Month variable to act as a sequence that gets spread "wide".
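
The reshape call above keeps one row per Year, so it does not yet produce the single averaged row the question asks for. A minimal base R sketch of one way to get there, assuming the months are the English abbreviations in month.abb and that averaging with aggregate() before reshaping is acceptable (the names monthly, assets and wanted are only illustrative):

```r
# Sample data from the question
df <- data.frame(
  Year  = rep(c(2015L, 2016L), each = 3),
  Month = rep(c("Jan", "Feb", "Mar"), times = 2),
  A1 = rep(1:3, times = 2),
  A2 = rep(1:3, times = 2),
  A3 = rep(1:3, times = 2)
)

# Average every asset column within each month (Year is dropped here)
monthly <- aggregate(cbind(A1, A2, A3) ~ Month, data = df, FUN = mean)

# aggregate() sorts the groups alphabetically; restore calendar order
monthly <- monthly[match(month.abb, monthly$Month, nomatch = 0), ]

# A constant id column so that reshape() collapses everything into one row
monthly$Returns <- "Average"
wide <- reshape(monthly, idvar = "Returns", timevar = "Month",
                direction = "wide")

# reshape() groups the new columns by month; reorder them by asset instead
assets <- c("A1", "A2", "A3")
wanted <- as.vector(t(outer(assets, as.character(monthly$Month),
                            paste, sep = ".")))
wide <- wide[, c("Returns", wanted)]
wide
```

The final reordering step is optional; without it the columns come out grouped by month (A1.Jan, A2.Jan, A3.Jan, ...), as in the output shown above.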


0

With data.table you can do

library(data.table)
setDT(df)
df[, lapply(.SD, mean), .SDcols = names(df)[grep("^A", names(df))], by = Month
   ][, Returns := "Average"
     ][, melt(.SD, id = c("Month", "Returns"))
       ][, dcast(.SD, Returns ~ variable + Month, value.var = 'value', sep = ".")]

#   Returns A1.Feb A1.Jan A1.Mar A2.Feb A2.Jan A2.Mar A3.Feb A3.Jan A3.Mar
#1: Average      2      1      3      2      1      3      2      1      3

In the first line we aggregate the data by Month. The part names(df)[grep("^A", names(df))] ensures that we only aggregate variables that start with the letter "A".

The second line creates variable Returns that contains the value "Average".

melt gathers your data into long format and dcast finally spreads it into the desired output.
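
The output above lists the months alphabetically (Feb, Jan, Mar) because Month is a character column. A hedged variant, assuming the months are the English abbreviations in month.abb: convert Month to a factor in calendar order and let dcast do the averaging through fun.aggregate. The names dt, long and wide are only illustrative.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  Year  = rep(c(2015L, 2016L), each = 3),
  Month = rep(c("Jan", "Feb", "Mar"), times = 2),
  A1 = rep(1:3, times = 2),
  A2 = rep(1:3, times = 2),
  A3 = rep(1:3, times = 2)
)

# Long format: one row per Year / Month / asset
long <- melt(dt, id.vars = c("Year", "Month"),
             variable.name = "asset", value.name = "value")

# Calendar-ordered factor, restricted to the months actually present
long[, Month := factor(Month, levels = intersect(month.abb, Month))]
long[, Returns := "Average"]

# dcast averages over Year and orders the new columns by factor level
wide <- dcast(long, Returns ~ asset + Month, value.var = "value",
              fun.aggregate = mean, sep = ".")
wide
```

This folds the aggregation into the dcast step, so the separate lapply(.SD, mean) pass is not needed in this variant.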

data

df <- structure(list(Year = c(2015L, 2015L, 2015L, 2016L, 2016L, 2016L
), Month = c("Jan", "Feb", "Mar", "Jan", "Feb", "Mar"), A1 = c(1L, 
2L, 3L, 1L, 2L, 3L), A2 = c(1L, 2L, 3L, 1L, 2L, 3L), A3 = c(1L, 
2L, 3L, 1L, 2L, 3L)), .Names = c("Year", "Month", "A1", "A2", 
"A3"), class = "data.frame", row.names = c("1", "2", "3", "4", 
"5", "6"))

3 Comments

This looks like a good option. I was not planning to use tables but may reconsider. Really appreciate.
@FlyTrdr Do you mean "tables" as in data.table? The package is one of the three most commonly used ways to handle rectangular data, besides data.frames (base R - see @42-'s answer) and tibbles (tidyverse - Paul's answer). Here is an intro: cran.r-project.org/web/packages/data.table/vignettes/…
@FlyTrdr Please also consider accepting an answer if it solved your problem.

0

Here's a tidyverse solution. I converted the months to a factor so they can be ordered, then used tidyr::gather() to convert the data into long format, dplyr::group_by() to group by month and asset, and dplyr::summarise() to compute the average:

library(dplyr)
library(tidyr)

df <- read.table(text = "
  Year Month A1 A2 A3
1 2015   Jan  1  1  1
2 2015   Feb  2  2  2
3 2015   Mar  3  3  3
4 2016   Jan  1  1  1
5 2016   Feb  2  2  2
6 2016   Mar  3  3  3", header = TRUE) %>%
  as_tibble()  # tbl_df() is deprecated in current dplyr

df$Month <- df$Month %>%
  factor(levels = format(ISOdate(2000, 1:12, 1), "%b"))

df_tidy <- df %>%
  gather(asset, value, -Year, -Month) %>%
  group_by(Month, asset) %>%
  summarise(Average = mean(value)) %>%
  arrange(asset, Month)
df_tidy

# # A tibble: 9 x 3
# # Groups:   Month [3]
#   Month asset Average
#   <fct> <chr>   <dbl>
# 1 Jan   A1          1
# 2 Feb   A1          2
# 3 Mar   A1          3
# 4 Jan   A2          1
# 5 Feb   A2          2
# 6 Mar   A2          3
# 7 Jan   A3          1
# 8 Feb   A3          2
# 9 Mar   A3          3


# convert to wide format, as in OP - not sure of 'easy' way
# to order columns by asset.month other than using 'select()'
# (it currently sorts alphabetically).

df_tidy %>%
  unite(Returns, c(asset, Month), sep = ".") %>%
  spread(Returns, Average)

# # A tibble: 1 x 9
#   A1.Feb A1.Jan A1.Mar A2.Feb A2.Jan A2.Mar A3.Feb A3.Jan A3.Mar
#    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
# 1      2      1      3      2      1      3      2      1      3
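
On the column-ordering question raised in the code comment above: a minimal sketch, assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later, where pivot_longer() and pivot_wider() supersede gather() and spread(). pivot_wider() creates the new columns in order of first appearance, so arranging by asset and Month beforehand should give the layout from the question. month.abb is used for the factor levels here, which assumes English month abbreviations.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, with Month as a calendar-ordered factor
df <- tibble(
  Year  = rep(c(2015L, 2016L), each = 3),
  Month = factor(rep(c("Jan", "Feb", "Mar"), times = 2), levels = month.abb),
  A1 = rep(1:3, times = 2),
  A2 = rep(1:3, times = 2),
  A3 = rep(1:3, times = 2)
)

wide <- df %>%
  pivot_longer(cols = -c(Year, Month),
               names_to = "asset", values_to = "value") %>%
  group_by(asset, Month) %>%
  summarise(Average = mean(value), .groups = "drop") %>%
  arrange(asset, Month) %>%            # sets the order of the new columns
  mutate(Returns = "Average") %>%      # id column, as in the question
  pivot_wider(names_from = c(asset, Month),
              values_from = Average, names_sep = ".")
wide
```

Because the order comes from arrange(), no select() call is needed to reposition the columns.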

2 Comments

Thanks Paul. Have not used this library but it looks very useful. I will give this a try. Much appreciated
No problems, please up-vote and accept my answer if you think it is the best solution for your problem.
