Create columns from aggregated row data in R

Question

I have a data frame of historical price returns. It has date columns (Year and Month) and many asset columns (A1, A2, ...), and each asset column holds the return for each historical date. I would like to turn this into a data frame with many asset columns and only one row, where each value is the average of the original rows that feed the new column. The new column headers need to be the original asset name concatenated with the date information (here, the month). A simplified example of the original data follows:

> df <- read.csv("data.csv", header=T)
> df
  Year Month A1 A2 A3
1 2015   Jan  1  1  1
2 2015   Feb  2  2  2
3 2015   Mar  3  3  3
4 2016   Jan  1  1  1
5 2016   Feb  2  2  2
6 2016   Mar  3  3  3

I used simple repeating numbers for the returns here. I am using a function that requires the data to be organized as follows:

> df2 <- read.csv("data2.csv", header=T)
> df2

  Returns A1.Jan A1.Feb A1.Mar A2.Jan A2.Feb A2.Mar A3.Jan A3.Feb A3.Mar
1 Average      1      2      3      1      2      3      1      2      3

For clarity, A1.Jan contains the average of the Jan returns for A1 across all years. Thanks in advance for the insight and/or solution.

This worked well for the groupings. Super efficient and much appreciated.
– FlyTrdr, Commented Oct 1, 2018 at 21:19

IRTFM · Accepted Answer · 2018-09-30 20:59:16Z

1

Take a look at the base function reshape. This is basically the same task as is solved by the last example on its help page:

reshape(df, idvar="Year", direction="wide", timevar="Month")
  Year A1.Jan A2.Jan A3.Jan A1.Feb A2.Feb A3.Feb A1.Mar A2.Mar A3.Mar
1 2015      1      1      1      2      2      2      3      3      3
4 2016      1      1      1      2      2      2      3      3      3

You wanted the Year variable to remain as a column identifier but wanted the Month variable to act as a sequence that gets spread "wide".
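
The reshape() call above keeps one row per Year, whereas the question asks for a single row of averages. One possible way to get there in base R is sketched below: average the wide columns, then reorder them by asset. It assumes every Year has every Month; otherwise the wide table contains NA and colMeans() would need na.rm = TRUE. The names assets, months, ord and df2 are illustrative and not part of the answer.

```r
# Example data from the question
df <- data.frame(
  Year  = rep(c(2015L, 2016L), each = 3),
  Month = rep(c("Jan", "Feb", "Mar"), times = 2),
  A1 = rep(1:3, 2), A2 = rep(1:3, 2), A3 = rep(1:3, 2)
)

# One row per Year, one column per asset/month pair (as in the answer)
wide <- reshape(df, idvar = "Year", direction = "wide", timevar = "Month")

# Average each asset/month column across years
avg <- colMeans(wide[, names(wide) != "Year"])

# Build the desired column order: A1.Jan, A1.Feb, A1.Mar, A2.Jan, ...
assets <- c("A1", "A2", "A3")      # assumption: asset names known up front
months <- c("Jan", "Feb", "Mar")   # assumption: months in calendar order
ord <- as.vector(t(outer(assets, months, paste, sep = ".")))

# Single-row result with a leading label column
df2 <- data.frame(Returns = "Average", t(avg[ord]))
df2
```

With the sample data, each .Jan column averages to 1, each .Feb column to 2 and each .Mar column to 3.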

answered Sep 30, 2018 at 20:59

IRTFM


markus · Accepted Answer · 2018-09-30 20:51:51Z

0

With data.table you can do

library(data.table)
setDT(df)
df[, lapply(.SD, mean), .SDcols = names(df)[grep("^A", names(df))], by = Month
   ][, Returns := "Average"
     ][, melt(.SD, id = c("Month", "Returns"))
       ][, dcast(.SD, Returns ~ variable + Month, value.var = 'value', sep = ".")]

#   Returns A1.Feb A1.Jan A1.Mar A2.Feb A2.Jan A2.Mar A3.Feb A3.Jan A3.Mar
#1: Average      2      1      3      2      1      3      2      1      3

In the first line we aggregate the data by Month. The part names(df)[grep("^A", names(df))] ensures that we only aggregate variables that start with the letter "A".

The second line creates the variable Returns, which contains the value "Average".

melt gathers your data into long format, and dcast finally spreads it into the desired output.
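
To make the chain easier to follow, here is a sketch of the same steps with each intermediate result stored under its own name. The names dt, asset_cols, monthly, long and wide are illustrative. Converting Month to a factor with levels month.abb is an addition to the answer, made on the assumption that the months are English three-letter abbreviations; it should make dcast order the columns by calendar month rather than alphabetically.

```r
library(data.table)

# Example data from the question
dt <- data.table(
  Year  = rep(c(2015L, 2016L), each = 3),
  Month = rep(c("Jan", "Feb", "Mar"), times = 2),
  A1 = rep(1:3, 2), A2 = rep(1:3, 2), A3 = rep(1:3, 2)
)

# Step 1: average every column starting with "A" within each Month
asset_cols <- grep("^A", names(dt), value = TRUE)
monthly <- dt[, lapply(.SD, mean), by = Month, .SDcols = asset_cols]

# Step 2: add the label column (modifies monthly by reference)
monthly[, Returns := "Average"]

# Step 3: long format, one row per Month/asset pair
long <- melt(monthly, id.vars = c("Month", "Returns"))

# Addition: calendar ordering instead of alphabetical ordering
long[, Month := factor(Month, levels = month.abb)]

# Step 4: wide format, one column per asset/month pair
wide <- dcast(long, Returns ~ variable + Month, value.var = "value", sep = ".")
wide
```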

data

df <- structure(list(Year = c(2015L, 2015L, 2015L, 2016L, 2016L, 2016L
), Month = c("Jan", "Feb", "Mar", "Jan", "Feb", "Mar"), A1 = c(1L, 
2L, 3L, 1L, 2L, 3L), A2 = c(1L, 2L, 3L, 1L, 2L, 3L), A3 = c(1L, 
2L, 3L, 1L, 2L, 3L)), .Names = c("Year", "Month", "A1", "A2", 
"A3"), class = "data.frame", row.names = c("1", "2", "3", "4", 
"5", "6"))

edited Sep 30, 2018 at 20:51

answered Sep 30, 2018 at 20:45

markus


3 Comments

FlyTrdr Over a year ago

This looks like a good option. I was not planning to use tables but may reconsider. Really appreciate.

markus Over a year ago

@FlyTrdr Do you mean "tables" as in data.table? The package is one of the three most commonly used ways to handle rectangular data, besides data.frames (base R - see @42-'s answer) and tibbles (tidyverse - Paul's answer). Here is an intro: cran.r-project.org/web/packages/data.table/vignettes/…

markus Over a year ago

@FlyTrdr Please also consider accepting an answer if it solved your problem.

Paul · Accepted Answer · 2018-10-01 02:09:00Z

0

Here's a tidyverse solution. I factored the months so they can be ordered, then used tidyr::gather() to convert into long format so I could dplyr::group_by() by month to dplyr::summarise() to find the average:

library(dplyr)
library(tidyr)

df <- read.table(text = "
  Year Month A1 A2 A3
1 2015   Jan  1  1  1
2 2015   Feb  2  2  2
3 2015   Mar  3  3  3
4 2016   Jan  1  1  1
5 2016   Feb  2  2  2
6 2016   Mar  3  3  3", header = TRUE) %>%
  as_tibble()  # tbl_df() is deprecated in current dplyr

# month.abb holds "Jan" ... "Dec" regardless of the session locale
df$Month <- df$Month %>%
  factor(levels = month.abb)

df_tidy <- df %>%
  gather(asset, value, -Year, -Month) %>%
  group_by(Month, asset) %>%
  summarise(Average = mean(value)) %>%
  arrange(asset, Month)
df_tidy

# # A tibble: 9 x 3
# # Groups:   Month [3]
#   Month asset Average
#   <fct> <chr>   <dbl>
# 1 Jan   A1          1
# 2 Feb   A1          2
# 3 Mar   A1          3
# 4 Jan   A2          1
# 5 Feb   A2          2
# 6 Mar   A2          3
# 7 Jan   A3          1
# 8 Feb   A3          2
# 9 Mar   A3          3


# convert to wide format, as in OP - not sure of 'easy' way
# to order columns by asset.month other than using 'select()'
# (it currently sorts alphabetically).

df_tidy %>%
  unite(Returns, c(asset, Month), sep = ".") %>%
  spread(Returns, Average)

# # A tibble: 1 x 9
#   A1.Feb A1.Jan A1.Mar A2.Feb A2.Jan A2.Mar A3.Feb A3.Jan A3.Mar
#    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
# 1      2      1      3      2      1      3      2      1      3
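
On the column-ordering question raised in the code comment above: one option, sketched here rather than taken from the answer, is to swap gather()/spread() for tidyr's newer pivot_longer()/pivot_wider(). pivot_wider() creates columns in order of first appearance by default, so sorting the rows by asset and Month beforehand carries that order into the column names. This assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later (for .groups and .before); df2 is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  Year  = rep(c(2015L, 2016L), each = 3),
  Month = rep(c("Jan", "Feb", "Mar"), times = 2),
  A1 = rep(1:3, 2), A2 = rep(1:3, 2), A3 = rep(1:3, 2)
)

df2 <- df %>%
  # long format: one row per Year/Month/asset
  pivot_longer(-c(Year, Month), names_to = "asset", values_to = "value") %>%
  # factor so that months sort in calendar order
  mutate(Month = factor(Month, levels = month.abb)) %>%
  group_by(asset, Month) %>%
  summarise(Average = mean(value), .groups = "drop") %>%
  # row order here determines column order after pivot_wider()
  arrange(asset, Month) %>%
  pivot_wider(names_from = c(asset, Month), values_from = Average,
              names_sep = ".") %>%
  # label column, as in the layout the question asks for
  mutate(Returns = "Average", .before = 1)

df2
```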

answered Oct 1, 2018 at 2:09

Paul


2 Comments

FlyTrdr Over a year ago

Thanks Paul. Have not used this library but it looks very useful. I will give this a try. Much appreciated

Paul Over a year ago

No problems, please up-vote and accept my answer if you think it is the best solution for your problem.
