2

Say I have a excel table like this

You can download the example data from this link

I need to melt 2020-09, 2020-10, 2020-11 as date and extract adj_price for each pair of id and name.

How could I convert it to the dataframe as follows in R? Many thanks at advance.

enter image description here

2
  • do you need an automated process to melt 2020-09, 2020-10, etc. or do you only need to work with these 3 dates? Commented Nov 30, 2020 at 7:48
  • 1
    There are others dates in real data, 2020-08, 2020-07, etc. Commented Nov 30, 2020 at 7:50

1 Answer 1

2

I came out with this solution, it might be optimized but the output is what you want and it should work with any number of columns.

library(tidyverse)

df1 <- readxl::read_xlsx(path = "path to test_data.xlsx")

# get all dates from the column names
cols <- colnames(df1)[3:ncol(df1)]
dates <- cols[grep("^[0-9][0-9][0-9][0-9]-[0-9][0-9]$", cols)]

# make a vector that will be used to make column names
colnames(df1)[3:ncol(df1)] <- rep(dates, rep(3, length(dates)))


# make a table with id, name and dates

finaldf <- df1[-1,] %>% pivot_longer(cols = 3:last_col(), names_to = "dates", values_to = "values")

indicators <- df1[-1,]
colnames(indicators) <- c("id", "name", df1[1, 3:ncol(df1)])
indicators <- indicators %>% pivot_longer(cols = 3:last_col(), names_to = "indicator", values_to = "values")

# final join and formatting
finaldf <- cbind(finaldf, indicators[, "indicator"]) %>% 
  filter(indicator == "adj_price") %>% 
  select(-indicator) %>% 
  rename("adj_price" = values) %>% 
  mutate(adj_price = as.numeric(adj_price))

The output:

> finaldf
  id             name   dates adj_price
1  1     Stracke-Huel 2020-09      3.80
2  1     Stracke-Huel 2020-10      3.72
3  1     Stracke-Huel 2020-11      3.70
4  2     Gleason-Mann 2020-09      7.25
5  2     Gleason-Mann 2020-10      7.50
6  2     Gleason-Mann 2020-11      7.50
7  3 Bauch-Cartwright 2020-09        NA
8  3 Bauch-Cartwright 2020-10     13.03
9  3 Bauch-Cartwright 2020-11     12.38
Sign up to request clarification or add additional context in comments.

6 Comments

Many thanks, but why dates column doesn't contain 2020-09 and 2020-10?
It is the as.Date.character() that screw this up, I am not very familiar with Dates conversions
Known problem, see this SO question. The answer depends on what packages you want to use. Using only base functions, you will need to add a day to the date, see this answer
I print colnames(df1)[3:ncol(df1)] and find there are too many duplicates dates, any ideas? Please check the updates in question.
I have tested with example data, it seems also have some duplicated dates.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.