0

I have a data in which monthly cash balances are given per account number. However, the number of records or number of months for which data is given for each account id is different say some have 12 months data , some have 24 months data and so on. I have to put all the data in ARIMA model and predict balance for next month. I have notice that ARIMA model doesnt work for uneven periods or it produces unusual results.

`Account_id  Month  $ balance
A            201901 100
A            201902 120
A            201903 135
B            201903 20
C            201902 1700
C            201903 1400

` I tried to add the rows for missing months and put zero values against the balances by amending the dataset in excel resulting in all accounts having same number of records and months.

I want to do this manual step through R code. I believe this should be some loop / IF function or Rbind/cbind something but am not that fluent on the code. Please help!

Based on suggested soluton, i tried this:

It is producing 54 lines for each month each id and all balances showing as 0

months <- as.character(seq(as.Date('2015-01-
01'),as.Date('2019-06-01'), by = "1 month"))

accounts <- df$account_id

shell <- expand.grid(Account_id = accounts, Month = months, stringsAsFactors 
= F)

data <- data.frame(Account_id = df$account_id, Month = 
df$partition_ledger_year_month, balance = df$amount_usd,stringsAsFactors = F)

df2 <- merge(shell, data, by=c('Account_id','Month'), all.x = T)

df2[which(is.na(df2$balance)),]$balance <- 0

Expected output:

`Account_id  Month  $ balance
A            201901 100
A            201902 120
A            201903 135
B            201901 0
B            201902 0
B            201903 20
C            201901 0
C            201902 1700
C            201903 1400

All values are in my data frame columns, only i have to missing moth with '0' balance. The complete data for any account id is 54 months.

1
  • @RonakShah i have edited question to include expected output. Thanks much. Commented Sep 10, 2020 at 7:44

2 Answers 2

1

How about making a shell data.frame with zero balances then filling in the balances you have:

# All Possible Months
months <- as.character(seq(as.Date('2019-01-01'),as.Date('2020-01-01'), by = "1 month"))

# All Possible account ids
accounts <- LETTERS

# A shell
shell <- expand.grid(Account_id = accounts, Month = months, stringsAsFactors = F)

# Your data
data <- data.frame(Account_id = c('A','B','A'), Month = c('2019-02-01', '2019-03-01','2019-01-01'), balance = c(100,200,300),stringsAsFactors = F)

# Left Join to the shell
df <- merge(shell, data, by=c('Account_id','Month'), all.x = T)

# Fill in missing balances
df[which(is.na(df$balance)),]$balance <- 0


df
Sign up to request clarification or add additional context in comments.

3 Comments

thanks much @SmokeyShakers . In 4rth step, can i upload the data instead of manually creating all data since its a reasonably large dataset?
I have tried this , it is producing 54 lines for each month each id and all balances showing as 0 months <- as.character(seq(as.Date('2015-01- 01'),as.Date('2019-06-01'), by = "1 month")) accounts <- df$account_id shell <- expand.grid(Account_id = accounts, Month = months, stringsAsFactors = F) data <- data.frame(Account_id = df$account_id, Month = df$partition_ledger_year_month, balance = df$amount_usd,stringsAsFactors = F) df2 <- merge(shell, data, by=c('Account_id','Month'), all.x = T) df2[which(is.na(df2$balance)),]$balance <- 0
it's hard to say without seeing your data, but you probably need accounts <- unique(df$account_id)
0

You can use tidyr::complete and fill the missing values of balance with 0.

df1 <- tidyr::complete(df, Account_id, Month, fill = list(balance = 0))
df1
# A tibble: 9 x 3
#  Account_id  Month balance
#  <chr>       <int>   <dbl>
#1 A          201901     100
#2 A          201902     120
#3 A          201903     135
#4 B          201901       0
#5 B          201902       0
#6 B          201903      20
#7 C          201901       0
#8 C          201902    1700
#9 C          201903    1400

data

df <- structure(list(Account_id = c("A", "A", "A", "B", "C", "C"), 
    Month = c(201901L, 201902L, 201903L, 201903L, 201902L, 201903L
    ), balance = c(100L, 120L, 135L, 20L, 1700L, 1400L)), 
    class = "data.frame", row.names = c(NA, -6L))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.