I have a data.table called tmp.df.lhs.denorm; its first 2 rows are shown below:

    > dput(tmp.df.lhs.denorm[1:2])
structure(list(rules = c("{} => {Dental anesthetic products-Injectables cartridges|2288210-Septocaine Cart 4% w/EPI}", 
"{Dental small equipment-Water distiller parts & acc|5528005-EzeeKleen 2.5HD UV Lamp1,Dental small equipment-Water distiller parts & acc|5528005-EzeeKleen 2.5HD UV Lamp2} => {Dental small equipment-Water distiller parts & acc|5528004-EzeeKleen 2.5HD RO Membra}"
), support = c(0.501710236989983, 0.000610798924993892), confidence = c(0.501710236989983, 
1), lift = c(1, 1637.2), rule.id = 1:2, lhs_1 = c(NA, "Dental small equipment-Water distiller parts & acc|5528005-EzeeKleen 2.5HD UV Lamp1"
), lhs_2 = c(NA, "Dental small equipment-Water distiller parts & acc|5528005-EzeeKleen 2.5HD UV Lamp2"
)), .Names = c("rules", "support", "confidence", "lift", "rule.id", 
"lhs_1", "lhs_2"), class = c("data.table", "data.frame"), row.names = c(NA, 
-2L), .internal.selfref = <pointer: 0x0000000007120788>)

Note the columns lhs_1 and lhs_2, which are produced by a string split on the rules column.

My problem is that, for different data, the rules column might contain a varying number of items separated by commas. For example, I could get three columns (lhs_1, lhs_2 and lhs_3) or more, depending on how many commas appear in the rules column. The solution I am after is to fix the number of lhs_* columns (a parameter in my code, say 6), so that in this specific example tmp.df.lhs.denorm gains 4 additional empty columns named lhs_3, lhs_4, lhs_5 and lhs_6. Any assistance is appreciated.

  • Not very clear. Please clarify. Where do you use max_len? Commented Dec 25, 2016 at 11:02
  • max_len is a global parameter which is set in a configuration file. Commented Dec 25, 2016 at 11:30
  • I think this answer will solve this. Commented Dec 25, 2016 at 11:59

1 Answer

I found a workaround that does the job:

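library(data.table)

# Empty template table that declares every expected column, including lhs_1..lhs_6
tmp.df.lhs.denorm.art <- data.table(rules = character(),
                                    support = numeric(),
                                    confidence = numeric(),
                                    lift = numeric(),
                                    rule.id = integer(),
                                    lhs_1 = character(),
                                    lhs_2 = character(),
                                    lhs_3 = character(),
                                    lhs_4 = character(),
                                    lhs_5 = character(),
                                    lhs_6 = character())

# fill = TRUE adds the columns missing from tmp.df.lhs.denorm and fills them with NA
tmp.df.lhs.denorm.complete <- rbindlist(list(tmp.df.lhs.denorm, tmp.df.lhs.denorm.art), fill = TRUE)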
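
The same result can probably be reached without typing out the template by hand. Below is a minimal sketch, assuming max_len holds the configured number of lhs_* columns (the name is taken from the comments on the question) and using a small made-up table dt as a stand-in for tmp.df.lhs.denorm. It is an alternative to the rbindlist workaround, not a replacement for it.

```r
library(data.table)

# Hypothetical stand-in for tmp.df.lhs.denorm: only lhs_1 and lhs_2 exist.
dt <- data.table(
  rules   = c("{} => {A}", "{B,C} => {D}"),
  rule.id = 1:2,
  lhs_1   = c(NA, "B"),
  lhs_2   = c(NA, "C")
)

# Assumed fixed number of lhs_* columns (the configuration parameter).
max_len <- 6L

# Names the table should end up with, and the ones it does not have yet.
wanted_cols  <- paste0("lhs_", seq_len(max_len))
missing_cols <- setdiff(wanted_cols, names(dt))

# Add the missing columns by reference, filled with character NA.
if (length(missing_cols) > 0L) {
  dt[, (missing_cols) := NA_character_]
}

names(dt)
```

Because `:=` modifies dt in place, no copy of the table is made. The new columns are appended at the end in the order lhs_3 to lhs_6; if the table already has max_len or more lhs_* columns, nothing is changed.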