test.data <- data.frame(summary = c("Execute commands as root via buffer overflow in Tooltalk database server (rpc.ttdbserverd)."
                                 ,"Information from SSL-encrypted sessions via PKCS #1."
                                 ,"ip_input.c in BSD-derived TCP/IP implementations allows remote attackers to cause a denial of service (crash or hang) via crafted packets."),
                        wascname=c(NA, NA, "Improper Input Handling"),stringsAsFactors = FALSE)

wascNames <- data.frame(wascname=c("Abuse of Functionality","Brute Force","Buffer Overflow","Content Spoofing"
                                   ,"Credential/Session Prediction","Cross-Site Scripting","Cross-Site Request Forgery","Denial of Service"
                                   ,"Fingerprinting","Format String","HTTP Response Smuggling","HTTP Response Splitting"
                                   ,"HTTP Request Smuggling","HTTP Request Splitting","Integer Overflows","LDAP Injection"
                                   ,"Mail Command Injection","Null Byte Injection","OS Commanding","Path Traversal"
                                   ,"Predictable Resource Location","Remote File Inclusion (RFI)","Routing Detour","Session Fixation"
                                   ,"SOAP Array Abuse","SSI Injection","SQL Injection","URL Redirector Abuse"
                                   ,"XPath Injection","XML Attribute Blowup","XML External Entities","XML Entity Expansion"
                                   ,"XML Injection","XQuery Injection","Cross-site Scripting","Directory Indexing"
                                   ,"Improper Filesystem Permissions","Improper Input Handling","Improper Output Handling","Information Leakage"
                                   ,"Insecure Indexing","Insufficient Anti-Automation","Insufficient Authentication","Insufficient Authorization"
                                   ,"Insufficient Password Recovery","Insufficient Process Validation","Insufficient Session Expiration","Insufficient Transport Layer Protection"
                                   ,"Remote File Inclusion","URl Redirector Abuse"),stringsAsFactors = FALSE)

Below is the code I have been trying to fix. If test.data$summary contains a string from wascNames$wascname, test.data$wascname should be replaced with that string, but only where it is currently NA:

test.data$wascname<-sapply(test.data$summary, function(x) 
      ifelse(identical(wascNames$wascname[str_detect(x,regex(wascNames$wascname, ignore_case = T))&
            is.na(test.data$wascname)==TRUE], character(0)),test.data$wascname,
            wascNames$wascname[str_detect(x,regex(wascNames$wascname, ignore_case = T))==TRUE]))

I want the following output:

[Image: expected output, not reproduced here]

Thank you in advance. I thought of using a for loop, but it would be too slow for 200,000 observations.

1 Answer

I believe this should work:

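library(stringr)  # provides str_detect() and regex()

# Keep an existing wascname; otherwise use the first category found in the summary (NA if none).
test.data$wascname2 <- sapply(seq_len(nrow(test.data)), function(x)  ifelse(is.na(test.data$wascname[x]), 
                                              wascNames$wascname[str_detect(test.data$summary[x], regex(wascNames$wascname, ignore_case = TRUE))],
                                              test.data$wascname[x]))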

test.data$wascname2
#[1] "Buffer Overflow"         NA                        "Improper Input Handling"

It still loops with sapply, but I think that's unavoidable given your data structure (i.e. for each string, you want to look it up in your wascNames$wascname table).
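If the loop cannot be avoided, one way to shrink it is to visit only the rows whose wascname is still NA. The following is a minimal base R sketch of that idea, not a tested drop-in for the 200,000-row data. The helper name first_match and the shortened categories vector are hypothetical. It also assumes a literal, case-insensitive substring match is acceptable in place of a regex, and that the first matching category should win when several match.

```r
# Small copy of the question's data so the example runs on its own.
test.data <- data.frame(
  summary = c("Execute commands as root via buffer overflow in Tooltalk database server (rpc.ttdbserverd).",
              "Information from SSL-encrypted sessions via PKCS #1.",
              "ip_input.c in BSD-derived TCP/IP implementations allows remote attackers to cause a denial of service (crash or hang) via crafted packets."),
  wascname = c(NA, NA, "Improper Input Handling"),
  stringsAsFactors = FALSE
)

# Shortened category list (assumption: stands in for wascNames$wascname).
categories <- c("Buffer Overflow", "Denial of Service", "Improper Input Handling")

# Hypothetical helper: first category that occurs literally in `text`,
# ignoring case, or NA_character_ when nothing matches.
first_match <- function(text, categories) {
  hits <- vapply(categories,
                 function(p) grepl(tolower(p), tolower(text), fixed = TRUE),
                 logical(1), USE.NAMES = FALSE)
  if (any(hits)) categories[hits][1] else NA_character_
}

# Only rows with a missing wascname are looked up; filled rows are untouched.
todo <- which(is.na(test.data$wascname))
test.data$wascname[todo] <- vapply(test.data$summary[todo], first_match,
                                   character(1), categories = categories,
                                   USE.NAMES = FALSE)

test.data$wascname
```

The third row keeps "Improper Input Handling" even though its summary mentions a denial of service, because rows that already have a value are never passed to the lookup.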


1 Comment

Perfect. Exactly what I was looking for. Very clever. So you are passing the row numbers and using those as the index. Nice.
