I have a column of data that describes possible diseases, and I am trying to convert these qualitative values into quantitative ones. For example, I want to set conditions such as: if a row contains the words "blood pressure", delete all of its characters and replace them with 3; if a row contains "heart", replace it with 2; if a row contains "diabetes" or "kidney disease", replace it with 1; and if a row contains any other condition, replace it with 0.5.
For example, my data looks like this:
Gene Condition
Gene1 Name=Asymmetrical dimethylarginine level, Name=Bipolar disorder and schizophrenia, Name=3-hydroxypropylmercapturic acid levels in smoker
Gene2 Name=blood pressure, Name=diabetes
Gene3 Name=heart disease
Gene4 Name=Childhood ear infection
Gene5 NA
Gene6 Name=kidney disease
Based on these conditions, the output I am trying to get is:
Gene Condition
Gene1 0.5
Gene2 3
Gene3 2
Gene4 0.5
Gene5 NA
Gene6 1
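One possible sketch (not necessarily the best approach) that reproduces the expected output above: test the patterns from highest to lowest score with nested ifelse() and grepl(). Because the highest-scoring pattern is checked first, a row that matches several conditions, like Gene2, gets the largest score, so a separate max step may not be needed. The vector name conditions is hypothetical; in practice it would be the Condition column of the data frame:

```r
# Example values copied from the Condition column above
# (`conditions` is a hypothetical name standing in for the real column)
conditions <- c(
  "Name=Asymmetrical dimethylarginine level, Name=Bipolar disorder and schizophrenia, Name=3-hydroxypropylmercapturic acid levels in smoker",
  "Name=blood pressure, Name=diabetes",
  "Name=heart disease",
  "Name=Childhood ear infection",
  NA,
  "Name=kidney disease"
)

# Check the highest-scoring pattern first, so a row matching several
# conditions receives the largest score. grepl() returns FALSE for NA
# input, so missing values are handled explicitly before the other checks.
score <- ifelse(is.na(conditions), NA_real_,
         ifelse(grepl("blood pressure", conditions), 3,
         ifelse(grepl("heart", conditions), 2,
         ifelse(grepl("diabetes|kidney disease", conditions), 1, 0.5))))

print(score)
```

The order of the nested ifelse() calls encodes the priority of the rules, so adding a new rule means inserting it at the right position in the chain.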
I am new to R, so I'm not sure whether my approach is the best one. My plan is to run my conditions to replace the specific strings (but not all characters), which would produce multiple numbers in a row (mixed with strings) whenever more than one condition is met, and then apply a function such as max() to each row to get the largest number. However, I am stuck on setting up the conditions that perform the string-to-number conversion.
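The split-then-max plan described above could look roughly like the following sketch. It splits each cell on commas into individual conditions, scores each one against a table of patterns, and keeps the maximum per row. The names rules, score_entry, row_max and cells are hypothetical, and the patterns and scores are assumed from the rules stated in the question:

```r
# Hypothetical lookup table: regular-expression pattern -> score
rules <- c("blood pressure" = 3, "heart" = 2, "diabetes|kidney disease" = 1)

# Score a single condition: the best matching rule, or 0.5 if none match
score_entry <- function(entry) {
  matched <- rules[vapply(names(rules), grepl, logical(1), x = entry)]
  if (length(matched) == 0) 0.5 else max(matched)
}

# Split each cell on commas, score every entry and keep the largest score;
# missing cells stay NA
row_max <- function(cells) {
  vapply(strsplit(cells, ","), function(entries) {
    if (all(is.na(entries))) return(NA_real_)
    max(vapply(entries, score_entry, numeric(1)))
  }, numeric(1))
}

cells <- c("Name=blood pressure, Name=diabetes", "Name=heart disease",
           "Name=Childhood ear infection", NA, "Name=kidney disease")
print(row_max(cells))
```

Keeping the rules in a named vector rather than in code means new patterns can be added without touching the functions.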
This is what I've been trying:
data$condition[data$condition == "blood pressure"] <- "3"
data$condition[data$condition == "heart disease"] <- "2"
data$condition[data$condition == "diabetes" | "kidney disease"] <- "1"
data$condition[data$condition == "Name" && !"diabetes" | "kidney disease" | "blood pressure" | "heart disease"] <- "0.5"
However, this gives the error 'object of type 'closure' is not subsettable', and I can't find a solution to this error online, at least not for this approach. Any help would be appreciated.
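A likely cause of this error, assuming no object named data exists in the workspace: the name data then refers to the built-in function utils::data(), and using $ on a function raises exactly this message. (In the example data the column is also called Condition with a capital C, not condition.) A second issue is that == tests for an exact match of the whole string, so it never matches a condition embedded in a longer cell, and "diabetes" | "kidney disease" is not valid R because | only works on logical or numeric values. A small sketch of these points, where cell is a hypothetical example value:

```r
# With no user object called `data`, the name resolves to the function
# utils::data(), and `$` on a function fails with the error from the question
msg <- tryCatch(data$condition, error = function(e) conditionMessage(e))
print(msg)

# `==` compares whole strings, so it misses a condition inside a longer cell;
# grepl() does substring (regular-expression) matching instead
cell <- "Name=blood pressure, Name=diabetes"
print(cell == "blood pressure")
print(grepl("blood pressure", cell))

# Alternatives go inside one regular expression instead of `|` between strings
print(grepl("diabetes|kidney disease", cell))
```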
Example data (this is my first time sharing data, so please let me know if anything is amiss):
structure(list(Gene = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5",
"Gene6"), Condition = c(" Name=Asymmetrical dimethylarginine level, Name=Bipolar disorder and schizophrenia, Name=3-hydroxypropylmercapturic acid levels in smoker",
" Name=blood pressure, Name=diabetes", "Name=heart disease",
"Name=Childhood ear infection", NA, "Name=kidney disease")), row.names = c(NA,
-6L), class = c("data.table", "data.frame"))