I have a dataframe containing the safety data for 100 patients. For each patient there are several safety factors, each recorded together with the size of that factor.

   v1_d0_urt_redness v1_d0_urt_redness_size v1_d1_urt_redness v1_d1_urt_redness_size ...
P1                 1                     20
P2                 1                     NA
P3                 0                     NA
...

Here redness=1 means there was redness and redness=0 means there was no redness, and therefore the redness_size was not reported.
To find what proportion of the data is missing, I need to recode the data as follows: if the redness column is 1 and the corresponding redness_size column is NA, then redness_size stays NA; otherwise, if the redness column is 0, then redness_size is set to 0. This should be done for d0, d1, ... and repeated for the other variables such as hardness, swelling, etc. Any ideas how one could implement this in R?

  • I edited my answer according to your request. Let me know if it works on your dataset. Commented Nov 26, 2019 at 21:38

1 Answer

2

If I understand correctly what you are trying to do, and assuming your dataframe is called df, you can update the redness_size columns like this:

# Where redness == 0, set the matching size to 0; a missing size with redness == 1 stays NA
red_cols  <- grep("_redness$", colnames(df), value = TRUE)
size_cols <- paste0(red_cols, "_size")
df[size_cols][!is.na(df[red_cols]) & df[red_cols] == 0] <- 0
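
Here is a minimal sketch of this recoding on a toy data frame, written so it can be reused for other factors. The sample values and the helper name `fill_absent_sizes` are made up for illustration, and it assumes every `<factor>` column has a matching `<factor>_size` column, as in the column names from the question.

```r
# Toy data (hypothetical values) following the naming scheme from the question
df <- data.frame(
  v1_d0_urt_redness      = c(1, 1, 0),
  v1_d0_urt_redness_size = c(20, NA, NA),
  v1_d1_urt_redness      = c(0, 1, NA),
  v1_d1_urt_redness_size = c(NA, 5, NA),
  row.names = c("P1", "P2", "P3")
)

# Hypothetical helper: for one factor, set <factor>_size to 0 wherever <factor> == 0
fill_absent_sizes <- function(df, factor_name) {
  # Flag columns end in "_<factor>"; this does not match "_<factor>_size"
  flag_cols <- grep(paste0("_", factor_name, "$"), colnames(df), value = TRUE)
  if (length(flag_cols) == 0) return(df)  # nothing to do for this factor
  size_cols <- paste0(flag_cols, "_size")
  stopifnot(all(size_cols %in% colnames(df)))
  # Logical matrix, one column per day/site; NA flags count as "not absent"
  absent <- !is.na(df[flag_cols]) & df[flag_cols] == 0
  df[size_cols][absent] <- 0
  df
}

df <- fill_absent_sizes(df, "redness")

# Proportion of size values that are still missing, per column
size_cols <- grep("_size$", colnames(df), value = TRUE)
missing_prop <- colMeans(is.na(df[size_cols]))
```

To cover other factors, the same call could be repeated, e.g. `for (f in c("redness", "hardness", "swelling")) df <- fill_absent_sizes(df, f)`, assuming those are the suffixes used in your column names.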

8 Comments

Thank you, but as there are 5 days for each factor, is there any way to use patterns, e.g. if the name contains 'redness' then apply this procedure, so that it covers the variables for all 5 days?
I am using df[df[,endsWith(colnames(df),"redness")]==0 ,endsWith(colnames(df),"redness_size")]<-0 but I get the error: Error in `[<-.data.frame`(`*tmp*`, df[, endsWith(colnames(df), "redness")] == : non-existent rows not allowed. Any ideas what goes wrong here?
My colnames are: "v1_d0_urt_redness" "v1_d0_lt_redness" "v1_d1_urt_redness" "v1_d1_lt_redness" "v1_d2_urt_redness" "v1_d2_lt_redness", "v1_d0_urt_redness_size" "v1_d0_lt_redness_size" "v1_d1_urt_redness_size" "v1_d1_lt_redness_size" "v1_d2_urt_redness_size"... When I use df[,endsWith(colnames(df),"redness")] I get the right columns, but the earlier code does not work.
I edited the question with the updated variable names and checked the summary(df).
OK, I modified my answer accordingly. Do you still have the same issue? Are both lines failing? Have you checked that the redness columns are numeric? You can also try running only df[df[,endsWith(colnames(df),"_redness")] == 1 & is.na(df[,endsWith(colnames(df),"redness_size")]),endsWith(colnames(df),"redness_size")] to see which rows are returned.
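
The `non-existent rows not allowed` error quoted in the comments can be reproduced on a toy data frame (hypothetical values below). A likely cause, as far as I can tell: `df[, cols] == 0` is a logical matrix with one column per matched variable, and when it is passed as the row index together with a column index it is treated as one long logical vector, longer than the number of rows. Indexing the size columns with the matrix alone appears to avoid this; the sketch assumes the flag and size columns come in the same order.

```r
# Toy data (hypothetical values); two redness flags and their size columns
df <- data.frame(
  v1_d0_urt_redness      = c(1, 0, 0),
  v1_d1_urt_redness      = c(0, 1, 0),
  v1_d0_urt_redness_size = c(20, NA, NA),
  v1_d1_urt_redness_size = c(NA, 5, NA)
)
is_flag <- endsWith(colnames(df), "redness")
is_size <- endsWith(colnames(df), "redness_size")

# Logical matrix: 3 rows, one column per redness variable
absent <- df[, is_flag] == 0

# Matrix used as a row index plus a column index: this is the failing form
res <- try(df[absent, is_size] <- 0, silent = TRUE)
failed <- inherits(res, "try-error")

# Matrix used as the only index on the size columns: element-wise assignment
df[is_size][absent] <- 0
```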