0

I have a dataset from a colleague. In the dataset we record the location where a given skin problem is. We record up to 20 locations for the skin problem.

i.e

scaloc1 == 2 scaloc2 == 24 scaloc3 == NA scalocn......

Would mean the skin problem was in place 1 and 24 and nowhere else

I want to reorganise the data so that instead of being like this it is

face 1/0 torso 1/0 etc

So for example if any of scaloc1 to scalocn contain the value 3 then set the value of face to be 1.

I had previously done this in STATA using:

foreach var in scaloc1 scaloc2 scaloc3 scaloc4 scaloc5 scaloc6 scaloc7 scaloc8 scaloc9 scal10 scal11 scal12 scal13 scal14 scal15 scal16 scal17 scal18 scal19 scal20{
  replace facescalp=1 if (`var'>=1 & `var'<=6) | (`var'>=21 & `var'<=26)
}

I feel like I should be able to do this using either a dreaded for loop or possibly something from the apply family?

I tried

dataframe$facescalp <-0
#Default to zero
apply(dataframe[,c("scaloc1","scaloc2","scalocn")],2,function(X){
      dataframe$facescalp[X>=1 & X<7] <-1
      })
#I thought this would look at location columns 1 to n and if the value was between 1 and 7 then assign face-scalp to 1

But didn't work....

I've not really used apply before but did have a good root around examples here and can't find one which accurately describes my current issue.

An example dataset is available: https://www.dropbox.com/s/0lkx1tfybelc189/example_data.xls?dl=0

If anything not clear or there is a good explanation for this already in a different answer please do let me know.

2 Answers 2

1

If I understand your problem correctly, the easiest way to solve it would probably be the following (this uses your example data set that you provided read in and stored as df)

# Add an ID column to identify each patient or skin problem 
df$ID <- row.names(df)

# Gather rows other than ID into a long-format data frame
library(tidyr)
dfl <- gather(df, locID,  loc, -ID)

# Order by ID
dfl <- dfl[order(dfl$ID), ]

# Keep only the rows where a skin problem location is present
dfl <- dfl[!is.na(dfl$loc), ]

# Set `face` to 1 where `locD` is 'scaloc1' and `loc` is 3
dfl$face <- ifelse(dfl$locID == 'scaloc1' & dfl$loc == 3, 1, 0)

Because you have a lot of conditions that you will need to apply in order to fill the various body part columns, the most efficient rout would probably to create a lookup table and use the match function. There are many examples on SO that describe using match for situations like this.

Sign up to request clarification or add additional context in comments.

Comments

0

Very helpful. I ended up using a variant of this approach

data_loc <- gather(data, "site", "location", c("scaloc1", "scaloc2", "scaloc3", "scaloc4", "scaloc5", "scaloc6", "scaloc7", "scaloc8", "scaloc9", "scal10", "scal11", "scal12", "scal13", "scal14", "scal15", "scal16", "scal17", "scal18", "scal19", "scal20")) 
#Make a single long dataframe

data_loc$facescalp <- 0
data_loc$facescalp[data_loc$location >=1 & data_loc$location <=6] <-1
#These two lines were repeated for each of the eventual categories I wanted
locations <- group_by(data_loc,ID) %>% summarise(facescalp = max(facescalp), upperarm = max(upperarm), lowerarm = max(lowerarm), hand = max(hand),buttockgroin = max(buttockgroin), upperleg = max(upperleg), lowerleg = max(lowerleg), feet = max(feet))

#Generate per individual the maximum value for each category, hence if in any of locations 1 to 20 they had a value corresponding to face then this ends up giving a 1

data <- inner_join(data,locations, by = "ID")
#This brings the data back together

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.