
Is there a clearer and more efficient way to subset a data frame in R using multiple conditions? Here is a simplified example. The columns form two triplicates (v1, v2, v3 and v4, v5, v6). Each triplicate may contain at most one 0 per row; rows where either triplicate has two or more zeros should be excluded:

v1  v2  v3  v4  v5  v6
1   0   3   0   0   2
1   1   1   1   2   0
0   0   0   1   1   0
0   0   0   0   0   0

Here is my current, rather verbose, approach:

data_short <- subset(data,
  ((v1 != 0 & v2 != 0) | (v1 != 0 & v3 != 0) | (v2 != 0 & v3 != 0)) &
  ((v4 != 0 & v5 != 0) | (v4 != 0 & v6 != 0) | (v5 != 0 & v6 != 0)))

v1  v2  v3  v4  v5  v6
1   1   1   1   2   0
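The pairwise condition above is another way of saying "at least two of the three values in each triplicate are non-zero", which is the same as "at most one zero per triplicate". A minimal sketch that checks this on the example data, assuming the data frame is named `df` and has no missing values:

```r
# Example data from the question
df <- read.table(text = "v1 v2 v3 v4 v5 v6
1 0 3 0 0 2
1 1 1 1 2 0
0 0 0 1 1 0
0 0 0 0 0 0", header = TRUE)

# Original condition: some pair within each triplicate is non-zero
pairwise <- with(df,
  ((v1 != 0 & v2 != 0) | (v1 != 0 & v3 != 0) | (v2 != 0 & v3 != 0)) &
  ((v4 != 0 & v5 != 0) | (v4 != 0 & v6 != 0) | (v5 != 0 & v6 != 0)))

# Same rule expressed by counting zeros per triplicate
counted <- rowSums(df[, 1:3] == 0) <= 1 & rowSums(df[, 4:6] == 0) <= 1

# Both logical vectors should select exactly the same rows
all(pairwise == counted)
```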
Comment (Nov 22, 2016 at 23:42): df[rowSums(df[,1:3]==0)<=1 & rowSums(df[,4:6]==0)<=1,]

1 Answer


You can use rowSums to count, for each row, how many zeros appear in the first three columns and in the last three columns, then keep only the rows where both counts are at most 1:

df <- read.table(text="v1  v2  v3  v4  v5  v6
1   0   3   0   0   2
1   1   1   1   2   0
0   0   0   1   1   0
0   0   0   0   0   0", header=TRUE)

df[rowSums(df[,1:3]==0)<=1 & rowSums(df[,4:6]==0)<=1,]

  v1 v2 v3 v4 v5 v6
2  1  1  1  1  2  0
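If the real data has more than two triplicates, the same rowSums idea can be applied to every group of columns at once. A minimal sketch, assuming the replicate columns are stored consecutively in groups of equal size; `keep_rows`, `group_size` and `max_zeros` are hypothetical names chosen for illustration:

```r
# Example data from the question
df <- read.table(text = "v1 v2 v3 v4 v5 v6
1 0 3 0 0 2
1 1 1 1 2 0
0 0 0 1 1 0
0 0 0 0 0 0", header = TRUE)

# Hypothetical helper: keep rows where every block of `group_size`
# consecutive columns contains at most `max_zeros` zeros
keep_rows <- function(data, group_size = 3, max_zeros = 1) {
  stopifnot(ncol(data) %% group_size == 0)
  idx <- seq_len(ncol(data))
  # List of column-index vectors, e.g. 1:3 and 4:6 for six columns
  groups <- split(idx, ceiling(idx / group_size))
  # One logical vector per group: TRUE where that group has few enough zeros
  ok <- lapply(groups, function(cols) {
    rowSums(data[, cols, drop = FALSE] == 0) <= max_zeros
  })
  # A row is kept only if every group passes
  Reduce(`&`, ok)
}

df[keep_rows(df), ]
```

On the example data this should return the same single row (row 2) as the two-triplicate version above.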

1 Comment

Thanks, it does the job!
