
I have a very large data set with 250 string and numeric variables. I want to compare consecutive pairs of columns, i.e. compare (take the difference of) the first variable with the second, the third with the fourth, the fifth with the sixth, and so on.
For example (the structure of my data set is similar to this example), I want to compare number.x with number.y, day.x with day.y, school.x with school.y, etc.

number.x<-c(1,2,3,4,5,6,7)
number.y<-c(3,4,5,6,1,2,7)
day.x<-c(1,3,4,5,6,7,8)
day.y<-c(4,5,6,7,8,7,8)
school.x<-c("a","b","b","c","n","f","h")
school.y<-c("a","b","b","c","m","g","h")
city.x<- c(1,2,3,7,5,8,7)
city.y<- c(1,2,3,5,5,7,7) 
  • Your fancy curvy quote marks don't work when passed to R. Also, "compare" could mean anything. Commented Nov 24, 2015 at 18:13
  • Unlike most programming languages, the "." doesn't indicate a member of a data frame or object: i.e. number.x and number.y are 2 completely different vectors. When you say compare, what specifically is the comparison? For example, if you enter number.y == number.x you will get a vector of the same length as number.x (or number.y) with TRUE and FALSE entries indicating where they are equal. Is this what you're looking for? Commented Nov 24, 2015 at 18:15
  • Thanks for your reply. For the numeric columns, I mean whether the difference between number.x and number.y is 0. For two string columns, the comparison means whether the elements are the same. Commented Nov 24, 2015 at 19:02
  • Please amend your question with the desired result. Commented Nov 24, 2015 at 22:36
  • For example: number.x<-c(1,2,3,4,5,6,7) and number.y<-c(3,4,5,6,1,2,7). My goal is to compare these two columns with each other and see how many of the numbers are equal. Commented Nov 24, 2015 at 23:37
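For whole numbers like the ones in the question, "the difference is 0" and `==` give the same answer. Values produced by arithmetic can differ by rounding error, though, so a tolerance-based test is safer. A minimal sketch, assuming a tolerance of 1e-8 is acceptable for your data (that threshold is an arbitrary choice, not something from the question):

```r
# Example vectors from the question
number.x <- c(1, 2, 3, 4, 5, 6, 7)
number.y <- c(3, 4, 5, 6, 1, 2, 7)

# Element-wise: is the difference exactly 0?
diff_zero <- (number.x - number.y) == 0

# For computed (floating-point) values an exact test can fail,
# e.g. 0.1 + 0.2 == 0.3 is FALSE in R, so compare within a tolerance.
# ASSUMPTION: 1e-8 is an illustrative tolerance; pick one suited to your data.
near_zero <- abs(number.x - number.y) < 1e-8

# How many positions are equal
n_equal <- sum(near_zero)
n_equal
```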

1 Answer


You mean something like this?

> number.x == number.y
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
> length(which(number.x==number.y))
[1] 1
> school.x == school.y
[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
> test.day <- day.x == day.y
> test.day
[1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
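Counting matches can also be done with `sum()`, since `TRUE` counts as 1. The two approaches differ when the data contain missing values: `==` returns `NA` at those positions, `which()` silently drops them, and `sum()` needs `na.rm = TRUE`. A small sketch (the vectors `a` and `b` are made up for illustration):

```r
number.x <- c(1, 2, 3, 4, 5, 6, 7)
number.y <- c(3, 4, 5, 6, 1, 2, 7)

# sum() counts TRUE values directly; same result as length(which(...)) here
sum(number.x == number.y)

# Hypothetical vectors with a missing value
a <- c(1, NA, 3)
b <- c(1, 2, 3)

count_which <- length(which(a == b))       # NA positions are dropped by which()
count_sum   <- sum(a == b, na.rm = TRUE)   # NA positions must be dropped explicitly
```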

EDIT: Given your example variables above, we have:

df <- data.frame(number.x,
             number.y,
             day.x,
             day.y,
             school.x,
             school.y,
             city.x,
             city.y,
             stringsAsFactors=FALSE)

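n <- ncol(df)  # number of columns (assumed to be even)

k <- 1
comp <- list()  # comparisons will be stored here

while (k <= n-1) {
      l <- (k+1)/2                      # pair index: columns 1&2 -> 1, 3&4 -> 2, ...
      comp[[l]] <- df[,k] == df[,k+1]   # element-wise comparison of the pair
      k <- k+2                          # jump to the first column of the next pair
}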
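The same pairing can be written without manual index arithmetic by generating the first column of each pair with `seq()` and comparing with `lapply()`. A sketch under the same assumption as the loop (an even number of columns, ordered as pairs):

```r
number.x <- c(1, 2, 3, 4, 5, 6, 7); number.y <- c(3, 4, 5, 6, 1, 2, 7)
day.x    <- c(1, 3, 4, 5, 6, 7, 8); day.y    <- c(4, 5, 6, 7, 8, 7, 8)
school.x <- c("a", "b", "b", "c", "n", "f", "h")
school.y <- c("a", "b", "b", "c", "m", "g", "h")
city.x   <- c(1, 2, 3, 7, 5, 8, 7); city.y   <- c(1, 2, 3, 5, 5, 7, 7)

df <- data.frame(number.x, number.y, day.x, day.y,
                 school.x, school.y, city.x, city.y,
                 stringsAsFactors = FALSE)

n <- ncol(df)
stopifnot(n %% 2 == 0)  # pairing below assumes an even number of columns

# First column of each pair: 1, 3, 5, ...
firsts <- seq(1, n, by = 2)

# Compare column k with column k + 1 for every pair; result is a list
comp <- lapply(firsts, function(k) df[[k]] == df[[k + 1]])
```

`stringsAsFactors = FALSE` matters here: if the string columns were factors with different level sets, `==` would stop with the error "level sets of factors are different".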

After which, you'll have:

> comp
[[1]]
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

[[2]]
[1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE

[[3]]
[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE

[[4]]
[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE

To get the comparison result between columns k and k+1, look at element (k+1)/2 of comp; e.g. for the comparison between columns 7 and 8, look at element (7+1)/2 = 4:
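If remembering the (k+1)/2 mapping is awkward with 250+ columns, the list elements can be named after the variables instead. A sketch, assuming every pair is ordered as `<name>.x`, `<name>.y` as in the example (the naming step is an illustration, not part of the original answer):

```r
number.x <- c(1, 2, 3, 4, 5, 6, 7); number.y <- c(3, 4, 5, 6, 1, 2, 7)
day.x    <- c(1, 3, 4, 5, 6, 7, 8); day.y    <- c(4, 5, 6, 7, 8, 7, 8)
school.x <- c("a", "b", "b", "c", "n", "f", "h")
school.y <- c("a", "b", "b", "c", "m", "g", "h")
city.x   <- c(1, 2, 3, 7, 5, 8, 7); city.y   <- c(1, 2, 3, 5, 5, 7, 7)

df <- data.frame(number.x, number.y, day.x, day.y,
                 school.x, school.y, city.x, city.y,
                 stringsAsFactors = FALSE)

firsts <- seq(1, ncol(df), by = 2)
comp <- lapply(firsts, function(k) df[[k]] == df[[k + 1]])

# ASSUMPTION: the first column of each pair ends in ".x"; strip it to get the name
names(comp) <- sub("\\.x$", "", names(df)[firsts])

comp[["city"]]  # same as comp[[4]]
```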

> comp[[4]]
[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE

EDIT 2: To have the comparisons as new columns in the dataframe:

new.names <- paste0('V', 1:(n/2))  # "V1", "V2", ..., one name per pair

cc <- as.data.frame(comp, optional=TRUE)
names(cc) <- new.names

df.new <- cbind(df, cc)
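The new columns can also be built in one step with descriptive names instead of V1, V2, .... A sketch; the `.same` suffix is a hypothetical naming scheme chosen for illustration, and it assumes `<name>.x`, `<name>.y` pairs as above:

```r
number.x <- c(1, 2, 3, 4, 5, 6, 7); number.y <- c(3, 4, 5, 6, 1, 2, 7)
day.x    <- c(1, 3, 4, 5, 6, 7, 8); day.y    <- c(4, 5, 6, 7, 8, 7, 8)
school.x <- c("a", "b", "b", "c", "n", "f", "h")
school.y <- c("a", "b", "b", "c", "m", "g", "h")
city.x   <- c(1, 2, 3, 7, 5, 8, 7); city.y   <- c(1, 2, 3, 5, 5, 7, 7)

df <- data.frame(number.x, number.y, day.x, day.y,
                 school.x, school.y, city.x, city.y,
                 stringsAsFactors = FALSE)

firsts <- seq(1, ncol(df), by = 2)
bases  <- sub("\\.x$", "", names(df)[firsts])  # "number", "day", "school", "city"

# One logical column per pair, named e.g. "number.same" (hypothetical suffix)
cc <- as.data.frame(
  setNames(lapply(firsts, function(k) df[[k]] == df[[k + 1]]),
           paste0(bases, ".same"))
)

df.new <- cbind(df, cc)
```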

After which, you have:

> df.new
  number.x number.y day.x day.y school.x school.y city.x city.y    V1    V2    V3    V4
1        1        3     1     4        a        a      1      1 FALSE FALSE  TRUE  TRUE
2        2        4     3     5        b        b      2      2 FALSE FALSE  TRUE  TRUE
3        3        5     4     6        b        b      3      3 FALSE FALSE  TRUE  TRUE
4        4        6     5     7        c        c      7      5 FALSE FALSE  TRUE FALSE
5        5        1     6     8        n        m      5      5 FALSE FALSE FALSE  TRUE
6        6        2     7     7        f        g      8      7 FALSE  TRUE FALSE FALSE
7        7        7     8     8        h        h      7      7  TRUE  TRUE  TRUE  TRUE
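Since the goal stated in the comments is to see how many values are equal, the logical comparison columns can be summed per pair with `colSums()`. A sketch reusing the same pairing (names derived from the `.x` columns, as an assumption about the column layout); add `na.rm = TRUE` to `colSums()` if your data contain missing values:

```r
number.x <- c(1, 2, 3, 4, 5, 6, 7); number.y <- c(3, 4, 5, 6, 1, 2, 7)
day.x    <- c(1, 3, 4, 5, 6, 7, 8); day.y    <- c(4, 5, 6, 7, 8, 7, 8)
school.x <- c("a", "b", "b", "c", "n", "f", "h")
school.y <- c("a", "b", "b", "c", "m", "g", "h")
city.x   <- c(1, 2, 3, 7, 5, 8, 7); city.y   <- c(1, 2, 3, 5, 5, 7, 7)

df <- data.frame(number.x, number.y, day.x, day.y,
                 school.x, school.y, city.x, city.y,
                 stringsAsFactors = FALSE)

firsts <- seq(1, ncol(df), by = 2)
cc <- as.data.frame(lapply(firsts, function(k) df[[k]] == df[[k + 1]]))
names(cc) <- sub("\\.x$", "", names(df)[firsts])

# Number of rows in which each pair agrees (TRUE counts as 1)
matches <- colSums(cc)
matches
```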

3 Comments

Hi, thanks for your comment. Yes, this is exactly what I am looking for. The problem is that I have 300 variables in my data set, so I am looking for a way to compare consecutive columns pairwise. Do you have any idea how to do that?
Just to be sure I understand: you want to compare column 1 with 2, 3 with 4,... k with k+1, k+2 with k+3 etc. Correct?
Thanks so much for your help. Is there a way to store each comparison between two variables as a column, i.e. to have 4 new columns in the data set for comp1, comp2, comp3 and comp4?
