I have to run a linearity test between different pairs of variables. Because I have many variables, I'm using ggpairs(data) to do it. However, many of my variables have x=0 and y=0 values, so my plots look similar to this:

[image: example scatter plot]

I would say that there is a positive correlation, but because of the x=0 and y=0 values, I'm no longer sure how to interpret it. So, my questions are:

  1. Do we have to remove these points (the x=0 and y=0 values) from the scatter plot when we do a linearity test and when we calculate the Pearson correlation coefficient, or should we include them?

  2. If we need to exclude them, how can we do it so that the x=0 and y=0 values are removed only from the corresponding scatter plot, without removing the entire row from the data set and without affecting the other scatter plots?
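To see how much the points on the axes can move Pearson's r, and how to drop them for one pair at a time, here is a minimal base-R sketch. The helpers cor_without_zeros() and pairwise_cor_without_zeros() are hypothetical names (not from any package), and the x/y vectors are made-up numbers, not the question's data. The filtering happens inside the helper, so the data frame itself keeps all its rows and every other pair is unaffected.

```r
# Hypothetical helper (not from any package): correlation for ONE pair of
# variables, ignoring only the rows where x == 0 or y == 0 in that pair.
# The inputs are not modified, so other pairs keep all their rows.
cor_without_zeros <- function(x, y, method = "pearson") {
  keep <- !is.na(x) & !is.na(y) & x != 0 & y != 0
  if (sum(keep) < 3) return(NA_real_)  # too few points for a useful estimate
  cor(x[keep], y[keep], method = method)
}

# Apply the helper to every pair of columns of a data frame.
pairwise_cor_without_zeros <- function(df, method = "pearson") {
  vars <- names(df)
  out <- matrix(NA_real_, length(vars), length(vars),
                dimnames = list(vars, vars))
  for (i in seq_along(vars)) {
    for (j in seq_along(vars)) {
      out[i, j] <- cor_without_zeros(df[[i]], df[[j]], method = method)
    }
  }
  out
}

# Made-up example (not the question's data): 8 points close to y = x,
# plus 4 points lying on the axes.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 6, 9)
y <- c(1.2, 1.8, 3.1, 4.2, 4.9, 6.1, 7.2, 7.8, 5, 9, 0, 0)

r_all     <- cor(x, y)                 # every point, axis points included
r_nonzero <- cor_without_zeros(x, y)   # axis points dropped for this pair only
round(c(all = r_all, nonzero = r_nonzero), 3)
```

With the question's data frame stored in data, pairwise_cor_without_zeros(data[-1]) would give one zero-excluded coefficient per pair (dropping the ID column); pass method = "spearman" for rank correlations. Whether excluding the zeros is justified depends on what a zero means in the study, which the code cannot decide.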

As an example, we can use this data set. The variables I used for the scatter plots (for each pair) are D_biologie, D_chimie, D_math, D_physic, ..., which are the durations of work in a specific field, in years:

```r
data <- structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), D_biologie = c(0,
1, 5, 2, 3, 12, 0, 4, 0, 0), D_chimie = c(2, 9, 0, 4, 0, 40,
0, 6, 9, 0), D_math = c(5, 2, 0, 6, 0, 30, 10, 7, 0, 50), D_physic = c(12,
3, 5, 7, 12, 5, 0, 9, 40, 6), D_french = c(40, 4, 35, 9, 40,
0, 4, 4, 5, 7), D_eng = c(30, 0, 0, 10, 30, 4, 2, 0, 0, 50),
    D_hist = c(5, 6, 0, 4, 5, 0, 6, 7, 0, 0), D_geo = c(0, 8,
    2, 0, 0, 0, 9, 1, 0, 0)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L))
```
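For the ggpairs() part of question 2, one option is to give ggpairs() custom panel functions that drop the zero rows separately in each panel. This is a sketch, assuming the GGally and ggplot2 packages are installed; drop_zero_rows(), points_no_zeros() and cor_no_zeros() are hypothetical helper names, not GGally functions. Only the lower (scatter) and upper (correlation text) panels are filtered; the diagonal density panels still use all rows.

```r
library(GGally)   # ggpairs(), eval_data_col()
library(ggplot2)  # ggplot(), geom_point(), annotate(), theme_void(), aes()

# The question's data, rebuilt as a plain data.frame so this block runs alone.
data <- data.frame(
  ID         = 1:10,
  D_biologie = c(0, 1, 5, 2, 3, 12, 0, 4, 0, 0),
  D_chimie   = c(2, 9, 0, 4, 0, 40, 0, 6, 9, 0),
  D_math     = c(5, 2, 0, 6, 0, 30, 10, 7, 0, 50),
  D_physic   = c(12, 3, 5, 7, 12, 5, 0, 9, 40, 6),
  D_french   = c(40, 4, 35, 9, 40, 0, 4, 4, 5, 7),
  D_eng      = c(30, 0, 0, 10, 30, 4, 2, 0, 0, 50),
  D_hist     = c(5, 6, 0, 4, 5, 0, 6, 7, 0, 0),
  D_geo      = c(0, 8, 2, 0, 0, 0, 9, 1, 0, 0)
)

# Hypothetical helper: keep only the rows where neither variable mapped in
# THIS panel is zero. Other panels still receive the full, unmodified data.
drop_zero_rows <- function(data, mapping) {
  x <- eval_data_col(data, mapping$x)
  y <- eval_data_col(data, mapping$y)
  data[!is.na(x) & !is.na(y) & x != 0 & y != 0, , drop = FALSE]
}

# Lower panels: scatter plot without the zero points of that pair.
points_no_zeros <- function(data, mapping, ...) {
  ggplot(drop_zero_rows(data, mapping), mapping) + geom_point(...)
}

# Upper panels: Pearson r and n computed on the same filtered rows.
cor_no_zeros <- function(data, mapping, ...) {
  d <- drop_zero_rows(data, mapping)
  x <- eval_data_col(d, mapping$x)
  y <- eval_data_col(d, mapping$y)
  label <- if (length(x) >= 3) {
    sprintf("r = %.2f\nn = %d", cor(x, y), length(x))
  } else {
    sprintf("n = %d", length(x))   # too few points for a correlation
  }
  ggplot() + annotate("text", x = 0.5, y = 0.5, label = label) + theme_void()
}

p <- ggpairs(data, columns = 2:9,
             lower = list(continuous = points_no_zeros),
             upper = list(continuous = cor_no_zeros))
p
```

Showing n next to r is a deliberate choice: with only ten rows, some pairs keep just two or three non-zero points after filtering (for example D_chimie and D_geo share only two), and a coefficient from so few points says very little.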
  • It looks like there is a mix of data, some of which has zero correlation (e.g. along the x and y axes), some of which has moderate correlation, and some with a perfect correlation along y = x. Absent more context, I would assume a correlation should include all the data. Commented Aug 2, 2023 at 20:21
  • Because of the points with zero correlation (along the x and y axes), I suppose I can't use a Pearson correlation since, overall, the relationship isn't linear. But do you think it's acceptable to use a Spearman correlation instead? Commented Aug 11, 2023 at 14:30
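Whether Spearman is the right choice depends on what the analysis needs, but computing both coefficients is cheap. A minimal base-R sketch, keeping all rows (zeros included) as the first comment suggests; the data frame simply restates the question's values. Spearman's rho measures monotonic rather than strictly linear association, and here the many zeros become tied ranks.

```r
# The question's data, rebuilt as a plain data.frame so this block runs alone.
data <- data.frame(
  ID         = 1:10,
  D_biologie = c(0, 1, 5, 2, 3, 12, 0, 4, 0, 0),
  D_chimie   = c(2, 9, 0, 4, 0, 40, 0, 6, 9, 0),
  D_math     = c(5, 2, 0, 6, 0, 30, 10, 7, 0, 50),
  D_physic   = c(12, 3, 5, 7, 12, 5, 0, 9, 40, 6),
  D_french   = c(40, 4, 35, 9, 40, 0, 4, 4, 5, 7),
  D_eng      = c(30, 0, 0, 10, 30, 4, 2, 0, 0, 50),
  D_hist     = c(5, 6, 0, 4, 5, 0, 6, 7, 0, 0),
  D_geo      = c(0, 8, 2, 0, 0, 0, 9, 1, 0, 0)
)
vars <- setdiff(names(data), "ID")

# All rows kept, zeros included.
pearson_all  <- cor(data[vars], method = "pearson")
spearman_all <- cor(data[vars], method = "spearman")

# Side-by-side view for one pair.
round(c(pearson  = pearson_all["D_biologie", "D_chimie"],
        spearman = spearman_all["D_biologie", "D_chimie"]), 2)

# Spearman test for that pair. The repeated zeros create tied ranks, and with
# ties R cannot compute an exact p-value (it warns and approximates), so the
# approximation is requested explicitly with exact = FALSE.
ct <- cor.test(data$D_biologie, data$D_chimie, method = "spearman", exact = FALSE)
ct$estimate
```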
