remove y=0 values and x=0 values in scatter plot in R


Asked 2 years, 4 months ago

Modified 2 years, 3 months ago

Viewed 145 times


I have to test for linearity between different pairs of variables. I'm using ggpairs(data) to do it because I have multiple variables. But many of my variables contain zeros, so in each scatter plot a lot of points lie along the axes (x=0 or y=0). I would say that there is a positive correlation, but because of the x=0 and y=0 values, I'm no longer sure how to interpret it. So, my questions are:

Do we have to remove these points (the y=0 and x=0 values) from the scatter plot when we test for linearity and calculate the Pearson correlation coefficient, or should we include them?
If we need to exclude them, how can we do it so that the y=0 and x=0 values are removed only for the corresponding scatter plot, without removing the entire row from the data set and without affecting the other scatter plots?

As an example, we can use this data set. The variables that I used for the scatter plots (one plot per pair) are D_biologie, D_chimie, D_math, D_physic, ..., which are the durations of work in a specific field, in years.

structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), D_biologie = c(0, 
1, 5, 2, 3, 12, 0, 4, 0, 0), D_chimie = c(2, 9, 0, 4, 0, 40, 
0, 6, 9, 0), D_math = c(5, 2, 0, 6, 0, 30, 10, 7, 0, 50), D_physic = c(12, 
3, 5, 7, 12, 5, 0, 9, 40, 6), D_french = c(40, 4, 35, 9, 40, 
0, 4, 4, 5, 7), D_eng = c(30, 0, 0, 10, 30, 4, 2, 0, 0, 50), 
    D_hist = c(5, 6, 0, 4, 5, 0, 6, 7, 0, 0), D_geo = c(0, 8, 
    2, 0, 0, 0, 9, 1, 0, 0)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L))
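
To make the second question concrete, here is a minimal sketch of per-pair exclusion in base R. It is only an illustration under assumptions, not an established recipe: the helper name cor_nonzero, the three-point minimum, and the use of just three of the D_* columns are all hypothetical choices. The idea is to build a logical mask separately for each pair, so a zero in one variable drops the point only from the pairs that involve that variable, and the data frame itself is never changed.

```r
# Three of the D_* columns, copied from the example data above
dat <- data.frame(
  D_biologie = c(0, 1, 5, 2, 3, 12, 0, 4, 0, 0),
  D_chimie   = c(2, 9, 0, 4, 0, 40, 0, 6, 9, 0),
  D_math     = c(5, 2, 0, 6, 0, 30, 10, 7, 0, 50)
)

# Hypothetical helper: correlation for one pair, using only the rows where
# both variables are non-zero. Nothing is removed from `dat`.
cor_nonzero <- function(x, y, method = "pearson") {
  keep <- !is.na(x) & !is.na(y) & x != 0 & y != 0
  if (sum(keep) < 3) return(NA_real_)  # assumed cutoff: too few points left
  cor(x[keep], y[keep], method = method)
}

# Every pair of columns, one pair per matrix column
pairs_idx <- combn(names(dat), 2)

# Compare the correlation on the non-zero subset with the one on all rows
res <- data.frame(
  var_x     = pairs_idx[1, ],
  var_y     = pairs_idx[2, ],
  n_used    = apply(pairs_idx, 2, function(p) sum(dat[[p[1]]] != 0 & dat[[p[2]]] != 0)),
  r_nonzero = apply(pairs_idx, 2, function(p) cor_nonzero(dat[[p[1]]], dat[[p[2]]])),
  r_all     = apply(pairs_idx, 2, function(p) cor(dat[[p[1]]], dat[[p[2]]]))
)
print(res)
```

The n_used column shows how few points survive the filter on a data set this small, which is worth checking before reading anything into r_nonzero. The same mask could in principle be applied inside a custom panel function passed to ggpairs (for example through its lower argument), so that each scatter plot drops only its own zeros.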

edited Aug 3, 2023 at 14:06

asked Aug 2, 2023 at 19:52

R_help


It looks like there is a mix of data, some of which has zero correlation (e.g. along the x and y axes), some of which has moderate correlation, and some with a perfect correlation along y = x. Absent more context, I would assume a correlation should include all the data.

– Jon Spring
Commented Aug 2, 2023 at 20:21
Because of the points with zero correlation (along the x and y axes), I suppose I can't use a Pearson correlation, since the overall relationship isn't linear. But do you think it's acceptable to use a Spearman correlation instead?

– R_help
Commented Aug 11, 2023 at 14:30
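
For reference, the two coefficients mentioned in the comments can be compared directly with base R's cor(). This is a small sketch using one pair from the example data; it does not settle which coefficient is appropriate here. It also shows that Spearman's coefficient is Pearson's coefficient computed on ranks, so the many zeros enter as tied values that share an average rank.

```r
# One pair of variables, copied from the example data in the question
D_biologie <- c(0, 1, 5, 2, 3, 12, 0, 4, 0, 0)
D_chimie   <- c(2, 9, 0, 4, 0, 40, 0, 6, 9, 0)

# Pearson measures linear association, Spearman measures monotonic association
pearson  <- cor(D_biologie, D_chimie, method = "pearson")
spearman <- cor(D_biologie, D_chimie, method = "spearman")

# Spearman is Pearson applied to the ranks; tied zeros get the same average rank
spearman_by_rank <- cor(rank(D_biologie), rank(D_chimie))

print(c(pearson = pearson, spearman = spearman, by_rank = spearman_by_rank))
print(rank(D_biologie))  # the four zeros all receive the same rank
```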
