
I have a data structure that I got as a result of the problem stated here.

Code:

library(dplyr)    # mutate(), count(), %>%
library(ggplot2)  # ggplot()

df <- tibble::tribble(~person, ~age, ~height,
                      "John", 1, 20,
                      "Mike", 3, 50,
                      "Maria", 3, 52,
                      "Elena", 6, 90,
                      "Biden", 9, 120)
df %>%
  mutate(
    age_c = cut(
      age,
      breaks = c(-Inf, 5, 10),
      labels = c("0-5", "5-10"),
      right = TRUE
    ),
    height_c = cut(
      height,
      breaks = c(-Inf, 50, 100, 200),
      labels = c("0-50", "50-100", "100-200"),
      right = TRUE
    )
  ) %>%
  count(age_c, height_c, .drop = FALSE)

# A tibble: 6 x 3
  age_c height_c     n
  <fct> <fct>    <int>
1 0-5   0-50         2
2 0-5   50-100       1
3 0-5   100-200      0
4 5-10  0-50         0
5 5-10  50-100       1
6 5-10  100-200      1
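
As a cross-check on the counts above, the same table can be reproduced in base R. This is only a minimal sketch: the vectors `age` and `height` repeat the values from `df`, and the names `counts` and `counts_long` are illustrative, not part of the original code.

```r
# Same values as the age and height columns of df
age    <- c(1, 3, 3, 6, 9)
height <- c(20, 50, 52, 90, 120)

# Bin with the same breaks and labels; right = TRUE closes each interval on the right
age_c <- cut(age, breaks = c(-Inf, 5, 10),
             labels = c("0-5", "5-10"), right = TRUE)
height_c <- cut(height, breaks = c(-Inf, 50, 100, 200),
                labels = c("0-50", "50-100", "100-200"), right = TRUE)

# Cross-tabulate; empty level combinations are kept as zeros
counts <- table(age_c, height_c)

# Long form: one row per (age_c, height_c) combination, with the count in n
counts_long <- as.data.frame(counts, responseName = "n")
print(counts_long)
```

The rows of `counts_long` come out in a different order than the `count()` result, because `as.data.frame()` on a table varies the first factor fastest.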

Now I am trying to create a scatter plot from this result, but ggplot does not seem to treat the repeated values on the x and y axes as the same category; instead, it repeats them on the axes. I would expect the x-axis to have two values, 0-5 and 5-10 (what I get is 0-5, 0-5, 0-5, 5-10, 5-10, 5-10), and the y-axis to have three values, 0-50, 50-100 and 100-200 (instead I get two series of them).

The code I use to plot:

ggplot(df, aes(x=age_c, y=height_c))

Expected plot (where the size of each circle is based on the value of n):
[image: expected plot]

  • This cannot be a scatter plot: your values are factors. How would you plot 0-5 against 0-50? What exactly do you mean by plotting? In the XY plane there is no point at (0-5, 0-50). Commented Nov 11, 2020 at 14:49
  • 0-5 and so on are just bin margins; I see them more as labels: 0-5 is category 1, 5-10 is category 2, and so on. Commented Nov 11, 2020 at 14:53
  • Then that is not a scatter plot. A scatter plot is only used to graph real/continuous values, not categorical values. Commented Nov 11, 2020 at 14:54
  • I think I am missing something in the logic. I have a value n that I expect to be plotted according to the age and height categories. Commented Nov 11, 2020 at 14:56
  • A scatter plot takes two values, X and Y, and both must be continuous (not categorical). Commented Nov 11, 2020 at 14:59

1 Answer


If you store the counts in their own data frame and plot that, mapping the point size to n, it should work:

library(dplyr)
library(ggplot2)

# Bin age and height, then count every combination of bins
countdf <- df %>%
  mutate(
    age_c = cut(
      age,
      breaks = c(-Inf, 5, 10),
      labels = c("0-5", "5-10"),
      right = TRUE
    ),
    height_c = cut(
      height,
      breaks = c(-Inf, 50, 100, 200),
      labels = c("0-50", "50-100", "100-200"),
      right = TRUE
    )
  ) %>%
  count(age_c, height_c, .drop = FALSE)


# Drop the empty cells and draw one point per remaining cell, sized by n
countdf %>%
  filter(n > 0) %>%
  ggplot(aes(x = age_c, y = height_c, size = n)) +
  geom_point() +
  scale_size_continuous(range = c(5, 10), breaks = c(1, 2))
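
As an alternative sketch (not part of the answer above), ggplot2's `geom_count()` can compute the per-cell counts itself, so the binned rows can be plotted directly without a separate `count()` step. The names `binned` and `p` are illustrative, and `drop = FALSE` on the discrete scales is an optional choice here that keeps any unused factor levels on the axes.

```r
library(dplyr)
library(ggplot2)

df <- tibble::tribble(~person, ~age, ~height,
                      "John", 1, 20,
                      "Mike", 3, 50,
                      "Maria", 3, 52,
                      "Elena", 6, 90,
                      "Biden", 9, 120)

# Bin the raw values only; no counting yet
binned <- df %>%
  mutate(
    age_c = cut(age, breaks = c(-Inf, 5, 10),
                labels = c("0-5", "5-10"), right = TRUE),
    height_c = cut(height, breaks = c(-Inf, 50, 100, 200),
                   labels = c("0-50", "50-100", "100-200"), right = TRUE)
  )

# geom_count() counts the rows at each (x, y) position and maps that count to size
p <- ggplot(binned, aes(x = age_c, y = height_c)) +
  geom_count() +
  scale_x_discrete(drop = FALSE) +   # keep all age categories on the axis
  scale_y_discrete(drop = FALSE) +   # keep all height categories on the axis
  scale_size_continuous(range = c(5, 10), breaks = c(1, 2))

print(p)
```

Because both axes are factors, each category appears once on its axis, and cells with no observations simply get no point.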
