I have the following data set, which is basically a data frame with 3 columns

column_A <- rep(sample(300:1000000, 903, replace = F), each=10)
column_B <- sample(5:25, 9030, replace = T)

df <- data.frame(column_A, column_B)
df$group <- sample(1:4, nrow(df), replace = T)
rm(column_A)
rm(column_B)

and I want to generate a graph with geom_point() using the following code:

library(ggplot2)

graph_builder <- function(data_set, y_axis_parameter, category, group) {
 
  graph <- ggplot(data_set, aes(x = factor({{category}}), y = {{ y_axis_parameter }})) +
    geom_point() +
    theme(plot.title = element_text(hjust = 0.5)) +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) + 
    facet_grid(rows = vars({{ group }}), scales = "free_x") 
  
  graph
}

graph_builder(df, column_B, column_A, group )

My real data sets are similar to the generated data frame, with a large number of categories on the x-axis (close to 900), so the x-axis labels are cramped and unreadable. I want to make my graph more readable.

My solution: I'm adding a new column named "group" to my data frame and assigning it numeric values from 1 to 4. This puts a roughly equal number of data points into each of the four groups (1, 2, 3, and 4). But as you saw in the code, I'm adding this grouping column outside of the graph_builder() function.

I think there must be a better way to partition my data frame into four (or five) sub-groups, so that the final graph has one subgraph per sub-group. I should mention that in my real data frames the values on the x-axis do not follow a uniform distribution, so the cut() function produces groups of different sizes. Look at this solution

Question 1: Is there a way to divide my data set inside the graph_builder() function? As you can see, the graph generated by my code is not readable, so any solution that makes it more readable is really appreciated.

  • Was the intention to split up the data so only 1/4th or 1/5th of the column_A values would need to be shown in each sub-plot? If so, you should split up the groups based on the column_A values and not at random, and perhaps use facet_wrap(vars({{group}}), scales = "free_x") to split up the axis labels. Commented Sep 28, 2023 at 2:54
  • @JonSpring Thanks for the comment. Yes, that is correct. My intention is to split up the data so only 1/4th of the column_A values are shown in the first sub-plot. Do you have any suggestions for how to split based on column_A inside graph_builder(), or even outside of the function in an efficient way? Also, I had tried facet_wrap(). It improves things somewhat but doesn't solve the problem. The graph is still unreadable. Commented Sep 28, 2023 at 3:12

1 Answer

You are asking heroics of your x axis. Here's a version where I've split the chart into 6 facets in order of the category values. This is only barely readable, but it's not obvious to me that much better can be done without a larger format. Maybe IMAX?

Here I convert the category to a factor and convert that to a number, so num_cat will range from 1 for the first column_A value to 903 for the last. Then we can split the groups ~evenly with a little math.

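library(dplyr)
library(ggplot2)

graph_builder <- function(data_set, y_axis_parameter, category) {

  data_set <- data_set %>%
    # num_cat: position of each category among the sorted distinct values (1 = first)
    # group: rescale num_cat onto 0..5 so the 6 facets hold roughly equal numbers of categories
    mutate(num_cat = as.numeric(factor({{category}})),
           group = floor(num_cat*6/max(num_cat + 1)))

  ggplot(data_set, aes(x = factor({{category}}), y = {{ y_axis_parameter }})) +
    geom_point() +
    theme(plot.title = element_text(hjust = 0.5)) +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
    # hide the facet strips, since the group ids carry no meaning for the reader
    theme(strip.background = element_blank(), strip.text = element_blank()) +
    # one facet per row, each with its own x axis
    facet_wrap(vars(group), scales = "free_x", nrow = 6)
}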

graph_builder(df, column_B, column_A )
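
If you want to vary the number of facets, the grouping arithmetic can be pulled out into a small helper. The following is only a sketch in base R: split_into_groups() and its n_groups argument are hypothetical names, not part of the function above, and the skewed values are made up purely for illustration. It also shows why splitting on the rank of each category gives near-equal groups where cut(), which splits the value range into equal-width intervals, does not.

```r
# Hypothetical helper (an assumption, not part of graph_builder() above):
# the same arithmetic, with the number of facets exposed as `n_groups`.
split_into_groups <- function(x, n_groups = 6) {
  # rank of each value among the sorted distinct values: 1, 2, ..., K
  num_cat <- as.numeric(factor(x))
  # rescale the ranks onto group ids 0 .. n_groups - 1
  floor(num_cat * n_groups / (max(num_cat) + 1))
}

# Made-up, heavily skewed category values, for illustration only
set.seed(42)
skewed <- unique(round(rexp(900, rate = 1 / 50000))) + 300

# Rank-based split: number of distinct categories per group
rank_sizes <- table(split_into_groups(skewed, n_groups = 4))

# cut() with a single number for `breaks` makes equal-width intervals,
# so skewed values pile up in the first interval
cut_sizes <- table(cut(skewed, breaks = 4))

print(rank_sizes)
print(cut_sizes)
```

Inside graph_builder(), the mutate() call could then use group = split_into_groups({{category}}, n_groups), with nrow = n_groups passed to facet_wrap(), assuming you add n_groups as a function argument.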

2 Comments

"Maybe IMAX?" made me laugh. There seem to be quite a lot of questions to which this applies. It seems to me a fundamental misunderstanding of what a plot is for. Expecting this many labels on a plot to be in any way useful is misguided (even if it was on IMAX). It's good to see it drawn out anyway to make the point clearly.
I call this theme_phonebook().
