
I'm trying to plot multiple variables in the same ggplot, and I want to change the color of the line based on the year. This is what I have so far, but I want it to be easier to understand:

library(ggplot2)
library(hrbrthemes)  # provides theme_ipsum()

ggplot(data = larceny_cases_districts, aes(x = Year, y = Cases)) +
    geom_line(color = "#d62f53", size = 2) +
    geom_point(shape = 21, color = "black", fill = "black", size = 6) +
    geom_text(aes(label = Cases), hjust = -1, vjust = 0.1, color = "#ff0000") +
    theme_ipsum() +
    ggtitle("Larceny Crimes in 2015-2018")

This is how my data frame is currently set up:

  larceny_cases_districts
  Year Cases
1 2015  2895
2 2016  4561
3 2017  4450
4 2018  2982

But I want it to look more like this, so that I can use colour = var_value to draw multiple lines with geom_line(). However, I can't find a way to make this work:

  larceny_cases_districts
    District  2015  2016  2017  2018
  1 A1       value value value value
  2 D4       value value value value
  3 B2       value value value value

My goal is to make a plot with 3 lines, one per district, each showing that district's value for each year.

Output of dput(head(larceny_cases_districts, 20)):

structure(list(District = c("A1", "D4", "B2"), `2015` = c(10L, 
    6L, 1L), `2016` = c(13L, 8L, 8L), `2017` = c(10L, 2L, 6L), `2018` = c(13L, 
    2L, 3L)), row.names = c("1", "2", "3"), class = "data.frame")

2 Answers


This type of problem generally comes down to reshaping the data: ggplot2 works best with data in long format, but this data is in wide format. See this post on how to reshape data from wide to long format.
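A minimal sketch of just the reshaping step, using the three-district data from the question's dput() output. It assumes tidyr >= 1.1.0 (for the names_transform argument of pivot_longer()); the name larceny_long is hypothetical.

```r
library(tidyr)

# Wide data from the question: one column per year
larceny_cases_districts <- data.frame(
  District = c("A1", "D4", "B2"),
  `2015` = c(10L, 6L, 1L),
  `2016` = c(13L, 8L, 8L),
  `2017` = c(10L, 2L, 6L),
  `2018` = c(13L, 2L, 3L),
  check.names = FALSE  # keep "2015", "2016", ... as column names
)

# Long format: one row per (District, Year) pair, 3 districts x 4 years = 12 rows
larceny_long <- pivot_longer(
  larceny_cases_districts,
  cols = -District,                          # every column except District is a year
  names_to = "Year",                         # old column names go into Year
  values_to = "Cases",                       # old cell values go into Cases
  names_transform = list(Year = as.integer)  # "2015" -> 2015L
)

print(larceny_long)
```

After this step each district's values sit in a single Cases column, so one aesthetic (for example colour = District) is enough to split the plot into one line per district.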

If the original data is already in long format, there is no need for the dplyr/tidyr pipe before the plotting instructions below.

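library(dplyr)
library(tidyr)
library(ggplot2)
library(hrbrthemes)  # for theme_ipsum()

larceny_cases_districts %>%
  # reshape from wide (one column per year) to long (one row per District/Year)
  pivot_longer(
    cols = starts_with('20'),
    names_to = 'Year',
    values_to = 'Cases'
  ) %>%
  # the former column names are character strings, so convert Year to a number
  mutate(Year = as.integer(Year)) %>%
  # fill = District also groups the data, so geom_line() draws one line per district
  ggplot(aes(Year, Cases, fill = District)) +
  geom_line() +
  geom_point(shape = 21, color = "black", size = 6) +
  geom_text(aes(label = Cases), hjust = -1, vjust = 0.1, color = "#ff0000") +
  ggtitle("Larceny Crimes in 2015-2018") +
  theme_ipsum()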


Data

larceny_cases_districts <- 
structure(list(District = c("A1", "D4", "B2"), `2015` = c(10L, 
    6L, 1L), `2016` = c(13L, 8L, 8L), `2017` = c(10L, 2L, 6L), 
    `2018` = c(13L, 2L, 3L)), row.names = c("1", "2", "3"), 
    class = "data.frame")

Comments

To make this plot, did you change the values here? larceny_cases_districts <- read.table(text = " District 2015 2016 2017 2018 1 A1 value value value value 2 D4 value value value value 3 B2 value value value value ", header = TRUE, check.names = FALSE)
@GuilhermeFrediani Yes, I did. That's what the lapply after read.table was doing. The sampled values are completely different from yours, but the general principle of plotting the graph is the same.
Can you explain each line of your solution in more detail, please?
@GuilhermeFrediani If the data is in long format, there is an x-axis variable and a y-axis variable. Then, to separate the lines, you need a grouping variable; in this case it's fill = District. Had it been color = District in the initial call to ggplot, the lines would have been colored by District. ggplot separates the groups automatically.
OK, but I want to try changing the values in the plot. What do I need to do for that?
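A minimal sketch of the colour = District variant mentioned in the comments, so that the lines themselves are coloured by district instead of only the point fills. It reuses the question's sample data, assumes tidyr >= 1.1.0, and leaves out theme_ipsum() to avoid the hrbrthemes dependency; the object names are hypothetical.

```r
library(tidyr)
library(ggplot2)

# Question's sample data (wide format)
larceny_cases_districts <- data.frame(
  District = c("A1", "D4", "B2"),
  `2015` = c(10L, 6L, 1L),
  `2016` = c(13L, 8L, 8L),
  `2017` = c(10L, 2L, 6L),
  `2018` = c(13L, 2L, 3L),
  check.names = FALSE
)

# Reshape to long format: columns District, Year, Cases
larceny_long <- pivot_longer(
  larceny_cases_districts,
  cols = -District,
  names_to = "Year",
  values_to = "Cases",
  names_transform = list(Year = as.integer)
)

# colour = District splits the data into one group per district
# and gives each group's line and points their own colour
p <- ggplot(larceny_long, aes(x = Year, y = Cases, colour = District)) +
  geom_line() +
  geom_point(size = 3) +
  geom_text(aes(label = Cases), vjust = -1, show.legend = FALSE) +
  ggtitle("Larceny Crimes in 2015-2018")

print(p)
```

To plot different values, only the data frame needs to change; the ggplot() call stays the same as long as the long-format columns keep the names District, Year and Cases.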

Maybe you are looking for this:

#Code
library(ggplot2)
library(hrbrthemes)  # for theme_ipsum()

ggplot(data = larceny_cases_districts, aes(x = Year,
                                           y = Cases,
                                           color = factor(Year),  # one colour per year
                                           group = 1,             # keep all points in a single line
                                           fill = factor(Year)))+
  geom_line(size = 2)+
  geom_point(shape = 21, color = "black", size = 6)+
  geom_text(aes(label = Cases), hjust = -1, vjust = 0.1, color = "#ff0000")+
  theme_ipsum()+
  ggtitle("Larceny Crimes in 2015-2018")
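To see why group = 1 is needed in the code above, a small sketch can compare how ggplot2 groups the same four-row data with and without it, using layer_data(). Without an explicit group, the discrete colour = factor(Year) mapping puts each year in its own one-point group, so geom_line() has nothing to connect; with group = 1 every row is in the same group and a single line is drawn. The object names p_split and p_single are hypothetical.

```r
library(ggplot2)

larceny_cases_districts <- data.frame(Year = 2015:2018,
                                      Cases = c(2895L, 4561L, 4450L, 2982L))

# Without group: the discrete colour mapping creates one group per year
p_split <- ggplot(larceny_cases_districts,
                  aes(Year, Cases, colour = factor(Year))) +
  geom_line()

# With group = 1: all rows share one group, so geom_line() joins them
p_single <- ggplot(larceny_cases_districts,
                   aes(Year, Cases, colour = factor(Year), group = 1)) +
  geom_line()

length(unique(layer_data(p_split)$group))   # 4
length(unique(layer_data(p_single)$group))  # 1
```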


Some data used:

#Data
larceny_cases_districts <- structure(list(Year = 2015:2018, Cases = c(2895L, 4561L, 4450L, 
2982L)), class = "data.frame", row.names = c("1", "2", "3", "4"
))
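The values in the data above end in L because that is how R writes integer literals: dput() adds the suffix to integer columns so the structure() call rebuilds them as integers rather than doubles. A quick check in base R:

```r
# A trailing L marks an integer literal; without it, R creates a double
class(2895L)    # "integer"
class(2895)     # "numeric"
typeof(2895)    # "double"

# Same value, different storage type
2895L == 2895            # TRUE
identical(2895L, 2895)   # FALSE

# dput() keeps the suffix so its output rebuilds integers
dput(c(2895L, 4561L))    # c(2895L, 4561L)
```

For plotting, the difference rarely matters: ggplot2 treats integer and double columns alike as continuous.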

Comments

This is closer to what I was expecting, but I want something more like this: a single plot like yours, but with 3 different lines where each point represents the larceny cases of a district. For example, the red line would be 2015, and its first point would be A1, then D4, and finally B2. The color of each line is determined by its year.
@GuilhermeFrediani Like a bar plot but using lines?
I don't actually know what that would look like, haha.
Can you explain what the "L" after the values means?
It is a 1 (as in group = 1), and it puts all the points in a single group, so they are joined by one line!
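For the layout described in the comments (one line per year, with the districts A1, D4 and B2 along the x axis), one possible sketch is to reshape the question's data to long format and then map both colour and group to the year. This is an interpretation of that comment, not code from either answer; it assumes tidyr >= 1.1.0, and the object names are hypothetical.

```r
library(tidyr)
library(ggplot2)

# Question's sample data (wide format)
larceny_cases_districts <- data.frame(
  District = c("A1", "D4", "B2"),
  `2015` = c(10L, 6L, 1L),
  `2016` = c(13L, 8L, 8L),
  `2017` = c(10L, 2L, 6L),
  `2018` = c(13L, 2L, 3L),
  check.names = FALSE
)

larceny_long <- pivot_longer(
  larceny_cases_districts,
  cols = -District,
  names_to = "Year",
  values_to = "Cases",
  names_transform = list(Year = as.integer)
)

# Keep districts in their original order (A1, D4, B2) instead of alphabetical
larceny_long$District <- factor(larceny_long$District,
                                levels = unique(larceny_cases_districts$District))

# One line per year: Year as a factor gives discrete colours and groups
p <- ggplot(larceny_long,
            aes(x = District, y = Cases,
                colour = factor(Year), group = factor(Year))) +
  geom_line() +
  geom_point(size = 3) +
  labs(colour = "Year", title = "Larceny Crimes in 2015-2018")

print(p)
```

With a discrete x axis, the explicit group mapping is what tells geom_line() which points to connect; without it, ggplot2 would not know to link points across districts.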