74

I have a problem plotting a subset of a data frame with ggplot2. My df looks like this:

df = data.frame(ID = c('P1', 'P1', 'P2', 'P2', 'P3', 'P3'),
                Value1 = c(100, 120, 300, 400, 130, 140),
                Value2 = c(12, 13, 11, 16, 15, 12))

How can I now plot Value1 vs Value2 only for IDs 'P1' and 'P3'? For example I tried:

ggplot(subset(df,ID=="P1 & P3") +
  geom_line(aes(Value1, Value2, group=ID, colour=ID)))

but I always receive an error.

5
  • ((ID =="P1") | (ID =="P3")) might do the trick Commented Aug 10, 2013 at 19:28
  • 1
    Or ID %in% c("P1", "P3"). Commented Aug 10, 2013 at 19:31
  • @Hong and @ LostBrit I receive for both commands an error: Error in as.vector(x, mode) : cannot coerce type 'environment' to vector of type 'any' Commented Aug 10, 2013 at 19:38
  • Yes, it gives an error. Can you say a little bit about what you are trying to plot? Commented Aug 10, 2013 at 19:42
  • Data would be helpful. Commented Aug 10, 2013 at 21:52

10 Answers

81

Here are 2 options for subsetting:

Using subset from base R:

library(ggplot2)
ggplot(subset(dat,ID %in% c("P1" , "P3"))) + 
         geom_line(aes(Value1, Value2, group=ID, colour=ID))

Using the subset argument of geom_line (note that I am using the plyr package for its special . function):

library(plyr)
ggplot(data=dat)+ 
  geom_line(aes(Value1, Value2, group=ID, colour=ID),
            subset = .(ID %in% c("P1" , "P3")))

You can also use the complementary subsetting:

subset(dat,ID != "P2")

4 Comments

It may be worth adding that, following the deprecation of the subset argument, comparable results can be obtained using geom_line(data=dat[dat$ID %in% c("P1" , "P3"),], ...) as discussed here. In effect this works on the same basis as the answer below. The minor difference is that the subsetted data is passed inside the geom call.
@agstudy @konrad-rudolph defining data=function(x) {...} may work in place of subset.
@agstudy My data frame contains 3 columns (Year, rain, temp). When I try to plot only the selected years with ggplot(subset(data=aggdata, Year %in% c("1901" , "1910")), aes(x=Year, y=tem, color=factor(Year))), I get the error: Error in Year %in% c("1901", "1910") : object 'Year' not found. Could you tell me what I have to do?
is it possible to subset more than one column within ggplot? say I wanted to subset Year == 2022 & Age == 12 or something like that...with two columns to subset
29

There's another solution that I find useful, especially when I want to plot multiple subsets of the same object:

myplot<-ggplot(df)+geom_line(aes(Value1, Value2, group=ID, colour=ID))
myplot %+% subset(df, ID %in% c("P1","P3"))
myplot %+% subset(df, ID %in% c("P2"))

3 Comments

@Nick Yes, your code works fine (it creates the plot), but in my case the line is not shown! Could you tell me what I have to do? [i.postimg.cc/85VgpMKz/Screenshot-from-2020-09-29-15-21-54.png]
You've specified Year as both grouping variable and colour. Lines are drawn between data points of the same group. Setting up the plot in this way means you have only one observation per group. So the solution is to remove "group=Year"
Is there any way to ensure that this keeps the colours the same? I.e., if there are four lines (red, green, blue, purple) and you subset to keep items 1 and 4, the colours should stay red and purple rather than become red and green.
18

@agstudy's answer didn't work for me with the latest version of ggplot2, but this did, using magrittr pipes:

library(ggplot2)
library(dplyr)  # provides filter() and re-exports the magrittr %>% pipe

ggplot(data=dat)+ 
  geom_line(aes(Value1, Value2, group=ID, colour=ID),
            data = . %>% filter(ID %in% c("P1" , "P3")))

It works because if geom_line sees that data is a function, it will call that function with the inherited version of data and use the output of that function as data.
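A minimal sketch of that mechanism, using a plain named function instead of a pipe. The helper name keep_ids is hypothetical, and dat is assumed to have the same contents as the data frame from the question. layer_data() is used here only to inspect what the layer ends up drawing.

```r
library(ggplot2)

# Assumed to match the data frame from the question
dat <- data.frame(ID = c("P1", "P1", "P2", "P2", "P3", "P3"),
                  Value1 = c(100, 120, 300, 400, 130, 140),
                  Value2 = c(12, 13, 11, 16, 15, 12))

# Hypothetical helper: receives the plot's data, returns the rows to draw
keep_ids <- function(d) d[d$ID %in% c("P1", "P3"), ]

p <- ggplot(dat, aes(Value1, Value2, group = ID, colour = ID)) +
  geom_line(data = keep_ids)  # a function here is applied to the inherited data

# Inspect the data the first layer actually uses (P2 rows should be absent)
built <- layer_data(p, 1)
nrow(built)
```

The function must return a data frame, so returning only the logical vector from %in% would not work as layer data.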

2 Comments

Does this still work? Not sure, whether they changed the . to .x. Haven't found anything in the NEWS though. See also my answer. BTW, they recently changed a lot.
@andschar Definitely still works. Both . and .x are fine.
15

With option 2 in @agstudy's answer now deprecated, defining data with a function can be handy.

library(ggplot2)
ggplot(data=dat) + 
  geom_line(aes(Value1, Value2, group=ID, colour=ID),
            # the function must return a data frame, not a logical vector
            data=function(x){x[x$ID %in% c("P1", "P3"), ]})

This approach comes in handy if you wish to reuse a dataset in the same plot, e.g. you don't want to specify a new column in the data.frame, or you want to explicitly plot one dataset in a layer above the other:

library(ggplot2)
ggplot(data=dat, aes(Value1, Value2, group=ID, colour=ID)) + 
  geom_line(data=function(x){x[!x$ID %in% c("P1", "P3"), ]}, alpha=0.5) +
  geom_line(data=function(x){x[x$ID %in% c("P1", "P3"), ]})


8

Are you looking for the following plot:

library(ggplot2) 
l<-df[df$ID %in% c("P1","P3"),]
myplot<-ggplot(l)+geom_line(aes(Value1, Value2, group=ID, colour=ID))



4

Your formulation is almost correct. You want:

subset(dat, ID=="P1" | ID=="P3") 

Where the | ('pipe') means 'or'. Your solution, ID=="P1 & P3", is looking for a case where ID is literally "P1 & P3"
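The difference can be checked directly on the ID column, without any plotting. This is a small sketch in base R that rebuilds the data frame from the question under the name df.

```r
df <- data.frame(ID = c("P1", "P1", "P2", "P2", "P3", "P3"),
                 Value1 = c(100, 120, 300, 400, 130, 140),
                 Value2 = c(12, 13, 11, 16, 15, 12))

# Compares every ID with the single literal string "P1 & P3": no row matches
literal_match <- df$ID == "P1 & P3"

# Element-wise OR of two comparisons: TRUE for the P1 and P3 rows
either_match <- df$ID == "P1" | df$ID == "P3"

# %in% expresses the same condition more compactly
in_match <- df$ID %in% c("P1", "P3")

kept <- subset(df, ID == "P1" | ID == "P3")
```

Here literal_match is FALSE for every row, which is why the original subset was empty, while either_match and in_match select the same four rows.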


4

You can use ~subset(., ...) - this is a way to do what Dave above suggests, which also

  • works with current {ggplot2} (3.4.2)
  • does not require the {magrittr} pipe - for those who switched to R pipe
  • references the data as it was input to the data param of the ggplot() function, e.g. when the data was piped in
  • is a bit more concise/easier to understand than defining a function

ggplot(mtcars, aes(hp, disp)) +
  geom_point() +
  geom_point(data = ~subset(., cyl == 4), color = "red")

e.g. also works like so when the data was piped in:

library(dplyr)  # provides filter()

mtcars |> 
  filter(gear > 3) |> 
  ggplot(aes(hp, disp)) +
  geom_point() +
  geom_point(data = ~subset(., cyl == 4), color = "red")


2

Try filter (from dplyr) to keep only the rows for P1 and P3:

library(dplyr)
df2 <- filter(df, ID == "P1" | ID == "P3")

Then you can plot Value1 vs Value2.


1

Similar to @nicolaskruchten's answer, you could do the following:

require(ggplot2)

df = data.frame(ID = c('P1', 'P1', 'P2', 'P2', 'P3', 'P3'),
                Value1 = c(100, 120, 300, 400, 130, 140),
                Value2 = c(12, 13, 11, 16, 15, 12))

ggplot(df) + 
  geom_line(data = ~.x[.x$ID %in% c("P1" , "P3"), ],
            aes(Value1, Value2, group = ID, colour = ID))


0

Use subset within ggplot:

ggplot(data = subset(df, ID == "P1" | ID == "P3")) +
   aes(Value1, Value2, group=ID, colour=ID) +
   geom_line()
