
I'm researching the number of individuals counted on four different sampling days in 9 different town districts, so 4 counts at each of 9 locations.

I was able to plot Sampling 1, 2, 3 and 4 independently of each other. However, I need at least 60 counted individuals to be able to use the data for further statistics. Because some samplings did not reach this threshold, I have to cluster the data: for every town district, I add sampling 1 and sampling 2 together to see whether the combined count gets over the threshold of 60.
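The clustering check described above can be sketched in base R. This is a minimal sketch, assuming a trimmed copy of WPT rebuilt from the dput() values posted further down (only Individuals, Sampling and TownDistrict); the names threshold, first_pair, sum_1_2 and reaches_threshold are illustrative, not part of the original code:

```r
# Trimmed copy of WPT (assumption: only the columns needed here),
# rebuilt from the dput() values
WPT <- data.frame(
  Individuals = c(4, 11, 17, 21, 49, 68, 69, 76, 24, 85, 69, 61,
                  86, 69, 86, 71, 82, 53, 83, 76, 84, 99, 99, 86,
                  79, 134, 124, 112, 111, 90, 122, 104, 81, 102, 115, 95),
  Sampling = rep(c(1, 2, 3, 4), times = 9),
  TownDistrict = factor(rep(1:9, each = 4))
)

threshold <- 60

# Keep samplings 1 and 2, then add them up per town district
first_pair <- subset(WPT, Sampling %in% c(1, 2))
sum_1_2 <- tapply(first_pair$Individuals, first_pair$TownDistrict, sum)

# TRUE where clustering samplings 1 and 2 reaches the threshold
reaches_threshold <- sum_1_2 >= threshold
reaches_threshold
```

With the posted data, only town district 1 stays below 60 after clustering samplings 1 and 2 (4 + 11 = 15).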

Now I have to combine Sampling 1+2 and Sampling 3+4 in order to create a ggplot similar to the one below, but with Sampling 1+2 and Sampling 3+4 instead of Sampling 1, 2, 3 and 4.

[Image: 4 Samplings ggplot]

The code for that ggplot is:

library(ggplot2)
library(scales)  # provides pretty_breaks()

WP +
  geom_point(aes(x = Sampling, y = Individuals, colour = TownDistrict)) +
  ylab("Individuals") +
  xlab("Sampling") +
  ggtitle("Absolute amount of individuals observed over time per sampling per Town district") +
  scale_x_continuous(breaks = pretty_breaks(1)) +
  scale_y_continuous(breaks = pretty_breaks(n = 10)) +
  geom_hline(yintercept = 60, colour = "red") +
  geom_line(aes(Sampling, Individuals, colour = TownDistrict, group = TownDistrict))

The Sampling column is numeric, with values ranging from 1 to 4.
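Since Sampling is numeric, the label of the pair each sampling belongs to can also be computed rather than typed out. A minimal sketch, assuming consecutive samplings are always paired (1 with 2, 3 with 4); pair_index and pair_label are illustrative names:

```r
# Numeric Sampling column as in the question: 1 to 4, repeated for 9 districts
Sampling <- rep(c(1, 2, 3, 4), times = 9)

# Samplings 1 and 2 fall in pair 1, samplings 3 and 4 fall in pair 2
pair_index <- ceiling(Sampling / 2)

# Label each pair by its first and second sampling, e.g. "1+2"
pair_label <- paste0(2 * pair_index - 1, "+", 2 * pair_index)

head(pair_label, 4)
#> [1] "1+2" "1+2" "3+4" "3+4"
```

The same expression keeps working if more sampling days are added later (samplings 5 and 6 would become "5+6").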

I have also included my dataset to give an overview of the kind of data I'm working with.

I have tried using

install.packages("car") 
library(car) 
library(carData) 
install.packages("forcats") 
library(forcats) 

class(x) 
[1] "factor"  
levels(x) 
[1] "1" "2" "3" "4" 
str(x) 
 Factor w/ 4 levels "1","2","3","4": 1 2 3 4 

x 
 [1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 
Levels: 1 2 3 4 

recode(x, "c('1', '2')='Sampling 1+2';c('3', '4') = 'Sampling 3+4'") 
[1] Sampling 1+2 Sampling 1+2 Sampling 3+4 Sampling 3+4 
Levels: Sampling 1+2 Sampling 3+4 

However, none of this code turns Sampling 1, 2, 3 and 4 into the combined Sampling 1+2 and Sampling 3+4 per town district.
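recode() only renames the levels of x: the rows and their Individuals values stay as they are, so nothing is added together. The summing has to be a separate step. A minimal base R sketch, assuming a trimmed copy of WPT rebuilt from the dput() output; SamplingPair and clustered are illustrative names:

```r
# Trimmed copy of WPT (assumption: only the columns needed here)
WPT <- data.frame(
  Individuals = c(4, 11, 17, 21, 49, 68, 69, 76, 24, 85, 69, 61,
                  86, 69, 86, 71, 82, 53, 83, 76, 84, 99, 99, 86,
                  79, 134, 124, 112, 111, 90, 122, 104, 81, 102, 115, 95),
  Sampling = rep(c(1, 2, 3, 4), times = 9),
  TownDistrict = factor(rep(1:9, each = 4))
)

# Relabelling alone (like recode()): the duplicated labels merge 1,2 and 3,4
# into two levels, but there is still one row per sampling
WPT$SamplingPair <- factor(WPT$Sampling, levels = c(1, 2, 3, 4),
                           labels = c("1+2", "1+2", "3+4", "3+4"))
nrow(WPT)
#> [1] 36

# Adding the counts together is what actually clusters the samplings:
# one row per town district and sampling pair
clustered <- aggregate(Individuals ~ TownDistrict + SamplingPair,
                       data = WPT, FUN = sum)
nrow(clustered)
#> [1] 18
```

clustered could then be used with the plotting code above and x = SamplingPair. Because SamplingPair is a factor, scale_x_continuous() would have to be removed, since a continuous scale cannot be applied to a discrete variable.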

I hope I have described my problem in enough detail.

As requested in the comments:

dput(WPT)
structure(list(Individuals = c(4, 11, 17, 21, 49, 68, 69, 76, 
24, 85, 69, 61, 86, 69, 86, 71, 82, 53, 83, 76, 84, 99, 99, 86, 
79, 134, 124, 112, 111, 90, 122, 104, 81, 102, 115, 95), Sampling = c(1, 
2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 
3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4), TD = c(1, 1, 1, 1, 
2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 
7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9), TownDistrict = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 
5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 
9L, 9L, 9L), levels = c("1", "2", "3", "4", "5", "6", "7", "8", 
"9"), class = "factor"), SMPL = structure(c(1L, 2L, 3L, 4L, 1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), levels = c("1", 
"2", "3", "4"), class = "factor")), row.names = c(NA, -36L), class = c("tbl_df", 
"tbl", "data.frame"))
  • Can you provide the data using the dput() function? Commented Feb 28, 2023 at 15:18
  • Please provide enough code so others can better understand or reproduce the problem. Commented Feb 28, 2023 at 15:29
  • Please copy the output in your post. I cannot work from a picture. Commented Feb 28, 2023 at 15:33

2 Answers


Here is my approach: use dplyr to recode the samplings into the pairs 1+2 and 3+4, then sum the individuals per town district and pair before plotting.

library(tidyverse)

df <- structure(list(Individuals = c(4, 11, 17, 21, 49, 68, 69, 76, 
                                     24, 85, 69, 61, 86, 69, 86, 71, 82, 53, 83, 76, 84, 99, 99, 86, 
                                     79, 134, 124, 112, 111, 90, 122, 104, 81, 102, 115, 95),
                     Sampling = c(1, 
                                   2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 
                                   3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4), 
                     TD = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 
                            7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9), 
                     TownDistrict = structure(c(1L,1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 
                                                5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L), 
                                              levels = c("1", "2", "3", "4", "5", "6", "7", "8",  "9"), 
                                              class = "factor"), SMPL = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L,
                                                                                    4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
                                                                                    1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), 
                                                                                  levels = c("1", "2", "3", "4"), 
                                                                                  class = "factor")), row.names = c(NA, -36L),class = c("tbl_df", "tbl", "data.frame"))

df %>%
  mutate("sampling2"=
           case_when(
             Sampling %in% c(1,2) ~ "1+2",
             Sampling %in% c(3,4) ~ "3+4"
           )) %>%
  group_by(TownDistrict, sampling2) %>%
  summarise(Individuals= sum(Individuals)) %>%
  ggplot(aes(x=sampling2, y= Individuals, color= TownDistrict, group=TownDistrict))+
  geom_point()+
  geom_line()
#> `summarise()` has grouped output by 'TownDistrict'. You can override using the
#> `.groups` argument.

Created on 2023-02-28 with reprex v2.0.2


It is usually simpler to filter your dataset before plotting it. You could:

Step 1: create a new "Sampling" column in which you put 1+2 or 3+4 whenever a sampling has fewer than 60 individuals.

#I use dplyr a lot

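library(dplyr)

# case_when() requires every branch to return the same type,
# so the numeric Sampling is converted to character
data = data %>% mutate(newSampling = case_when(
  Individuals >= 60 ~ as.character(Sampling),
  Individuals < 60 & (Sampling == 1 | Sampling == 2) ~ "1+2",
  Individuals < 60 & (Sampling == 3 | Sampling == 4) ~ "3+4"
))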

Step 2: sum the individuals for each combination of "newSampling" and "TownDistrict"

data=data %>% group_by(newSampling,TownDistrict) %>% 
mutate(IndividualsSum=sum(Individuals)) %>% ungroup()
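Step 2 uses mutate() after group_by(), which, unlike summarise(), keeps every row and writes the group total onto each one. The same behaviour can be sketched in base R with ave(). This sketch assumes a trimmed copy of the question's data and fixed pairs (1+2, 3+4) instead of the threshold-dependent newSampling; SamplingPair is an illustrative name:

```r
# Trimmed copy of the question's data (assumption: only the needed columns)
WPT <- data.frame(
  Individuals = c(4, 11, 17, 21, 49, 68, 69, 76, 24, 85, 69, 61,
                  86, 69, 86, 71, 82, 53, 83, 76, 84, 99, 99, 86,
                  79, 134, 124, 112, 111, 90, 122, 104, 81, 102, 115, 95),
  Sampling = rep(c(1, 2, 3, 4), times = 9),
  TownDistrict = factor(rep(1:9, each = 4))
)

# Fixed pairs instead of the threshold-dependent newSampling (assumption)
WPT$SamplingPair <- ifelse(WPT$Sampling <= 2, "1+2", "3+4")

# Like group_by() + mutate(): every row is kept and receives the total
# of its TownDistrict / SamplingPair group
WPT$IndividualsSum <- ave(WPT$Individuals, WPT$TownDistrict, WPT$SamplingPair,
                          FUN = sum)

# Town district 1: both rows of a pair carry the same total
WPT$IndividualsSum[1:4]
#> [1] 15 15 38 38
```

In this sketch each total is repeated on both rows of a pair, so plotting IndividualsSum directly draws every point twice on top of itself; summarise() avoids that by returning one row per group.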

Step 3: create a new variable that tells whether the group should be shown in your plot

data=data %>% mutate(should_be_plotted=(IndividualsSum>=60))

Step 4: filter your data on "should_be_plotted" before plotting

data %>% filter(should_be_plotted==TRUE) %>% ggplot()
#Rest of the plot's code
