Using GGPLOT2 to show relationships between factor variables

Question

I am trying to explore the relationships between happiness and several other variables, for example AGE, SEX, or marital status (MARITAL), using ggplot(). I have this data set:

https://xdaiisu.github.io/ds202materials/hwlabs/HAPPY.rds

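library(ggplot2)
library(dplyr)  # needed for %>%, mutate() and arrange() below

HAPPY <- readRDS("HAPPY.rds")  # assumes the file above is in the working directory

# Treat the survey's missing-value codes as NA
HAPPY[HAPPY == "IAP"] <- NA
HAPPY[HAPPY == "DK"] <- NA
HAPPY[HAPPY == "NA"] <- NA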

I downloaded this data set and converted some of the variables to factors using the code below. I will use MARITAL and HAPPY as an example:

# Note: %>% has to end a line. Starting a new line with %>% is a syntax error in R.
HAPPY <- HAPPY %>%
  mutate(MARITAL = factor(MARITAL,
                          levels = c("NEVER MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "WIDOWED"))) %>%
  arrange(desc(MARITAL))

HAPPY <- HAPPY %>%
  mutate(HAPPY = factor(HAPPY,
                        levels = c("NOT TOO HAPPY", "PRETTY HAPPY", "VERY HAPPY"))) %>%
  arrange(desc(HAPPY))

Now I want to use a ggplot2 graph to show the relationship between MARITAL and happiness (denoted by the column HAPPY). I am relatively new to ggplot2, so I am still figuring out how to use it. If you would rather not plot HAPPY vs. MARITAL, feel free to compare any other variable or column to HAPPY instead. I just keep getting errors.

Thanks!

This is very open-ended, but you might try using the dplyr package to group your data and count how often each combination of the MARITAL and HAPPY columns occurs, i.e. df %>% group_by(MARITAL, HAPPY) %>% summarise(Count = n())
– Croote, Commented Feb 28, 2019 at 4:42
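The grouping idea in this comment could be sketched roughly as follows. This is only an illustration: `toy` is a small made-up data frame standing in for HAPPY.rds, and `counts` and `p` are hypothetical names, so the numbers mean nothing.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data (made-up values) standing in for HAPPY.rds
toy <- data.frame(
  MARITAL = c("MARRIED", "MARRIED", "DIVORCED", "NEVER MARRIED", "MARRIED", "DIVORCED"),
  HAPPY   = c("VERY HAPPY", "PRETTY HAPPY", "NOT TOO HAPPY", "PRETTY HAPPY", "VERY HAPPY", "PRETTY HAPPY")
)

# One row per MARITAL/HAPPY combination, with the number of rows in each
counts <- toy %>%
  group_by(MARITAL, HAPPY) %>%
  summarise(Count = n()) %>%
  ungroup()

# Dodged bars: one cluster per marital status, one bar per happiness level
p <- ggplot(counts, aes(x = MARITAL, y = Count, fill = HAPPY)) +
  geom_col(position = "dodge")
print(p)
```

With the real data you would replace `toy` with the HAPPY data frame after recoding the missing-value codes.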

Wolfgang Arnold · Accepted Answer · 2019-02-28 08:06:03Z

A simple starting point is to visualize the count of observations for each combination, e.g.: ggplot(HAPPY, aes(x = HAPPY, y = MARITAL)) + geom_count().

You might also try geom_bin2d: https://ggplot2.tidyverse.org/reference/geom_bin2d.html
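As a self-contained sketch of the geom_count() suggestion, the snippet below uses a small made-up data frame (`toy`, a hypothetical stand-in for HAPPY.rds) with the factor levels from the question. Treat it as a starting point, not a finished plot.

```r
library(ggplot2)

# Hypothetical toy data (made-up values) standing in for HAPPY.rds
toy <- data.frame(
  MARITAL = factor(
    c("MARRIED", "MARRIED", "DIVORCED", "NEVER MARRIED", "MARRIED", "DIVORCED"),
    levels = c("NEVER MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "WIDOWED")
  ),
  HAPPY = factor(
    c("VERY HAPPY", "PRETTY HAPPY", "NOT TOO HAPPY", "PRETTY HAPPY", "VERY HAPPY", "PRETTY HAPPY"),
    levels = c("NOT TOO HAPPY", "PRETTY HAPPY", "VERY HAPPY")
  )
)

# geom_count() draws one point per HAPPY/MARITAL combination,
# with the point size reflecting the number of rows in that combination
p <- ggplot(toy, aes(x = HAPPY, y = MARITAL)) +
  geom_count() +
  labs(x = "Happiness", y = "Marital status", size = "Respondents")
print(p)
```

Because both columns are factors with explicit levels, the axes follow the level order rather than alphabetical order.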


Ashirr K Kashyap · Accepted Answer · 2019-02-28 08:37:29Z

The following code should get you started.

# Load libraries
library(ggplot2)
library(dplyr)
library(ggthemes)  # provides theme_gdocs()

# Read the data
df <- readRDS("HAPPY.rds")

# Convert AGE first: if AGE is stored as character, calling as.numeric()
# after the factor conversion below would return level codes, not ages.
# Non-numeric labels (if any) become NA with a warning.
df$AGE <- as.numeric(as.character(df$AGE))

df <- na.omit(df)  # drop rows containing NA

# Convert the remaining character columns to factors
chr_cols <- sapply(df, is.character)
df[chr_cols] <- lapply(df[chr_cols], as.factor)

# Group with dplyr and plot with ggplot2
df %>%
  group_by(HAPPY, SEX) %>%
  summarise(mean_age = mean(AGE)) %>%
  ggplot(aes(x = HAPPY, y = mean_age, fill = SEX)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  labs(x = "Happiness", y = "Average Age") +
  theme_gdocs() +
  geom_text(aes(label = round(mean_age, 0)), vjust = 0, position = position_dodge(0.9)) +
  scale_fill_manual(values = c("deeppink", "mediumturquoise"))

[Figure: output plot]
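If you want to compare two factor variables directly, as the question asks for HAPPY and MARITAL, one possible variation is a proportional stacked bar chart. The sketch below is an assumption on my part, not part of the code above, and it uses a made-up data frame (`toy`) in place of HAPPY.rds.

```r
library(ggplot2)

# Hypothetical toy data (made-up values) standing in for HAPPY.rds
toy <- data.frame(
  MARITAL = factor(
    c("MARRIED", "MARRIED", "DIVORCED", "NEVER MARRIED", "MARRIED", "DIVORCED"),
    levels = c("NEVER MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "WIDOWED")
  ),
  HAPPY = factor(
    c("VERY HAPPY", "PRETTY HAPPY", "NOT TOO HAPPY", "PRETTY HAPPY", "VERY HAPPY", "PRETTY HAPPY"),
    levels = c("NOT TOO HAPPY", "PRETTY HAPPY", "VERY HAPPY")
  )
)

# position = "fill" rescales every bar to height 1, so each segment shows
# the share of a happiness level within one marital status
p <- ggplot(toy, aes(x = MARITAL, fill = HAPPY)) +
  geom_bar(position = "fill") +
  labs(x = "Marital status", y = "Proportion of respondents", fill = "Happiness")
print(p)
```

Proportions make groups of very different sizes comparable, at the cost of hiding how many respondents each bar represents.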
