I am trying to plot missing values using the function below. I get this error message:

Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
Error: Aesthetics must be either length 1 or the same as the data (65507): fill, x, y

library(reshape2)
library(ggplot2)
library(dplyr)

ggplot_missing <- function(x){

  x %>% 
    is.na %>%
    melt %>%
    ggplot(data = .,
           aes(x ,
               y )) +
    geom_raster(aes(fill = value)) +
    scale_fill_grey(name = "",
                    labels = c("Present","Missing")) +
    theme_minimal() + 
    theme(axis.text.x  = element_text(angle=45, vjust=0.5)) + 
    labs(x = "Variables in Dataset",
         y = "Rows / observations")
} 

ggplot_missing(productholding)

Any ideas?

  • The error message says it: the length of the vector value needs to be either 1 (all observations share the same value) or the same as the length of the data. Check whether the data frame is melted correctly. Commented Aug 23, 2017 at 7:37
  • If you provide a reproducible example, it will be easier for others to help you. Commented Aug 23, 2017 at 7:42
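
Following up on the first comment: one quick way to check whether the data is melted correctly is to look at the column names that melt produces. A minimal sketch, using iris as stand-in data (an assumption, since the asker's productholding data frame is not available):

```r
library(reshape2)

# Stand-in data; the asker's `productholding` is not available here
df <- iris
df[1:5, "Sepal.Width"] <- NA

# is.na() on a data frame returns a logical matrix, and reshape2::melt()
# on a matrix without named dimnames yields columns Var1, Var2 and value
melted <- reshape2::melt(is.na(df))

names(melted)   # "Var1" "Var2" "value"
nrow(melted)    # one row per cell: nrow(df) * ncol(df)
```

Any aesthetic passed to aes() has to be one of these columns (or a constant), otherwise its length will not match the melted data.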

2 Answers

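The x and y aesthetics in your ggplot() call do not refer to columns of the melted data: x is the function's argument (the original data frame), and y is not a column of the melted data either. After is.na and melt, the columns are named Var1 (rows), Var2 (variables) and value, so I changed the function to the following: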

ggplot_missing <- function(data){
  df2 <- data %>% is.na %>% melt 

  ggplot(df2, aes(Var2, Var1, fill=value)) + 
    geom_raster() + 
    scale_fill_grey(name="", labels=c("Present", "Missing")) +
    theme_minimal() + 
    theme(axis.text.x  = element_text(angle=45, vjust=0.5)) + 
    labs(x = "Variables in Dataset",
         y = "Rows / observations")
}

Test data:

df <- iris
set.seed(4)
df[sample(nrow(df), 20), 2] <- NA
df[sample(nrow(df), 30), 3] <- NA
df[sample(nrow(df), 15), 4] <- NA

ggplot_missing(df)

2 Comments

Thank you so much :-). This helps.
What are Var1 and Var2? I get an error saying these variables cannot be found!
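
On the Var1/Var2 question: those are the default column names reshape2::melt() assigns when it is given a matrix (here, the result of is.na()). If melt() receives a data frame instead, the columns are named variable and value, which would be one possible reason for the "not found" error. Below is a sketch that avoids depending on the defaults by naming the columns through the varnames argument. The column names row and variable and the function name ggplot_missing_named are my own choices for illustration, not part of the answer above.

```r
library(reshape2)
library(ggplot2)

ggplot_missing_named <- function(data) {
  # Name the two matrix dimensions explicitly instead of relying on Var1/Var2
  long <- reshape2::melt(is.na(data), varnames = c("row", "variable"))

  ggplot(long, aes(x = variable, y = row, fill = value)) +
    geom_raster() +
    scale_fill_grey(name = "", labels = c("Present", "Missing")) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, vjust = 0.5)) +
    labs(x = "Variables in Dataset", y = "Rows / observations")
}

# Same kind of test data as in the answer
df <- iris
set.seed(4)
df[sample(nrow(df), 20), 2] <- NA
p <- ggplot_missing_named(df)
```

Calling reshape2::melt() with the package prefix also guards against another loaded package masking melt.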

Slight variation on OP's question. If you want to visualize the missing data pattern for each variable at different levels of another (factor) variable...

ggplot_missing2 <- function(data, xvar, yvars) {
  # xvar should be a factor variable for this to work
  require(ggplot2)
  require(reshape2)
  newvar = "variable"
  newval = "value"
  dl <- melt(data, id.vars = xvar, measure.vars=yvars, variable.name=newvar, value.name = newval)
  dl <- dcast(dl, formula = as.formula(paste0(newvar,"~",xvar)),
              fun.aggregate = function(x) sum(is.na(x)))
  dl <- melt(dl, id.vars=newvar, variable.name=xvar, value.name=newval)
  ggplot(dl, aes_string(x=xvar, y=newvar)) + 
    geom_tile(aes_string(fill=newval), color="white") +
    geom_text(aes_string(label=newval)) + 
    scale_fill_continuous("Missing (N)", low="gray", high="cornflowerblue") +
    labs(title="Missing Data Pattern")
}

Test data:

df <- iris
set.seed(4)
df[sample(nrow(df), 20), 2] <- NA
df[sample(nrow(df), 30), 3] <- NA
df[sample(nrow(df), 15), 4] <- NA

ggplot_missing2(df, xvar = "Species",
                yvars = c("Sepal.Width", "Petal.Length", "Petal.Width"))

[Plot: output of ggplot_missing2 on the test data]
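
As a sanity check on what the tiles show, the per-group missing counts can also be computed directly in base R. A sketch, assuming Species as the grouping variable and the three columns that receive NAs in the test data:

```r
df <- iris
set.seed(4)
df[sample(nrow(df), 20), 2] <- NA
df[sample(nrow(df), 30), 3] <- NA
df[sample(nrow(df), 15), 4] <- NA

yvars <- c("Sepal.Width", "Petal.Length", "Petal.Width")

# One column per Species level, one row per variable;
# each cell is the number of NAs in that group
counts <- sapply(split(df[yvars], df$Species),
                 function(g) colSums(is.na(g)))

counts
rowSums(counts)  # totals per variable: 20, 30, 15
```

The row totals match the number of values set to NA in each column, because sample() draws row indices without replacement.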
