How to ignore null values in R?

Question

I have a data set with some null values in one field. When I try to run a linear regression, it treats the integers in the field as category indicators, not numbers.

E.g., for a field that contains no null values...

summary(lm(rank ~ num_ays, data=a))

Returns:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 10.607597   0.019927 532.317  < 2e-16 ***
num_ays      0.021955   0.007771   2.825  0.00473 **

But when I run the same model on a field with null values, I get:

Coefficients:
              Estimate Std. Error  t value Pr(>|t|)    
(Intercept)  1.225e+01  1.070e+00   11.446  < 2e-16 ***
num_azs0    -1.780e+00  1.071e+00   -1.663  0.09637 .  
num_azs1    -1.103e+00  1.071e+00   -1.030  0.30322    
num_azs10   -9.297e-01  1.080e+00   -0.861  0.38940    
num_azs100   1.750e+00  5.764e+00    0.304  0.76141    
num_azs101  -6.250e+00  4.145e+00   -1.508  0.13161

What's the best and/or most efficient way to handle this, and what are the tradeoffs?
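A minimal sketch of how this symptom can arise, assuming (hypothetically) that the MySQL export wrote missing values as the literal string NULL. A column mixing integers with such a string cannot be parsed as numbers, so read.csv() keeps it as text: a factor when stringsAsFactors = TRUE, which was the default before R 4.0.0. lm() then fits one indicator coefficient per distinct value, much like the num_azs0, num_azs1, ... rows above. The data and the name csv_text are made up for illustration.

```r
# Hypothetical export: MySQL NULLs written out as the literal string "NULL"
csv_text <- "rank,num_ays,num_azs
10,1,3
11,2,NULL
12,0,1
13,3,10
14,1,3"

# stringsAsFactors = TRUE mimics the pre-4.0.0 default of read.csv()
a <- read.csv(text = csv_text, stringsAsFactors = TRUE)

str(a)            # num_ays is int, but num_azs is a factor
levels(a$num_azs) # the distinct values as text, sorted as strings

# lm() expands the factor into one indicator column per non-baseline level
fit <- lm(rank ~ num_azs, data = a)
names(coef(fit))  # "(Intercept)" followed by num_azs10, num_azs3, num_azsNULL
```

In R 4.0.0 and later, read.csv() would return a character column instead, but lm() still treats a character predictor as categorical, so the coefficients look the same.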

By "null", do you mean NA? Is there a chance that num_azs is a factor? It looks like uncleaned data to me.
– Marek, Commented Oct 25, 2010 at 19:50
I don't think it's a factor. Both num_ays and num_azs came from a MySQL export. The field type for both is integer, but num_azs can contain null values.
– Dan, Commented Oct 25, 2010 at 19:56
What does summary(a) say your data columns are? I guess a non-numeric value is causing conversion to a factor. The solution is to convert to numeric using as.numeric(as.character(foo)).
– Spacedman, Commented Oct 25, 2010 at 20:52
Thanks, Marek et al. It turns out it's listed as a factor. I'll seek my answers in a different question.
– Dan, Commented Oct 25, 2010 at 21:33
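Spacedman's two-step conversion matters because as.numeric() applied directly to a factor returns the internal level codes, not the numbers printed on screen. A minimal sketch with a made-up factor f; the as.numeric(levels(f))[f] form is the variant recommended in the R FAQ (7.10) and converts each distinct level only once. Any non-numeric entry, such as a literal NULL string, becomes NA with a coercion warning.

```r
f <- factor(c("3", "10", "NULL", "1"))   # made-up example values

as.numeric(f)                 # 3 2 4 1: positions in levels(f), not the values
as.numeric(as.character(f))   # 3 10 NA 1, with a "NAs introduced by coercion" warning
as.numeric(levels(f))[f]      # same result, converting each level only once
```

Applied to the data frame from the question, that would be a$num_azs <- as.numeric(as.character(a$num_azs)), after which lm() should fit a single slope for num_azs.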

Shane · Accepted Answer · 2010-10-25 19:31:06Z · Score: 3

You can ignore null values like so:

a[!is.null(a$num_ays),]

2 Comments

Dan Over a year ago

Thanks, Shane. I tried to apply that using: summary(lm(rank ~ num_ays, data=a[!is.null(a$num_ays)])). It gave me the same output, though.

Marek Over a year ago

is.null returns TRUE if the object is NULL and FALSE otherwise, so your construct returns either all rows of a or a 0-row data.frame. I'm pretty sure you were thinking of is.na ;)
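Marek's point can be checked directly. A minimal sketch on a small made-up data frame: is.null() answers a single yes/no question about the whole column, while is.na() returns one logical per row, which is what a row filter needs. complete.cases() is a related base R helper that drops rows with an NA in any column.

```r
a <- data.frame(rank = c(10, 11, 12), num_azs = c(3, NA, 1))

is.null(a$num_azs)       # FALSE: the column exists, so a single FALSE
is.na(a$num_azs)         # FALSE TRUE FALSE: one value per row

a[!is.null(a$num_azs), ] # !FALSE is TRUE, recycled over rows: all 3 rows kept
a[!is.na(a$num_azs), ]   # only rows 1 and 3
a[complete.cases(a), ]   # same here: drops rows with an NA in any column
```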

Dirk is no longer here · Accepted Answer · 2010-10-25 19:43:22Z · Score: 2

To build on Shane's answer, you can use that directly in the data= argument of lm():

summary(lm(rank ~ num_ays, data=a[!is.null(a$num_ays),]))

3 Comments

Dan Over a year ago

Thanks, Dirk. I tried that, but it's still treating the numbers in the column as category labels, with the same result as before. Am I missing something else as well?

Dirk is no longer here Over a year ago

You are being tripped up by factors. That is a different issue. Try searching for "[r] factor" (i.e. the term factor within posts tagged [r] for R). You will need to read the data differently and/or convert it.
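One way to read the data differently, sketched under the same hypothetical assumption that the export spells missing values as the literal string NULL: the na.strings argument of read.csv() turns those entries into NA at read time, so the column comes in as integer and lm() fits a single slope. If the export uses a different marker (MySQL's SELECT ... INTO OUTFILE writes \N, for example), na.strings would need to list that marker instead. The data and csv_text are made up.

```r
# Hypothetical export: missing values written as the literal string "NULL"
csv_text <- "rank,num_azs
10,3
11,NULL
12,1
13,10
14,4"

# Treat "NULL" (and the usual "NA") as missing while parsing
a <- read.csv(text = csv_text, na.strings = c("NULL", "NA"))
str(a)       # num_azs is int, with NA in row 2

fit <- lm(rank ~ num_azs, data = a)
coef(fit)    # intercept plus one slope for num_azs
nobs(fit)    # 4: the NA row is dropped by the default na.action
```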

Marek Over a year ago

Isn't it better to use the subset argument of lm?
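Marek's suggestion, sketched on made-up data: lm() evaluates subset inside data, so the column can be named without the a$ prefix. Once the column is numeric, explicit filtering is often unnecessary, because lm() also applies na.action, which comes from options("na.action") and is na.omit unless it has been changed (an assumption about the session). Under that default, both fits below use the same four rows.

```r
a <- data.frame(rank = c(10, 11, 12, 13, 14),
                num_azs = c(3, NA, 1, 10, 4))

# subset is evaluated within `a`, so num_azs needs no a$ prefix
fit_subset <- lm(rank ~ num_azs, data = a, subset = !is.na(num_azs))

# No explicit filter: rows with NA are dropped by the default na.action
fit_default <- lm(rank ~ num_azs, data = a)

all.equal(coef(fit_subset), coef(fit_default))  # TRUE
```

The subset form states the filtering intent explicitly in the call, while relying on na.action keeps the call shorter; neither helps while the column is still a factor.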
