I have a data.frame in R that was built from the Example 1-3 dataset here: http://riskfactor.cancer.gov/diet/usualintakes/dataset.html

I converted it from SAS to ASCII using Stat Transfer, saving as a csv. I have imported the data into R using the read.csv command:

t_0104nc <- read.csv("foo.csv", header = TRUE)

The data are in a data.frame structure. Within this file are some columns that relate to weights (RNDW through RNDW32). These appear to be integers, although when I look at the data in Excel the cells have General format. R has brought the data in as double.

I'm using RNDW1, and I need to confirm that it is integer. However, entering typeof(RNDW1) and storage.mode(RNDW1) both show the data as double.

What is the most efficient way for me to test that I only have integer values in that column? I don't want to coerce the data, as the existence of non-integers would indicate a fundamental problem with the data that coercing won't fix.

Alternatively, I was wondering if there is some way of importing the data so that each column is stored as the simplest data type in R, which should then import these values as integer. Some of the data is integer, other data is single or double, so the data is not all the same type.
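One way to get this behaviour at import time, sketched here under assumptions, is to declare the column's type through the colClasses argument of read.csv. The inline CSV text and the ID and SCORE columns are made up for illustration; only RNDW1 comes from the question. With a column declared as "integer", the read should stop with an error when it meets a value that is not an integer, rather than silently coercing it.

```r
# Illustrative data only: a tiny CSV supplied as text instead of a file.
good_text <- "ID,RNDW1,SCORE\n1,10,0.5\n2,12,1.25\n"
bad_text  <- "ID,RNDW1,SCORE\n1,10,0.5\n2,12.5,1.25\n"

# Declare RNDW1 as integer; columns not named here are typed automatically.
dat <- read.csv(text = good_text, colClasses = c(RNDW1 = "integer"))
typeof(dat$RNDW1)

# A non-integer value in the declared column makes the import fail.
attempt <- try(read.csv(text = bad_text, colClasses = c(RNDW1 = "integer")),
               silent = TRUE)
import_failed <- inherits(attempt, "try-error")
import_failed
```

The trade-off is that the import fails outright on bad data, which fits the stated goal of detecting a problem rather than hiding it.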

Update from the suggestions below. All I need is a simple boolean true/false test, so I have used:

y <- isTRUE(all.equal(x, as.integer(x)))

y

Which then returns me a single true or false value to indicate the overall result of the test. I appreciate all the (rapid!) help I received, and I am happy with the code and my understanding of it.
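The same test can be run over several weight columns at once. The following is a minimal sketch on a toy data frame; the column values, the OTHER column, and the helper name is_integer_valued are assumptions for illustration and are not part of the original dataset.

```r
# Toy stand-in for the real data; values are illustrative only.
toy <- data.frame(RNDW1 = c(10, 12, 15),
                  RNDW2 = c(10, 12.5, 15),
                  OTHER = c(0.1, 0.2, 0.3))

# Hypothetical helper wrapping the test above so it returns a logical value.
is_integer_valued <- function(x) isTRUE(all.equal(x, as.integer(x)))

# Select the weight columns by name and test each one.
weight_cols <- grep("^RNDW", names(toy), value = TRUE)
result <- vapply(toy[weight_cols], is_integer_valued, logical(1))
result
```

Here result is a named logical vector with one entry per weight column, so all(result) gives a single overall answer.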

1 Answer

all.equal provides one way to test whether all of the values in a column are integer-valued (within a small numerical tolerance).

Here's a function that might do what you like:

careful.as.integer <- function(x) {
    if (!is.numeric(x)) return(x)  # Leave factor, character, and logical vectors alone
    xi <- as.integer(x)
    # all.equal() returns a character vector on mismatch, so wrap it in isTRUE()
    if (isTRUE(all.equal(x, xi))) {
        xi
    } else {
        x
    }
}

DAT <- data.frame(a = c(NA, 1:3), 
                  b = c(1:2, 3.3, NA), 
                  species = c("cat", "dog", "goat", "okapi"))

data.frame(lapply(DAT, careful.as.integer))
#    a   b species
# 1 NA 1.0     cat
# 2  1 2.0     dog
# 3  2 3.3    goat
# 4  3  NA   okapi
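Because all.equal compares numeric vectors using a mean relative difference with a default tolerance of about 1.5e-8, a single non-integer among many large values can pass the test. If an exact check is wanted, the following is a minimal sketch; the helper name all_whole is hypothetical, and treating missing values as acceptable is an assumption.

```r
# Hypothetical helper: TRUE only if every non-missing value is exactly whole.
all_whole <- function(x) is.numeric(x) && all(x == round(x), na.rm = TRUE)

x <- c(1e9, 1e9, 2.5)  # one non-integer value among large ones

loose  <- isTRUE(all.equal(x, as.integer(x)))  # tolerance-based comparison
strict <- all_whole(x)                         # exact comparison

c(loose = loose, strict = strict)
```

In this example the tolerance-based test should report TRUE while the exact test reports FALSE, so the exact form may be the safer choice when any non-integer indicates a data problem.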

9 Comments

to shorten/obfuscate you could use if(isTRUE(all.equal(x, xi <- as.integer(x)) xi else x (I think)
@BenBolker -- I kind of like that, actually (with an additional )) on the end). I think I'll make isTRUE one of my new-words-of-the-day.
@BenBolker ... although I suppose it does trade an additional assignment step, every time, for the as.integer conversion that it sometimes avoids.
@JoshO'Brien - thanks for the code. Because my dataset is 10287 rows long, I added na.fail(foo$x) to the end to see if an error message triggers.
@BenBolker - I get an error message with your code. I tried: if(isTRUE(all.equal(dataa$REPLICATE_VAR, dataa$REPLICATE_NEW <- as.integer(dataa$REPLICATE_VAR)) dataa$REPLICATE_NEW else dataa$REPLICATE_VAR and the error message is: Error: unexpected symbol in "if(isTRUE(all.equal(dataa$REPLICATE_VAR, dataa$REPLICATE_NEW <- as.integer(dataa$REPLICATE_VAR)) dataa". Sorry, I cannot work out how to use the code formatting.