
I'm building a table from a CSV file. When the file is first loaded, I need to read every column as character.

datset <- read.csv("outcome-of-care-measures.csv", colClasses = "character")

I have a function to convert a factor containing numbers (taken from another Stack Overflow question):

as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}

I clean up the file with

i<-17
datset[datset=="Not Available"]<-NA
datset<-datset[complete.cases(datset[,i]),]
x<- as.numeric.factor(datset[, i])

The datset table contains lots of columns I don't need, so I build a new table:

dat <- data.frame(cbind("HospitalName"= datset[,2], "State"= datset[,7],"Rating" = x))                        

My problem is that even though x is numeric, it gets turned into a factor when it is put into the data frame. I can verify this in debug mode with:

class(x)
"numeric"

class(dat[,3])
"factor"

In later code I'm trying to sort the Rating column, but it fails, I guess because the column is a factor.

I've even tried appending stringsAsFactors = FALSE to read.csv but this has no effect.

How can I prevent x from being converted into a factor when loading to a DF?

  • 3
    Why don't you use the appropriate arguments in read.csv instead? There is an na.strings argument and you could import the column as numeric. Commented Aug 22, 2014 at 15:21
  • 1
    I am not able to reproduce the error when x <- factor(1:10): after transformation with as.numeric.factor and putting it in a data frame, it remains a numeric. @Henrik's comment is, I think, right on where there is a problem: what happens when you remove cbind? Commented Aug 22, 2014 at 15:24
  • 3
    In addition to the comment of @Roland, I think the data.frame(cbind(...)) step is problematic. The cbind step results in a matrix. A matrix can only hold one type of value (see coercion hierarchy in the Value section of ?matrix). When you then apply data.frame on a character matrix, values are converted to factor (see stringsAsFactors argument in ?data.frame). See also options; stringsAsFactors. Commented Aug 22, 2014 at 15:24
  • 1
    Pay attention to Henrik. Note the examples in ?data.frame, which allow for constructs like data.frame(x = ...,y = ...), which is how you should be using the function. data.frame(cbind()) is a very bad habit. Commented Aug 22, 2014 at 15:27
  • 1
    @LeeH You don't have to specify the col classes (read.csv can do this automatically), but you could: colclasses <- rep("character", 40); colclasses[17] <- "numeric" Commented Aug 22, 2014 at 15:28

1 Answer


As Henrik explained in his comment, this:

dat <- data.frame(cbind("HospitalName"= datset[,2], "State"= datset[,7],"Rating" = x))

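is a poor way to construct a data frame. cbind converts everything to a matrix, which can only hold a single data type, so the numeric x is coerced to character alongside the two character columns. data.frame then turns those character columns into factors (stringsAsFactors defaulted to TRUE before R 4.0.0).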
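A minimal sketch of that coercion, using made-up hospital names and ratings (the vectors `hospital` and `rating` are assumptions for illustration, not values from the asker's file):

```r
# Toy stand-ins for the asker's columns (values are made up)
hospital <- c("A", "B", "C")
rating   <- c(14.2, 9.8, 11.5)

# cbind on plain vectors builds a matrix, which has a single type
m <- cbind(HospitalName = hospital, Rating = rating)
is.matrix(m)   # TRUE
typeof(m)      # "character": the numbers were coerced to strings

# Converting that character matrix gives no numeric columns
bad <- data.frame(m)
is.numeric(bad$Rating)   # FALSE (factor before R 4.0.0, character since)
```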

It would be better to do:

dat <- data.frame(HospitalName = datset[,2],state = datset[,7],rating = x)
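As a rough check that this form keeps each column's own type, here is a small sketch with made-up vectors (`hospital`, `state`, and `rating` are hypothetical stand-ins for the asker's columns). Passing `stringsAsFactors = FALSE` is an optional extra that keeps the text columns as character on older R versions too:

```r
# Toy stand-ins for the asker's columns (values are made up)
hospital <- c("A", "B", "C")
state    <- c("TX", "TX", "AL")
rating   <- c(14.2, 9.8, 11.5)

# Each argument becomes its own column and keeps its own type
dat <- data.frame(HospitalName = hospital, State = state, Rating = rating,
                  stringsAsFactors = FALSE)
class(dat$Rating)   # "numeric"

# Sorting by the numeric column now orders by value
sorted <- dat[order(dat$Rating), ]
sorted$HospitalName   # "B" "C" "A"
```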

However, it is also true as Roland mentioned that you should be able to specify this one column to be numeric when reading the data in via:

colclasses <- rep("character", 40)
colclasses[17] <- "numeric"

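and then passing that to read.csv through its colClasses argument, together with na.strings = "Not Available" so that the missing entries are read as NA instead of breaking the numeric conversion.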
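The idea can be tried without the real file. The sketch below feeds read.csv a tiny in-memory CSV through its `text` argument; the three-column layout, the column names, and the values are all assumptions made up for the example, so the index of the numeric column differs from the real data:

```r
# A tiny in-memory CSV standing in for the real file (made-up data)
csv_text <- "Name,State,Rate
A,TX,14.2
B,TX,Not Available
C,AL,9.8"

# Read everything as character except the third column
col_classes <- rep("character", 3)
col_classes[3] <- "numeric"

toy <- read.csv(text = csv_text,
                colClasses = col_classes,
                na.strings = "Not Available")

class(toy$Rate)   # "numeric"

# Drop the rows whose rate was "Not Available"
toy <- toy[complete.cases(toy$Rate), ]
nrow(toy)         # 2
```

With the column read as numeric from the start, neither the factor-conversion helper nor the manual replacement of "Not Available" is needed.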


1 Comment

Thanks @joran. I'm still getting to grips with the class system in R. I understand completely where I was going wrong, thanks to your eloquent explanation.
