  1. I loaded my dataset (original.csv) into R: original <- read.csv("original.csv")
  2. str(original) showed that my dataset has 16 variables (14 factors, 2 integers). 14 variables have missing values. That was expected, but 3 variables that are actually numbers were read as factors.
  3. I searched the web and found this command: as.numeric(as.character(original$Tumor_Size)) (Tumor_Size is one of the variables that was read as a factor).
  4. Note that missing values in my dataset are marked with a dot (.).
  5. After running as.numeric(as.character(original$Tumor_Size)), the values of Tumor_Size were printed, followed by the warning message "NAs introduced by coercion".
  6. I expected this command to convert the variable to numeric, but running str(original) again showed that Tumor_Size and the other two variables were still factors. Below is a sample of my dataset: [image: a piece of my dataset]
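A minimal reproduction of the steps above, assuming a small inline dataset with made-up values in place of original.csv (only the "." marker comes from the question). It passes stringsAsFactors = TRUE because R 4.0.0 and later no longer read text columns as factors by default. The sketch suggests why str() still reports a factor: as.numeric() returns a new vector, and unless that result is assigned back to original, the data frame is unchanged.

```r
# Made-up stand-in for original.csv; only the "." missing-value
# marker is taken from the question.
csv_text <- "Patient,Tumor_Size
1,25
2,.
3,40"

# stringsAsFactors = TRUE mimics the pre-4.0.0 default, under which
# a column containing "." is read as a factor.
original <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(original$Tumor_Size)  # "factor"

# This returns a new numeric vector (and warns that "." became NA),
# but original itself is left unchanged.
as.numeric(as.character(original$Tumor_Size))
class(original$Tumor_Size)  # still "factor"

# Assigning the result back is what actually changes the column.
# The warning is suppressed here because the "." entries are
# expected to become NA.
original$Tumor_Size <- suppressWarnings(
  as.numeric(as.character(original$Tumor_Size))
)
str(original)
```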

How can I solve my problem?


The crucial information here is how missing values are encoded in your data file. The corresponding argument in read.csv() is na.strings, so if dots are used:

original <- read.csv("original.csv", na.strings = ".")
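A minimal sketch of this approach, assuming a small inline dataset with made-up values (the Grade column is hypothetical): once "." is declared as the missing-value marker, the numeric columns are read as numbers with NA in place of the dots, so no as.numeric(as.character(...)) step is needed afterwards.

```r
# Made-up stand-in for original.csv, with "." marking missing values.
csv_text <- "Patient,Tumor_Size,Grade
1,25,2
2,.,3
3,40,."

# With "." declared as the NA marker, read.csv() sees only numbers
# in these columns and reads them as numeric directly.
original <- read.csv(text = csv_text, na.strings = ".")
str(original)
```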

I'm not 100% sure what your problem is, but maybe this will help:

original <- read.csv("original.csv", header = TRUE, stringsAsFactors = FALSE)
original$Tumor_Size <- as.numeric(original$Tumor_Size)

This will introduce NAs, because a dot (.) cannot be converted to a numeric value. If you then replace the NAs with a dot again, the whole column becomes character again. To do that, you can use:

original$Tumor_Size[is.na(original$Tumor_Size)] <- "."
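A small sketch of the side effect mentioned above, assuming a hypothetical numeric vector in place of the real Tumor_Size column: putting the string "." back into a numeric vector coerces every element to character.

```r
# Hypothetical numeric column with one missing value.
tumor_size <- c(25, NA, 40)
class(tumor_size)  # "numeric"

# Writing the string "." into the NA slot coerces the whole vector
# to character, so it can no longer be used for arithmetic.
tumor_size[is.na(tumor_size)] <- "."
class(tumor_size)  # "character"
print(tumor_size)
```

If the dots are only needed in an exported file, one alternative is to keep NA inside R and let the writer emit them, e.g. write.csv(original, "out.csv", na = ".", row.names = FALSE), where the file name is hypothetical.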

Hope this helps.
