I'm new to R and need some help getting a few things done. First of all, I have to analyse a large dataset (766K rows, 2 columns) in the form below:
G40 2003-04-09
Z11 1997-08-15
K60 2006-03-16
I10 2000-11-30
The name of the dataset is Rdiagnoses and there is no header, so by default the first column is V1 and the second is V2. The first column is the diagnosis and the second is the date on which it was made.
First I was thinking of creating a subset for each year separately. This is how I'm trying to do it, but it gives me warnings:
diagnoses2009 <- as.Date( as.character(Rdiagnoses$V2), "%d-%m-%y")
Rdiagnoses_2009 <- subset(Rdiagnoses, V2 >= as.Date("2009-01-01") & V2 <= as.Date("2009-12-31") )
Warning messages:
1: In eval(expr, envir, enclos) :
Incompatible methods ("Ops.factor", "Ops.Date") for ">="
2: In eval(expr, envir, enclos) :
Incompatible methods ("Ops.factor", "Ops.Date") for "<="
Any suggestions for correcting this, or for a better way of selecting each year, are highly appreciated. Thank you in advance for your help!
V2 is a factor, not a date column.
x <- c("2003-04-09", "09-04-2003")
as.Date(x, format = "%d-%m-%Y")
(Notice the capital Y and the order of days, months and years.)
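
Applied to the data in the question, the same idea might look like the sketch below. It assumes the dates are stored as year-month-day, as in the sample rows, so the format string is "%Y-%m-%d". The small data frame is a stand-in for Rdiagnoses; the 2009 row (J45) is made up for illustration so that the subset is not empty, and stringsAsFactors = TRUE is set only to mimic V2 having been read in as a factor.

```r
# Toy stand-in for Rdiagnoses; the last row is invented for illustration.
Rdiagnoses <- data.frame(
  V1 = c("G40", "Z11", "K60", "I10", "J45"),
  V2 = c("2003-04-09", "1997-08-15", "2006-03-16", "2000-11-30", "2009-06-01"),
  stringsAsFactors = TRUE  # mimic V2 being a factor, as in the question
)

# Convert the factor to Date and store it back in the data frame.
# Capital %Y is a four-digit year; the order matches the data (year-month-day).
Rdiagnoses$V2 <- as.Date(as.character(Rdiagnoses$V2), format = "%Y-%m-%d")

# The original subset() call now compares Date with Date.
Rdiagnoses_2009 <- subset(
  Rdiagnoses,
  V2 >= as.Date("2009-01-01") & V2 <= as.Date("2009-12-31")
)

# Alternative: split into a list with one data frame per year.
by_year <- split(Rdiagnoses, format(Rdiagnoses$V2, "%Y"))
```

Note that the code in the question assigned the converted dates to a separate object (diagnoses2009), so V2 inside Rdiagnoses stayed a factor; writing the result back into the column is what makes the comparison work. With the list from split(), a single year is available as by_year[["2009"]], which avoids creating one variable per year.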