I have a data frame called 'foo':

 foo <- data.frame("row1" = c(1,2,3,4,5), "row2" = c(1,2.01,3,"-","-"))

'foo' was imported from a different program as a CSV file and has two columns. One is numeric and the other is a factor.

str(foo)
'data.frame':   5 obs. of  2 variables:
$ row1: num  1 2 3 4 5
$ row2: Factor w/ 4 levels "-","1","2.01",..: 2 3 4 1 1

Notice that there are dashes ("-") in foo$row2, which cause this column to be read as a factor. I want to replace the dashes with zeros, so that data.class(foo$row2) returns "numeric". The idea is to replace all dashes in each column so I can run numerical analyses on it with R.

What is the simplest way to do this in R?

Thanks,

4 Answers

2

Q: The idea is to replace all dashes in each column so I can run numerical analyses on it with R.

Use apply or sapply with sub

 kk<-data.frame(apply(foo,2,function(x) as.numeric(sub("-",0,x))))
> kk
  row1 row2
1    1 1.00
2    2 2.01
3    3 3.00
4    4 0.00
5    5 0.00

> str(kk$row2)
 num [1:5] 1 2.01 3 0 0

Or, you can use sapply

kk<-data.frame(sapply(names(foo),function(x)as.numeric(sub("-",0,foo[,x]))))

Update: If you only want the second column, you don't need to use apply:

 foo$row2 <- as.numeric(sub("-",0,foo[,2]))
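One caveat with the pattern "-": it also matches the minus sign of a negative number, so "-5" would become "05". Below is a minimal sketch of a more defensive variant, assuming a cell counts as "missing" only when it consists of a single dash. The helper name dash_to_zero is hypothetical, and stringsAsFactors = TRUE is set only to reproduce the factor column from the question.

```r
# Example data; stringsAsFactors = TRUE reproduces the factor column from the question
foo <- data.frame(row1 = c(1, 2, 3, 4, 5),
                  row2 = c(1, 2.01, 3, "-", "-"),
                  stringsAsFactors = TRUE)

# Hypothetical helper: turn a lone "-" into "0", then convert to numeric.
# The anchors ^ and $ keep negative numbers such as "-5" intact.
dash_to_zero <- function(x) {
  x <- trimws(as.character(x))
  as.numeric(sub("^-$", "0", x))
}

# Convert only factor/character columns; numeric columns are left untouched
is_text <- vapply(foo, function(col) is.factor(col) || is.character(col), logical(1))
foo[is_text] <- lapply(foo[is_text], dash_to_zero)

str(foo)
```

Restricting the conversion to text-like columns avoids round-tripping columns that are already numeric through character strings, which is what apply() does when it coerces the whole data frame to a character matrix.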

4 Comments

Does this apply the string replace function to each column in the data frame? If so, how can I target the second column only? Thx!
Yes, it does this for all columns. If you only want the second column, you don't need to use apply: foo$row2 <- as.numeric(sub("-",0,foo[,2]))
Instead of calling foo[,2] for the second column index, how can I call it by column name, i.e. foo$row2<- as.numeric(sub("-",0,foo[,foo$row2]))?
@AME did you even look at the other answers? This is exactly what I posted.
2

Here is one simple way to do it. There might be a more elegant way, but this will work:

> foo <- data.frame("row1" = c(1,2,3,4,5), "row2" = c(1,2.01,3,"-","-"))
> levels(foo$row2)[levels(foo$row2)=="-"]<-0
> foo$row2<-as.numeric(as.character(foo$row2))
> class(foo$row2)
[1] "numeric"
> foo
  row1 row2
1    1 1.00
2    2 2.01
3    3 3.00
4    4 0.00
5    5 0.00
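The as.character() step in this answer matters. A short sketch of why, using a standalone factor f (a name introduced here for illustration) with the same values as foo$row2:

```r
# Same values as foo$row2 in the question, as a factor
f <- factor(c("1", "2.01", "3", "-", "-"))

# as.numeric() on a factor returns the internal integer level codes,
# not the numbers that are printed
codes <- as.numeric(f)

# Going through character first recovers the printed values;
# "-" cannot be parsed, so it becomes NA (with a coercion warning)
values <- suppressWarnings(as.numeric(as.character(f)))

print(codes)
print(values)
```

Because the answer renames the "-" level to 0 before converting, no NA is produced there.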

1

I would use ifelse() for this: foo$row2 <- ifelse(foo$row2 == "-", 0, as.numeric(foo$row2))

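You might also need to add as.character() to convert from factor to character first, because as.numeric() applied directly to a factor returns the internal level codes rather than the displayed values.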
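A minimal sketch of the ifelse() idea with the character conversion included, assuming foo$row2 is a factor as in the question (stringsAsFactors = TRUE is set only to reproduce that; x is a temporary name introduced here):

```r
foo <- data.frame(row1 = c(1, 2, 3, 4, 5),
                  row2 = c(1, 2.01, 3, "-", "-"),
                  stringsAsFactors = TRUE)

# Work on the character representation, not the factor codes
x <- as.character(foo$row2)

# Swap the dashes for "0" while everything is still text, then convert once.
# This avoids the coercion warning that as.numeric("-") would raise.
foo$row2 <- as.numeric(ifelse(x == "-", "0", x))

class(foo$row2)
```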

2 Comments

And an as.numeric to convert it to the numeric form OP needs.
Running this code on the real data set (not the example one shown here) returns #NA Coerced. The real data set that I'm trying to run this function on contains commas, e.g. 1,000. These seem to force an NA coercion with the command you provided.
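For values like those described in the last comment, one possible approach is to strip the commas before converting. This is only a sketch on made-up sample values, and it assumes the comma is a thousands separator rather than a decimal mark:

```r
# Made-up sample values for illustration
x <- c("1,000", "2.01", "-", "12,345.5")

# Remove thousands separators (assumption: "," is never a decimal mark here)
x <- gsub(",", "", x, fixed = TRUE)

# Replace lone dashes with zero, then convert
x[x == "-"] <- "0"
result <- as.numeric(x)

print(result)
```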
1

How about gsub...

as.numeric( gsub("-" , 0 , foo[,2] ) )
#[1] 1.00 2.01 3.00 0.00 0.00
