2

I am building an app using shiny and openair to analyze wind data.
Right now the user has to “clean” the data before uploading it, and I would like to do this automatically. Some of the data is empty and some of it is not numeric, so it is not possible to build a wind rose. I want to:

    1. Estimate how much of the data is not numeric
    2. Cut it out and leave only numeric data

In my data, the "NO2.mg" column is read as a factor rather than an integer because it does not consist only of numbers.
Here is a reproducible example:

> no2 <- factor(c(5,4,"c1",54,"c5",seq(2:50)))
> no2
[1] 5  4  c1 54 c5 1  2  3  4  5  6  7  8  9  10 11 12 13 14
[20] 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
[39] 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
52 Levels: 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 ... c5
> as.numeric(no2)
[1] 45 34 51 46 52  1 12 23 34 45 47 48 49 50  2  3  4  5  6
[20]  7  8  9 10 11 13 14 15 16 17 18 19 20 21 22 24 25 26 27
[39] 28 29 30 31 32 33 35 36 37 38 39 40 41 42 43 44
  • library(fortunes);fortune(206). You will need to provide an example of your data. Even then.... Commented Aug 7, 2013 at 6:08
  • As a general rule, we are not a help desk. We appreciate if users ask clear, specific questions and show what they've tried and where they got stuck. Commented Aug 7, 2013 at 6:23

3 Answers

9

Worst R haiku ever:

Some of the data is empty, 
some of is not numeric, 
so it is not possible to build a wind rose.

3 Comments

being mocked by a super geek programmer group --> check
@eliavs - well, you could provide some more relevant information as requested by Roman. A bunch of seemingly random figures that aren't reproducible doesn't go very far to allowing us to help. E.g. - dput(head(ranana.analysed.no2)) might be a good start, or better still, a complete example showing a troublesome section of your input data and an expected output dataset would be helpful.
@thelatemail thank you, reproducible data is important for help
4

To convert a factor to numeric, you need to convert to character first:

no2<-factor(c(5,4,"c1",54,"c5",seq(2:50)))
no2_num <- as.numeric(as.character(no2)) 
#Warning message:
#  NAs introduced by coercion 
no2_clean <- na.omit(no2_num) # remove NAs resulting from the bad data
no2_clean
# [1]  5  4 54  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
# [40] 37 38 39 40 41 42 43 44 45 46 47 48 49
# attr(,"na.action")
# [1] 3 5
# attr(,"class")
# [1] "omit"

length(attr(no2_clean,"na.action"))/length(no2)*100
#[1] 3.703704
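
The steps above can be wrapped into a small helper. This is only a sketch, assuming that any value that fails numeric coercion (including empty strings) counts as bad data; the name clean_numeric is hypothetical and is not part of shiny or openair.

```r
# Hypothetical helper: clean_numeric() is an illustrative name, not a library function.
clean_numeric <- function(x) {
  # Go through character so the factor labels, not the internal level codes, are parsed
  num <- suppressWarnings(as.numeric(as.character(x)))
  bad <- is.na(num)
  list(
    values  = num[!bad],       # entries that parsed as numbers
    n_bad   = sum(bad),        # entries that did not parse (or were already NA)
    pct_bad = 100 * mean(bad)  # share of bad entries, in percent
  )
}

no2 <- factor(c(5, 4, "c1", 54, "c5", seq(2:50)))
res <- clean_numeric(no2)
res$n_bad    # 2
res$pct_bad  # about 3.7
```

For a whole data frame, the same logical test could be used to drop entire rows rather than single values, so that the remaining columns stay aligned with the cleaned one.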


1

OK, this is how I did it. I am sure someone has a better way, and I'd love it if you shared it with me.
This is my data:
no2<-factor(c(5,4,"c1",54,"c5",seq(2:50)))
To count the "bad data":

sum(is.na(as.numeric(as.vector(no2))))

And to estimate the percentage of bad data:
sum(is.na(as.numeric(as.vector(no2))))/length(no2)*100

6 Comments

The as.vector is superfluous, but sum()-ing is.na() is fairly standard. Did you have any interest in "recovering" data by converting "c5" to "5"?
@DWin Factors are not vectors and as.vector coerces them to character. It's not superfluous here.
Interesting ... didn't realize that as.vector would do the same as as.character. But that doesn't change the fact that it's superfluous, because it's getting passed to is.na which doesn't care whether it's "numeric" or "character". Consider: sum(is.na(factor(c(letters, NA)))). The as.vector.factor function with its default arguments removes the levels attributes and converts to levels(fac)[fac].
@DWin But as.numeric won't create NAs when used on a factor, only when used on a character.
@DWin Of course as.numeric propagates NA. But that's not creating NA. The relevant cases are as.numeric(factor(c(1:3,"a"))) vs. as.numeric(as.character(factor(c(1:3,"a"))))
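
The two cases named in the last comment can be compared directly. A minimal sketch using the same four-element factor; the variable names are only illustrative:

```r
f <- factor(c(1:3, "a"))

# On the factor itself, as.numeric returns the internal level codes, so no NA appears
codes <- as.numeric(f)

# Through character, the labels are parsed and "a" cannot be converted
parsed <- suppressWarnings(as.numeric(as.character(f)))

sum(is.na(codes))   # 0
sum(is.na(parsed))  # 1
```

This is why counting NAs only detects the bad entries after the factor has been converted to character, whether by as.character or as.vector.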