I am working on the SAT Scores database: https://nycopendata.socrata.com/Education/SAT-Results/f9bf-2cp4?
This is what it looks like:
> head(SAT)
DBN SCHOOL.NAME Num.of.SAT.Test.Takers
1 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES 29
2 01M448 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 91
3 01M450 EAST SIDE COMMUNITY SCHOOL 70
4 01M458 FORSYTH SATELLITE ACADEMY 7
5 01M509 MARTA VALLE HIGH SCHOOL 44
6 01M515 LOWER EAST SIDE PREPARATORY HIGH SCHOOL 112
SAT.Critical.Reading.Avg..Score SAT.Math.Avg..Score SAT.Writing.Avg..Score
1 355 404 363
2 383 423 366
3 377 402 370
4 414 401 359
5 390 433 384
6 332 557 316
In the column Num.of.SAT.Test.Takers, many values are simply the character 's'. In those rows, the three score columns also contain 's' instead of numeric scores.
> SATnocandidates<-SAT[SAT$Num.of.SAT=='s', ]
> head(SATnocandidates)
DBN SCHOOL.NAME Num.of.SAT.Test.Takers
23 02M392 MANHATTAN BUSINESS ACADEMY s
24 02M393 BUSINESS OF SPORTS SCHOOL s
26 02M399 THE HIGH SCHOOL FOR LANGUAGE AND DIPLOMACY s
39 02M427 MANHATTAN ACADEMY FOR ARTS & LANGUAGE s
41 02M437 HUDSON HIGH SCHOOL OF LEARNING TECHNOLOGIES s
42 02M438 INTERNATIONAL HIGH SCHOOL AT UNION SQUARE s
SAT.Critical.Reading.Avg..Score SAT.Math.Avg..Score SAT.Writing.Avg..Score
23 s s s
24 s s s
26 s s s
39 s s s
41 s s s
42 s s s
Questions
- In the original SAT data frame, I want to replace all 's' values in the Num.of.SAT.Test.Takers column with the numeric value 0.
- Subsequently, I want to replace all 's' values in the corresponding score columns with 0.
- How can I write a single overarching command that finds all 's' values in the data frame and replaces them with 0?
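A minimal sketch of the three replacements asked about above. It assumes the affected columns were read in as character rather than factor (with factor columns, convert them using as.character() first), and it uses a small made-up data frame in place of the real SAT data:

```r
# Toy stand-in for the SAT data frame (values are made up for illustration;
# only the column names come from the real data).
SAT <- data.frame(
  DBN = c("01M292", "02M392"),
  Num.of.SAT.Test.Takers = c("29", "s"),
  SAT.Math.Avg..Score = c("404", "s"),
  stringsAsFactors = FALSE
)

# Replace 's' in one column only.
SAT$Num.of.SAT.Test.Takers[SAT$Num.of.SAT.Test.Takers == "s"] <- "0"

# Replace every remaining 's' anywhere in the data frame in one step.
# SAT == "s" is a logical matrix marking the matching cells.
SAT[SAT == "s"] <- "0"

# The columns are still character at this point, so convert the
# count/score columns to numeric (the ID and name columns are left alone).
score_cols <- setdiff(names(SAT), c("DBN", "SCHOOL.NAME"))
SAT[score_cols] <- lapply(SAT[score_cols], as.numeric)

str(SAT)
```

The conversion step matters: replacing 's' alone leaves text such as "404" in the columns, and functions like mean() will not work on them until they are numeric.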
na.strings value when read in the data....
na.rm=TRUE and they'll be removed: but your zeroes would skew the mean/median very low.
NA would still be better than 0 even for what you state you have to do.
NA and 0 mean pretty different things....
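A rough sketch of the NA approach these comments point at, assuming the data is read from CSV with read.csv(). The inline CSV text and its values are made up for illustration; in practice the file argument would be the downloaded file:

```r
# Small inline CSV standing in for the downloaded file (made-up values).
csv_text <- "DBN,Num.of.SAT.Test.Takers,SAT.Math.Avg..Score
01M292,29,404
02M392,s,s
01M448,91,423"

# na.strings = "s" turns every 's' field into NA at read time,
# so the count and score columns are parsed as numbers straight away.
SAT <- read.csv(text = csv_text, na.strings = "s")

# With NA, the suppressed rows can simply be dropped from a summary.
mean_na <- mean(SAT$SAT.Math.Avg..Score, na.rm = TRUE)

# With 0 in place of NA, the suppressed rows pull the mean down.
math_zero <- replace(SAT$SAT.Math.Avg..Score,
                     is.na(SAT$SAT.Math.Avg..Score), 0)
mean_zero <- mean(math_zero)

mean_na    # 413.5
mean_zero  # lower than mean_na, because of the added zero
```

If zeros are still required downstream, they can be filled in afterwards with SAT[is.na(SAT)] <- 0, at the cost of no longer distinguishing "suppressed" from "zero".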