
I have a CSV file in this format:

android ; login.html , connect.json , page1.json 

windows ; login.html , connect.json , page1.json , page2.html , page5.html 

windows ; login.html , connect.json , page4.json

To run a PCA (multivariate analysis) on these variables, they must be numeric, like this:

1 ; 3  

0 ; 5

0 ; 3

The first value is 1 for android or 0 for windows, followed by the number of pages. I am looking for a way to convert these non-numeric data to numeric values. Any ideas?

Comment: Read in with the delimiter as ";", use count.fields on the second column and == for the first column. (Commented Mar 21, 2016 at 14:10)

2 Answers


Here's one approach:

data.frame(V1 = as.numeric(mydf$V1 == "android"), 
           V2 = count.fields(textConnection(mydf$V2), sep = ","))
#   V1 V2
# 1  1  3
# 2  0  5
# 3  0  3

Sample data:

mydf <- read.table(
  header = FALSE, sep = ";", stringsAsFactors = FALSE, strip.white = TRUE,
  text = '"android" ; "login.html , connect.json , page1.json" 
"windows" ; "login.html , connect.json , page1.json , page2.html , page5.html" 
"windows" ; "login.html , connect.json , page4.json"')
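The same approach can be sketched against an actual file instead of inline text. The path below is a hypothetical temporary file standing in for the real CSV, and the fields are assumed to be unquoted, as in the question; the text connection is closed explicitly because count.fields leaves an already-open connection open.

```r
# Hypothetical stand-in for the real CSV: write the question's rows to a temp file.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "android ; login.html , connect.json , page1.json",
  "windows ; login.html , connect.json , page1.json , page2.html , page5.html",
  "windows ; login.html , connect.json , page4.json"
), path)

# Split on ";" only, so the whole page list stays in one column (V2).
mydf <- read.table(path, header = FALSE, sep = ";",
                   stringsAsFactors = FALSE, strip.white = TRUE)

# Count the comma-separated fields in each V2 string.
con <- textConnection(mydf$V2)
n_pages <- count.fields(con, sep = ",")
close(con)

result <- data.frame(V1 = as.numeric(mydf$V1 == "android"),
                     V2 = n_pages)
result
```

With the real data, replace `path` with the location of the CSV file and drop the `writeLines` step.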


Try strsplit and lengths:

DF <- read.table(text = Lines, sep = ";", as.is = TRUE, strip.white = TRUE)
transform(DF, V1 = as.numeric(V1 == "android"), V2 = lengths(strsplit(V2, ",")))

giving:

  V1 V2
1  1  3
2  0  5
3  0  3

Note: We used this input:

Lines <- "android ; login.html , connect.json , page1.json 
windows ; login.html , connect.json , page1.json , page2.html , page5.html 
windows ; login.html , connect.json , page4.json"
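Since lengths() is only available from R 3.2.0 onward, here is a minimal sketch of the same idea using vapply, which should also work on older versions. The names `n_pages` and `out` are illustrative and not part of the answer above.

```r
# Same input as in the note above, repeated so the snippet runs on its own.
Lines <- "android ; login.html , connect.json , page1.json
windows ; login.html , connect.json , page1.json , page2.html , page5.html
windows ; login.html , connect.json , page4.json"

DF <- read.table(text = Lines, sep = ";", as.is = TRUE, strip.white = TRUE)

# Split each page list on "," and count the pieces; vapply enforces an
# integer result per row.
n_pages <- vapply(strsplit(DF$V2, ",", fixed = TRUE), length, integer(1))

out <- data.frame(V1 = as.numeric(DF$V1 == "android"), V2 = n_pages)
out
```

Both columns of `out` are numeric, so the data frame can be passed on to a PCA function such as prcomp.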
