
Data I'm scraping off the web uses the * character to denote one thing and + to denote another.

Here's an example of what it looks like:

# Original Data
original_df <- data.frame(c("Randy Watson*+", "Cleo McDowell*", "Darryl Jenks"))
names(original_df) <- 'nameinfo'

original_df

I want to transform the data to look like this output:

# What I want the Data to look like
name <- c("Randy Watson", "Cleo McDowell", "Darryl Jenks")
this_thing <- c("1", "1", "0")
that_thing <- c("1", "0", "0")
desired_df <- data.frame(name, this_thing, that_thing)

desired_df

I basically want to use the presence of * to set one flag variable and the presence of + to set another, then remove any * or + from the nameinfo field and use the cleaned value as a new name variable.

Thanks.

1
  • Check out my needleInHaystack function here. Usage would be needleInHaystack(c("*", "+"), original_df$nameinfo). Some cleanup would be required. Commented Jul 23, 2014 at 17:29

2 Answers

2

grepl will work well here:

original_df$this_thing <- grepl("\\*", original_df$nameinfo)
original_df$that_thing <- grepl("\\+", original_df$nameinfo)
original_df$nameinfo <- gsub("\\*|\\+", "", original_df$nameinfo)
original_df

##        nameinfo this_thing that_thing
## 1  Randy Watson       TRUE       TRUE
## 2 Cleo McDowell       TRUE      FALSE
## 3  Darryl Jenks      FALSE      FALSE
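
For anyone who wants the result in the shape shown in the question, here is a minimal sketch of the same grepl/gsub idea that builds a fresh data frame with 0/1 flags. It assumes integer flags are acceptable (the question shows them as character strings), and it uses fixed = TRUE and the bracket class [*+] as alternatives to the escaped patterns above.

```r
# Sample data from the question
original_df <- data.frame(
  nameinfo = c("Randy Watson*+", "Cleo McDowell*", "Darryl Jenks"),
  stringsAsFactors = FALSE
)

# fixed = TRUE matches "*" and "+" literally, so no escaping is needed.
# Inside a bracket class, "*" and "+" are also literal characters.
desired_df <- data.frame(
  name       = gsub("[*+]", "", original_df$nameinfo),
  this_thing = as.integer(grepl("*", original_df$nameinfo, fixed = TRUE)),
  that_thing = as.integer(grepl("+", original_df$nameinfo, fixed = TRUE)),
  stringsAsFactors = FALSE
)

desired_df
```

Wrapping the flag columns in as.character() would give the quoted "1"/"0" values from the question, if that is really what is needed.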

4 Comments

Thanks @Tyler for the help. Any idea how to have it return 1 instead of true and 0 instead of false?
Logicals can be coerced to numeric with as.numeric; try as.numeric(c(T, F)).
Wouldn't an sapply be useful here?
@AnandaMahto Probably, but for 2 items (columns) I probably wouldn't bother.
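
To illustrate the sapply suggestion from the comments, here is a rough sketch that loops over a named vector of marker characters. The markers vector and the way the strip pattern is built are assumptions made for illustration, not part of the original answer; with only two columns the direct version is arguably just as clear.

```r
# Sample data from the question
original_df <- data.frame(
  nameinfo = c("Randy Watson*+", "Cleo McDowell*", "Darryl Jenks"),
  stringsAsFactors = FALSE
)

# Hypothetical mapping: new column name -> marker character it flags
markers <- c(this_thing = "*", that_thing = "+")

# One 0/1 integer column per marker; sapply returns a matrix whose
# column names come from names(markers)
flags <- sapply(markers, function(m) {
  as.integer(grepl(m, original_df$nameinfo, fixed = TRUE))
})

# Bracket class built from the markers, here "[*+]". This is fine for
# these two characters; markers such as "]" or "^" would need more care.
pattern <- paste0("[", paste(markers, collapse = ""), "]")

result <- data.frame(
  name = gsub(pattern, "", original_df$nameinfo),
  flags,
  stringsAsFactors = FALSE
)

result
```

Adding a third marker would then only mean adding one element to markers.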
0

Here is a different approach, using the POSIX character class [[:punct:]] and a single gsub call:

original_df <- data.frame(c("Randy Watson*+", "Cleo McDowell*", "Darryl Jenks"))
names(original_df) <- 'nameinfo'    
original_df$this_thing <- c("1", "1", "0")
original_df$that_thing <- c("1", "0", "0")
original_df$nameinfo <- gsub("[[:punct:]]", "", original_df$nameinfo)
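
One caveat worth sketching: [[:punct:]] matches every punctuation character, not just * and +, so names containing apostrophes or hyphens would be altered as well. The second name below is a hypothetical row added only to show the difference; it is not in the original data.

```r
# "Akeem O'Neil*" is a made-up example, not from the question's data
x <- c("Randy Watson*+", "Akeem O'Neil*")

# Strips all punctuation, including the apostrophe in the name
all_punct <- gsub("[[:punct:]]", "", x)

# Strips only the two marker characters
markers_only <- gsub("[*+]", "", x)

all_punct
markers_only
```

If the scraped names never contain other punctuation, the two calls give the same result and either is fine.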

1 Comment

Thanks @lawyeR, the last line is helpful. Unfortunately the main thing for me is the flags of 0/1 or F/T for the two variables.
