I have a dataframe with Column A containing values:

**Channel**
Direct
Paid social
Organic social

What I want to do: create a new column called groupedChannel, using str_detect to search for a string in Column A and set the value of groupedChannel accordingly.

Condition:
IF row in Column A matches regex "direct" THEN Column B (groupedChannel) value = "Direct" ELSE
IF row in Column A matches regex "social" THEN Column B (groupedChannel) value = "Social"

AFAIK, str_detect will return only TRUE/FALSE. How can I use the TRUE/FALSE to assign a value in column B?

4 Answers

A solution using base R regex functions. It also handles the case where neither "direct" nor "social" is found in the Channel column.

# Dummy data
data <- data.frame(Channel = c("Direct Paid", "Social", "Organic", "Social Organic"),
                   stringsAsFactors = FALSE)

# Use sapply to iterate over each value in the 'Channel' column of the data frame above
data$groupChannel <- sapply(data$Channel, FUN = function(x){
  # Use base R regex functions to test each condition and return the value for the new column
  if (grepl("direct", tolower(x))){
    return("Direct")
  }else if (grepl("social", tolower(x))){
    return("Social")
  }else{
    return("Direct or Social Not Found")
  }
})

head(data)
         Channel               groupChannel
1    Direct Paid                     Direct
2         Social                     Social
3        Organic Direct or Social Not Found
4 Social Organic                     Social
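
Since grepl is vectorised, the same logic can also be written without the sapply loop. This is only a sketch of an alternative, not the answer's own code: it assumes nested ifelse calls are acceptable and uses ignore.case = TRUE in place of tolower.

```r
# Dummy data, same values as in the answer above
data <- data.frame(Channel = c("Direct Paid", "Social", "Organic", "Social Organic"),
                   stringsAsFactors = FALSE)

# grepl() works on the whole column at once, so nested ifelse() calls
# can express the "direct, else social, else not found" rule directly
data$groupChannel <- ifelse(
  grepl("direct", data$Channel, ignore.case = TRUE), "Direct",
  ifelse(grepl("social", data$Channel, ignore.case = TRUE), "Social",
         "Direct or Social Not Found")
)

data$groupChannel
```

The order of the conditions matters: "direct" is tested first, so a value containing both words is labelled "Direct", as in the question's IF/ELSE rule.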

1 Comment

Hi Jamie. Thanks, that worked. Is there a dplyr equivalent of the grepl function from base R?

I have a data.table solution based on conditional replacement. It uses grepl but you could use stringr::str_detect if you want:

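library(data.table)
setDT(df)  # convert the existing data frame to a data.table by reference

# Default every row to "Social"
df[, groupedChannel := "Social"]

# Conditional replacement: overwrite rows whose colA contains "direct" (any case)
df[grepl("direct", colA, ignore.case = TRUE), groupedChannel := "Direct"]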

(solution is untested)
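
Because the answer is marked untested, here is a self-contained sketch of the same conditional-replacement idea. The sample data is hypothetical, the column name colA follows the answer, and the matching is assumed to be case-insensitive because the question's values are capitalised ("Direct"). Unlike the answer, it starts from NA instead of defaulting every row to "Social".

```r
library(data.table)

# Hypothetical data; the column name colA follows the answer above
df <- data.table(colA = c("Direct", "Paid social", "Organic social"))

# Start with NA so unmatched rows stay visibly unclassified
df[, groupedChannel := NA_character_]

# Assign by reference only on rows where the pattern is found.
# "direct" is applied last so that it takes priority over "social".
df[grepl("social", colA, ignore.case = TRUE), groupedChannel := "Social"]
df[grepl("direct", colA, ignore.case = TRUE), groupedChannel := "Direct"]

df$groupedChannel
```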

What you want is to extract the text that matches your regex, not simply detect whether it matches.

library(dplyr)
library(stringr)

tibble(
  colA = c("**Channel**", "Direct", "Paid social", "Organic social")
) %>% 
  mutate(
    colB = str_match(colA, "[Ss]ocial|[Dd]irect")[,1],
    colB = str_to_lower(colB)
  )
#> # A tibble: 4 x 2
#>   colA           colB  
#>   <chr>          <chr> 
#> 1 **Channel**    <NA>  
#> 2 Direct         direct
#> 3 Paid social    social
#> 4 Organic social social

Created on 2020-04-29 by the reprex package (v0.3.0)

stringr::str_match returns a character matrix: the first column is the full match and any subsequent columns hold the capture groups, so we need [,1] at the end of that call. Because the pattern matches both upper- and lower-case spellings, we then convert the matched text to lowercase.

Alternatively, you could use str_extract, which returns the match as a plain character vector: colB = str_extract(colA, "[Ss]ocial|[Dd]irect"), without the [,1].
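
To use the TRUE/FALSE from str_detect directly, as the question asks, one option is to pair it with dplyr::case_when. The following is a minimal sketch, assuming the column is named Channel as in the question and that unmatched rows should become NA. The case-insensitive regex() wrapper is an added assumption, since the question does not say how case should be handled.

```r
library(dplyr)
library(stringr)

df <- tibble(Channel = c("Direct", "Paid social", "Organic social"))

df <- df %>%
  mutate(
    # case_when() checks the conditions in order and uses the first TRUE one
    groupedChannel = case_when(
      str_detect(Channel, regex("direct", ignore_case = TRUE)) ~ "Direct",
      str_detect(Channel, regex("social", ignore_case = TRUE)) ~ "Social",
      TRUE ~ NA_character_  # neither pattern found
    )
  )

df$groupedChannel
```

Each str_detect call yields a logical vector, and case_when maps the first TRUE condition in each row to the value on the right-hand side of ~.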

Here's a base R solution, which assumes you have a clearly defined set of Channel_group values.

Data:

data <- data.frame(Channel = c("Direct", "Paid social", "Organic social"),
                   stringsAsFactors = FALSE)

You can define your Channel_group values in a vector a:

a <- c("(S|s)ocial", "(D|d)irect")

Now use sub to replace each Channel value with its Channel_group value. \\U returns the matched group as an upper-case string (use \\L if you prefer lower case); both only work with perl = TRUE:

data$Channel_group <- sub(paste0(".*\\b(", paste(a, collapse = "|"),")\\b.*"), "\\U\\1", data$Channel, perl = TRUE)

Result:

data
         Channel Channel_group
1         Direct        DIRECT
2    Paid social        SOCIAL
3 Organic social        SOCIAL
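
One thing to be aware of with this approach: sub returns its input unchanged when the pattern does not match, so an unexpected channel would be copied into Channel_group as-is. A possible guard is sketched below. It assumes NA is the desired value for unmatched rows, "Email" is a made-up extra value for illustration, and the patterns are written with character classes in place of the (S|s) groups.

```r
# "Email" is a hypothetical value added to show the non-matching case
data <- data.frame(Channel = c("Direct", "Paid social", "Organic social", "Email"),
                   stringsAsFactors = FALSE)

a <- c("[Ss]ocial", "[Dd]irect")
pattern <- paste0(".*\\b(", paste(a, collapse = "|"), ")\\b.*")

# Only keep the substituted value where the pattern actually matched
matched <- grepl(pattern, data$Channel, perl = TRUE)
data$Channel_group <- ifelse(matched,
                             sub(pattern, "\\U\\1", data$Channel, perl = TRUE),
                             NA_character_)

data$Channel_group
```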
