This has been partially addressed in other posts, but unfortunately I could not get it to work on my side.

I have a data frame full of texts, and there are certain words that I want replaced by a unique name.

So, in the table below, I want to replace each of the words "Banana", "Apple" and "Tomato" with the word "Fruit" (the word Fruit can show up multiple times, that is ok). I also want to replace "Cod", "Pork" and "Beef" with the word "Animals".

I have a full Excel file where this mapping was done. The Excel file has two columns. Column A holds the unique name (like Fruit and Animals). Column B holds the words that we want to replace in the text (like Banana, Apple, Tomato).

The code I came up with was:

hous <- read.table(header = TRUE, 
                   stringsAsFactors = FALSE, 
                   text="HouseType HouseTypeNo
'Banana Apple Tomato Honey' 'Onion Garlic Pepper Sugar'
'Cod Pork Beef' 'Mushrooms Soya Eggs' ")

maps <- read.table(header = TRUE, 
                           stringsAsFactors = FALSE, 
                           text="UniqueID Names
'Fruit' 'Banana'
'Fruit' 'Apple'
'Fruit' 'Tomato'
'Animals' 'Cod'
'Animals' 'Pork'
'Animals' 'Beef'")

hous %>% str_replace_all( pattern = maps$Names, replacement = maps$UniqueID)
# Warning message:
# In stri_replace_all_regex(string, pattern, fix_replacement(replacement),  :
#   argument is not an atomic vector; coercing

I cannot make it work. I basically just want to look up certain words and replace them with some unique IDs. It doesn't sound complicated, but I cannot get it to run.

Just a few points: in my real data set I have thousands of words and IDs. In other examples I have seen people write their IDs, patterns and replacements by hand. In my case that is not feasible.

The final output would be something like this:

hous <- read.table(header = TRUE, 
                   stringsAsFactors = FALSE, 
                   text="HouseType HouseTypeNo
'Fruit Fruit Fruit Honey' 'Onion Garlic Pepper Sugar'
'Animals Animals Animals' 'Mushrooms Soya Eggs' ")

Any help is appreciated.

Best regards

  • I think stringi::stri_replace_all_fixed with vectorize_all = FALSE is what you're looking for. Commented Sep 21, 2020 at 13:19
  • @RonakShah Thank you for pointing that out. I just amended that and added the desired output. Commented Sep 21, 2020 at 13:41
  • @Bas it doesn't work. I got: "argument is not an atomic vector; coercing" Commented Sep 21, 2020 at 13:52
  • @RonakShah Yes, only that column. Commented Sep 21, 2020 at 14:26
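
The stringi suggestion in the comments can be sketched as below. This is a minimal example, assuming the "not an atomic vector" warning comes from passing the whole data frame (a list) rather than a single character column. The data are rebuilt with data.frame() so the snippet is self-contained, and `result` is just an illustrative name.

```r
library(stringi)

# Same example data as in the question, rebuilt so the snippet runs on its own.
hous <- data.frame(
  HouseType   = c("Banana Apple Tomato Honey", "Cod Pork Beef"),
  HouseTypeNo = c("Onion Garlic Pepper Sugar", "Mushrooms Soya Eggs"),
  stringsAsFactors = FALSE
)
maps <- data.frame(
  UniqueID = c("Fruit", "Fruit", "Fruit", "Animals", "Animals", "Animals"),
  Names    = c("Banana", "Apple", "Tomato", "Cod", "Pork", "Beef"),
  stringsAsFactors = FALSE
)

# Pass one character column (an atomic vector), not the data frame itself.
# vectorize_all = FALSE applies every pattern/replacement pair, in order,
# to each element of the input instead of pairing them up element-wise.
result <- stri_replace_all_fixed(
  hous$HouseType,
  pattern       = maps$Names,
  replacement   = maps$UniqueID,
  vectorize_all = FALSE
)
result
```

Because the patterns are treated as fixed strings here, names containing regex metacharacters need no escaping.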

1 Answer

You can create a named character vector (the names are the patterns, the values are the replacements) and pass it to str_replace_all:

hous$HouseType <- stringr::str_replace_all(hous$HouseType, 
                            setNames(maps$UniqueID, maps$Names))
hous

#                HouseType               HouseTypeNo
#1 Fruit Fruit Fruit Honey Onion Garlic Pepper Sugar
#2 Animals Animals Animals       Mushrooms Soya Eggs
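
One thing to watch with thousands of words: the names of the vector are treated as regular expressions, so a short name can also match inside a longer word. A possible variation, sketched here under the assumption that only whole words should be replaced, wraps each name in \\b word boundaries. The "Codfish Soup" input and the names `texts`, `lookup`, `plain` and `whole_words` are hypothetical additions for illustration.

```r
library(stringr)

maps <- data.frame(
  UniqueID = c("Fruit", "Fruit", "Fruit", "Animals", "Animals", "Animals"),
  Names    = c("Banana", "Apple", "Tomato", "Cod", "Pork", "Beef"),
  stringsAsFactors = FALSE
)

# Hypothetical extra input: "Codfish" contains "Cod" but is a different word.
texts <- c("Banana Apple Tomato Honey", "Cod Pork Beef", "Codfish Soup")

# Plain lookup, as in the answer: matches "Cod" anywhere, even inside "Codfish".
plain <- str_replace_all(texts, setNames(maps$UniqueID, maps$Names))

# Whole-word lookup: \\b anchors each name at word boundaries.
lookup <- setNames(maps$UniqueID, paste0("\\b", maps$Names, "\\b"))
whole_words <- str_replace_all(texts, lookup)

plain[3]        # "Animalsfish Soup"
whole_words[3]  # "Codfish Soup"
```

If the names themselves may contain regex metacharacters (for example "." or "+"), they would need to be escaped before being used as patterns.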