
I want to replace every number in a dataframe with a word (string). Each number always maps to the same word: for example, all instances of the number 5 should be replaced with 'banana', all instances of the number 10 with 'kiwi', and so on.

Here is a sample dataframe. Rownames and colnames are numbers too:

#    1  2  3  4  5  6
#1   7  7  7  7  7  7
#2   5  5  5  5  5  5
#3   4  4  4  4  4  4
#4   8  8  8  8  8  8
#5   1  1  1  1  1  1
#6   2  2  2  2  2  2
#7   6  6  6  6  3  3
#8   3  3  3  3  6  6
#9  10 10 10 10 10 10
#10 11 11 11 11 11 11
#11 12 12 12 12 12 12
#12  9  9  9  9  9  9

Here is the sample data (mydf) for reproducing this:

mydf<-structure(c(7, 5, 4, 8, 1, 2, 6, 3, 10, 11, 12, 9, 7, 5, 4, 8, 
1, 2, 6, 3, 10, 11, 12, 9, 7, 5, 4, 8, 1, 2, 6, 3, 10, 11, 12, 
9, 7, 5, 4, 8, 1, 2, 6, 3, 10, 11, 12, 9, 7, 5, 4, 8, 1, 2, 3, 
6, 10, 11, 12, 9, 7, 5, 4, 8, 1, 2, 3, 6, 10, 11, 12, 9), .Dim = c(12L, 
6L), .Dimnames = list(c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10", "11", "12"), c("1", "2", "3", "4", "5", "6")))

Here is a dataframe (mydata) I constructed showing which number should be replaced with which word/fruit:

mydata <- data.frame(nums = 1:12)
mydata$fruits <- c("apple", "pear", "orange", "melon", "banana", "grape", "pineapple", "mango", "lemon", "kiwi", "guava", "peach")

I have tried looking through similarly named threads, but they mainly discuss changing certain parts of dataframes (e.g. specific variables or specific observations), not the contents of the whole dataframe.

I tried chaining multiple gsub commands, but this doesn't work for several reasons. I guess I need a function that applies across all variables in the df, but I'm not sure which one.

The final result should look something like this:

      1           2           3           4           5           6          
1  "pineapple" "pineapple" "pineapple" "pineapple" "pineapple" "pineapple"
2  "banana"    "banana"    "banana"    "banana"    "banana"    "banana"   
3  "melon"     "melon"     "melon"     "melon"     "melon"     "melon"    
4  "mango"     "mango"     "mango"     "mango"     "mango"     "mango"    
5  "apple"     "apple"     "apple"     "apple"     "apple"     "apple"    
6  "pear"      "pear"      "pear"      "pear"      "pear"      "pear"     
7  "grape"     "grape"     "grape"     "grape"     "orange"    "orange"   
8  "orange"    "orange"    "orange"    "orange"    "grape"     "grape"    
9  "kiwi"      "kiwi"      "kiwi"      "kiwi"      "kiwi"      "kiwi"     
10 "guava"     "guava"     "guava"     "guava"     "guava"     "guava"    
11 "peach"     "peach"     "peach"     "peach"     "peach"     "peach"    
12 "lemon"     "lemon"     "lemon"     "lemon"     "lemon"     "lemon"

Ideally, the quote marks would not be visible, though I'm not sure whether that is possible.

4 Answers

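You can do this with match. For each element of its first argument (here, the values in mydf), match returns the position of the first matching element in its second argument (here, the lookup column mydata$nums). Those positions can then be used to index mydata$fruits.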

mydf[] <- mydata$fruits[match(mydf, mydata$nums)]
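To see what match is doing in that line, here is a minimal sketch on a small, deliberately unsorted lookup table. The names lookup, x and pos are made up for illustration and are not part of the original data; note that a value with no entry in the table comes back as NA.

```r
# Hypothetical lookup table whose rows are NOT in numeric order
lookup <- data.frame(nums   = c(10, 5, 7),
                     fruits = c("kiwi", "banana", "pineapple"),
                     stringsAsFactors = FALSE)

x <- c(5, 7, 10, 5, 99)          # 99 has no entry in the lookup

pos <- match(x, lookup$nums)     # row of `lookup` matching each element of x
pos
# [1]  2  3  1  2 NA

lookup$fruits[pos]               # translate positions into words
# [1] "banana"    "pineapple" "kiwi"      "banana"    NA
```

Because the positions come from match rather than from the numbers themselves, the lookup table does not need to be sorted or gap-free.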

If you coerce to a data.frame, quotes aren't visible when you print the object to screen:

as.data.frame(mydf)

#            1         2         3         4         5         6
# 1  pineapple pineapple pineapple pineapple pineapple pineapple
# 2     banana    banana    banana    banana    banana    banana
# 3      melon     melon     melon     melon     melon     melon
# 4      mango     mango     mango     mango     mango     mango
# 5      apple     apple     apple     apple     apple     apple
# 6       pear      pear      pear      pear      pear      pear
# 7      grape     grape     grape     grape    orange    orange
# 8     orange    orange    orange    orange     grape     grape
# 9       kiwi      kiwi      kiwi      kiwi      kiwi      kiwi
# 10     guava     guava     guava     guava     guava     guava
# 11     peach     peach     peach     peach     peach     peach
# 12     lemon     lemon     lemon     lemon     lemon     lemon    

Whether or not you coerce to data.frame, you can supply quote=FALSE to write.table or write.csv to prevent quotes appearing around character strings in the exported file.
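As a small sketch of the quote handling on a character matrix, the following uses a made-up 2x2 matrix m and a temporary file; both are assumptions for illustration only. print(quote = FALSE) and noquote are base R alternatives to coercing to a data.frame when you only want to change how the matrix is displayed.

```r
# Hypothetical character matrix standing in for the translated mydf
m <- matrix(c("apple", "pear", "kiwi", "guava"), nrow = 2,
            dimnames = list(c("1", "2"), c("1", "2")))

print(m, quote = FALSE)   # print the character matrix without quote marks
nq <- noquote(m)          # same display, via the "noquote" class
nq

# Exporting: quote = FALSE keeps quote marks out of the file as well
out <- tempfile(fileext = ".csv")
write.csv(m, out, quote = FALSE)
exported <- readLines(out)
exported
```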


3 Comments

Since the lookup data is already sorted, just mydf[] <- mydata$fruits[mydf] will work.
@thelatemail: Yep, baptiste mentioned the same. I gave a generally-applicable solution since I didn't want to assume the OP's real-life problem was as simple as the example.
Thanks. This works great - as actually do all the suggestions. It's great how such a problem can be solved in multiple ways. Thanks too for the general solution. Obviously, my real data issue is much more complex than the simple example ... and has nothing to do with fruit either!

As the fruits are in the correct order and are indexed by 1:12, you can use the entries of mydf to index into mydata$fruits:

apply(mydf, 2, function(x) mydata$fruits[x])
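To make the ordering assumption concrete, here is a rough sketch with made-up vectors (fruits_sorted, nums_shuffled and fruits_shuffled are hypothetical names). Direct indexing only gives the right answer when the word for number i sits at position i.

```r
fruits_sorted <- c("apple", "pear", "orange")   # position i holds the word for i
x <- c(3, 1, 2)
ok <- fruits_sorted[x]
ok
# [1] "orange" "apple"  "pear"

# Shuffle the lookup: positions no longer line up with the numbers
nums_shuffled   <- c(2, 3, 1)
fruits_shuffled <- c("pear", "orange", "apple")

wrong <- fruits_shuffled[x]                         # silently mistranslates
wrong
# [1] "apple"  "pear"   "orange"

fixed <- fruits_shuffled[match(x, nums_shuffled)]   # look positions up first
fixed
# [1] "orange" "apple"  "pear"
```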

If the values are not in the correct order, or do not cover all possible values (have "holes"), you can use a factor to translate:

apply(mydf, 2, function(x) factor(x, levels=mydata$nums, labels=mydata$fruits))
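A rough illustration of the factor translation on a single vector, using a made-up lookup with gaps and out-of-order entries (codes and labs are hypothetical names). as.character is applied explicitly in this sketch so that the result is a plain character vector rather than a factor.

```r
codes <- c(20, 3, 11)                       # unsorted, with gaps
labs  <- c("kiwi", "orange", "guava")       # labs[i] is the word for codes[i]

x <- c(3, 11, 20, 3)
f <- factor(x, levels = codes, labels = labs)
translated <- as.character(f)
translated
# [1] "orange" "guava"  "kiwi"   "orange"
```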


Another possible approach:

library(qdapTools)
as.data.frame(apply(mydf, 2, lookup, mydata))

##            1         2         3         4         5         6
## 1  pineapple pineapple pineapple pineapple pineapple pineapple
## 2     banana    banana    banana    banana    banana    banana
## 3      melon     melon     melon     melon     melon     melon
## 4      mango     mango     mango     mango     mango     mango
## 5      apple     apple     apple     apple     apple     apple
## 6       pear      pear      pear      pear      pear      pear
## 7      grape     grape     grape     grape    orange    orange
## 8     orange    orange    orange    orange     grape     grape
## 9       kiwi      kiwi      kiwi      kiwi      kiwi      kiwi
## 10     guava     guava     guava     guava     guava     guava
## 11     peach     peach     peach     peach     peach     peach
## 12     lemon     lemon     lemon     lemon     lemon     lemon


replace might work for you here.

replace(mydf, seq_along(mydf), mydata[[2]][mydf])
#    1           2           3           4           5           6          
# 1  "pineapple" "pineapple" "pineapple" "pineapple" "pineapple" "pineapple"
# 2  "banana"    "banana"    "banana"    "banana"    "banana"    "banana"   
# 3  "melon"     "melon"     "melon"     "melon"     "melon"     "melon"    
# 4  "mango"     "mango"     "mango"     "mango"     "mango"     "mango"    
# 5  "apple"     "apple"     "apple"     "apple"     "apple"     "apple"    
# 6  "pear"      "pear"      "pear"      "pear"      "pear"      "pear"     
# 7  "grape"     "grape"     "grape"     "grape"     "orange"    "orange"   
# 8  "orange"    "orange"    "orange"    "orange"    "grape"     "grape"    
# 9  "kiwi"      "kiwi"      "kiwi"      "kiwi"      "kiwi"      "kiwi"     
# 10 "guava"     "guava"     "guava"     "guava"     "guava"     "guava"    
# 11 "peach"     "peach"     "peach"     "peach"     "peach"     "peach"    
# 12 "lemon"     "lemon"     "lemon"     "lemon"     "lemon"     "lemon"   

And it can be wrapped with as.data.frame to remove quotes if necessary.
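For context, replace(x, list, values) is a thin base R wrapper around x[list] <- values that returns the modified copy. A small sketch on made-up data (m and words are hypothetical) suggests it gives the same result as assigning into all elements with `[]`:

```r
m     <- matrix(c(2, 1, 1, 2), nrow = 2)   # toy numeric matrix
words <- c("apple", "pear")                # word for 1, word for 2

r1 <- replace(m, seq_along(m), words[m])   # returns a modified copy of m

r2 <- m
r2[] <- words[m]                           # assigns in place, keeping dims

identical(r1, r2)
# [1] TRUE
```

The main practical difference is that replace leaves the original object untouched unless you assign the result back.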
