
I am working with a data frame in R that has string-encoding problems. My data frame df looks like this:

df
                                                     title
1 José Francisco Salgado - Executive Director and Founder
2 José Francisco Salgado - Executive Director and Founder

The strings should contain accented characters where the strange symbols appear. I tried the following:

#Code
df$title <- iconv(df$title,"UTF-8","latin1") 

But it does not work: I get the same strings with the weird symbols. I do not understand why, because the same call does the job on a single string:

#Code2
iconv("José Francisco Salgado - Executive Director and Founder","UTF-8","latin1")
[1] "José Francisco Salgado - Executive Director and Founder"

That call restores the accents. How can I fix the data frame so that it looks like this:

df
                                                    title
1 José Francisco Salgado - Executive Director and Founder
2 José Francisco Salgado - Executive Director and Founder

Many thanks.

This is the dput() version of df:

#Data
df <- structure(list(title = c("José Francisco Salgado - Executive Director and Founder", 
"José Francisco Salgado - Executive Director and Founder")), row.names = 1:2, class = "data.frame")
  • How did you create the data.frame in the first place? It's important to import the data with the correct encoding from the start. It can be very messy to try to clean it up if the system has already attempted to convert to the default system encoding. Are you on a Windows machine? Do you know for sure what encoding was used in the original data? Commented Nov 3, 2021 at 18:40
  • @MrFlick Hi, many thanks for your comment. The data came from an httr request that used UTF-8 encoding. I am on Windows, and after the request I used rawToChar() to create input that can be fed into fromJSON() to extract the values you see. I do not know how to convert the data properly to recover the accents, nor how to set an option in the httr request so that it accepts them. Commented Nov 3, 2021 at 19:24

2 Answers

2

One solution is to apply iconv() to each column of the data frame and rebuild it:

data.frame( sapply( df, function(x) iconv( x, "UTF-8", "LATIN1" ) ) )

                                                    title
1 José Francisco Salgado - Executive Director and Founder
2 José Francisco Salgado - Executive Director and Founder
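
If the printed result still shows the strange symbols, the encoding mark on the converted strings may be the cause. The following is only a sketch, assuming the garbling came from UTF-8 bytes that were read as Latin-1: it performs the same iconv() conversion and then explicitly declares the resulting bytes to be UTF-8. The shortened title and the intermediate variable `fixed` are illustrative, not from the question.

```r
# Illustrative data: "Jos\u00c3\u00a9" is the garbled form of "José"
df <- data.frame(title = "Jos\u00c3\u00a9 Francisco Salgado",
                 stringsAsFactors = FALSE)

# Convert to Latin-1: the two garbled characters become the bytes C3 A9,
# which are exactly the UTF-8 bytes of "é"
fixed <- iconv(df$title, from = "UTF-8", to = "latin1")

# iconv() marks its result as latin1; declare the bytes as UTF-8 instead
# (this changes only the mark, not the bytes)
Encoding(fixed) <- "UTF-8"

df$title <- fixed
print(df)
```

Setting the mark explicitly makes the result independent of the session's native encoding, which matters on Windows systems where the native encoding is not UTF-8.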

0

I had a similar issue. I was accessing an API that returned the data with the correct diacritics, but rawToChar() garbled them. Instead, I used content() on the response returned by httr::GET().

Ex (I wanted the value for name):

raw_data <- httr::GET(API_URL)

httr::content(raw_data)$name
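
A likely reason rawToChar() garbles the text is that it returns a string with no encoding mark, so on a system whose native encoding is not UTF-8 the bytes are interpreted in the wrong encoding. If you need to keep rawToChar(), a minimal sketch of a workaround is to declare the encoding yourself before parsing. The raw vector `body` below is a hand-written stand-in for a response body, not real API output.

```r
# Hypothetical response body: the UTF-8 bytes of the JSON text {"name":"José"}
body <- as.raw(c(0x7b, 0x22, 0x6e, 0x61, 0x6d, 0x65, 0x22, 0x3a, 0x22,
                 0x4a, 0x6f, 0x73, 0xc3, 0xa9, 0x22, 0x7d))

# rawToChar() copies the bytes into a string without recording an encoding
txt <- rawToChar(body)

# Declare that the bytes are UTF-8; the bytes themselves are unchanged
Encoding(txt) <- "UTF-8"

print(txt)
```

After this step `txt` can be passed to a JSON parser such as the fromJSON() call mentioned in the question.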
