
I get a strange encoding problem when I try to parse a certain attribute of an XML/HTML document. Here is a reproducible example containing two items with two titles (note the French accents):

library(XML)
doc <- htmlParse('<note>
              <item title="é">1</item>
              <item title="ï">3</item>
          </note>',asText=TRUE,encoding='UTF-8')

Using xpathApply, I can read my items like this. Note that the accented characters are displayed correctly here:

xpathApply(doc,'//item')

[[1]]
<item title="é">1</item> 

[[2]]
<item title="ï">3</item> 

But when I try to read the title attribute, I get this:

xpathApply(doc,'//item',xmlGetAttr,'title')
[[1]]
[1] "Ã©"

[[2]]
[1] "Ã¯"

I tried other XPath variants, such as:

  xpathApply(doc,'//item/@title') 
  xmlAttrs(xpathApply(doc,'//item')[[1]])

But these don't work either. Any help, please?

  • This works fine for me (R 3.0.0, i686-pc-linux-gnu). Commented May 15, 2013 at 10:07
  • On Windows this error is reproducible. Commented May 15, 2013 at 10:14
  • The strings “Ã©” and “Ã¯” are the UTF-8 encoded representations of “é” and “ï” when misinterpreted as ISO 8859-1 or windows-1252 encoded data. Commented May 15, 2013 at 10:14
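To make the point of that comment concrete, here is a minimal base-R sketch (it does not need the XML package). It takes the UTF-8 bytes of "é" and "ï" and decodes them as if they were latin1 (ISO 8859-1), which yields the same kind of garbled strings shown in the question. Choosing latin1 rather than windows-1252 is an assumption of the sketch; the two code pages agree on the bytes involved here (0xC3, 0xA9, 0xAF).

```r
# The two accented characters, written as escapes so the code is ASCII-only
accented <- c("\u00e9", "\u00ef")   # "é" and "ï"

# Their UTF-8 bytes: C3 A9 for "é", C3 AF for "ï"
raw_bytes <- lapply(accented, charToRaw)
print(raw_bytes)

# Misread each byte sequence as latin1: every two-byte UTF-8 sequence
# turns into two separate latin1 characters
misread <- vapply(accented, function(s) {
  bytes_as_text <- rawToChar(charToRaw(s))   # same bytes, no declared encoding
  iconv(bytes_as_text, from = "latin1", to = "UTF-8")
}, character(1), USE.NAMES = FALSE)

# Equal to c("\u00c3\u00a9", "\u00c3\u00af"), i.e. "Ã©" and "Ã¯"
print(misread)
```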

1 Answer


It's not pretty, and I can't test it on this Linux machine, but try:

  xpathApply(doc,'//item',
         function(x) iconv(xmlAttrs(x)["title"], "UTF-8", "UTF-8"))
[[1]]
title 
  "é" 

[[2]]
title 
  "ï" 

xmlAttrs calls RS_XML_xmlNodeAttributes, and from examining that code there appears to be no facility for handling encoding. xmlValue, by contrast, calls R_xmlNodeValue, which does have encoding support: ?xmlValue documents an encoding argument as "experimental functionality and parameter related to encoding". Maybe encoding support for attributes will be added at a later date.
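Why would a UTF-8 to UTF-8 conversion help at all? A plausible explanation, consistent with the comment under the question, is that xmlAttrs returns the correct UTF-8 bytes but without declaring them as UTF-8, so on Windows R displays them in the native code page. With to = "UTF-8" and the default mark = TRUE, iconv returns the same bytes but marks them as UTF-8. The base-R sketch below simulates such an undeclared attribute value with rawToChar (an assumption standing in for what xmlAttrs actually returns) and shows two ways to mark it: the iconv call from the answer, and setting Encoding() directly.

```r
# Simulated attribute value: valid UTF-8 bytes for "é", but with no declared
# encoding (this stands in for what xmlAttrs presumably returns)
raw_value <- rawToChar(as.raw(c(0xc3, 0xa9)))

# Option 1 (the answer's approach): converting UTF-8 -> UTF-8 keeps the bytes,
# and with mark = TRUE (the default) the result is declared as UTF-8
via_iconv <- iconv(raw_value, from = "UTF-8", to = "UTF-8")

# Option 2: declare the encoding directly, without any conversion
via_encoding <- raw_value
Encoding(via_encoding) <- "UTF-8"

# Both are now marked UTF-8 and equal to the literal "\u00e9" ("é")
print(Encoding(c(via_iconv, via_encoding)))
print(identical(via_iconv, "\u00e9"))
print(identical(via_encoding, "\u00e9"))
```

Applied to the question's document, option 2 might look like xpathApply(doc, '//item', function(x) { v <- xmlGetAttr(x, 'title'); Encoding(v) <- 'UTF-8'; v }), though this variant is untested on Windows.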


2 Comments

Based on @Jukka K. Korpela's comment, this might help.
Thanks! I tested it and it works. I simplified your answer a bit; there's no need to use lapply here.
