I have some character strings which I'm extracting from an HTML page. It turns out that these strings contain some hidden characters or control characters (?).
How can I convert this string so that it only contains the visible characters?
Take for example the term "Besucherüberblick" and its raw representation:
charToRaw("Besucherüberblick")
[1] 42 65 73 75 63 68 65 72 c3 bc 62 65 72 62 6c 69 63 6b
However, for the same term taken from my HTML, I'm getting:
[1] e2 80 8c 42 65 73 75 63 68 65 72 c3 bc 62 65 72 62 6c 69 63 6b
So there are three extra bytes (e2 80 8c) at the beginning.
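To make this reproducible without the HTML, the string can be built by hand. This is only a sketch: it assumes the three leading bytes are the UTF-8 encoding of U+200C (ZERO WIDTH NON-JOINER), which is what e2 80 8c decodes to, and `x` is just an illustrative name.

```r
# Build the problematic string by hand.
# \u200c is the invisible leading character, \u00fc is "ü".
x <- "\u200cBesucher\u00fcberblick"

# The raw bytes start with e2 80 8c, like the string from the HTML
charToRaw(x)

# Code point of the first character, shown in hex
as.hexmode(utf8ToInt(x)[1])

# Number of characters, which includes the invisible one
nchar(x)
```

The string prints like the plain term, but it has one character more than the visible text.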
I could probably use trial and error to manually remove these bytes from my raw vector and then convert it back to character, but a) I don't know in advance which strings the HTML will give me, and b) I'm looking for an automated solution.
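For reference, the manual route would look roughly like the sketch below. It hard-codes the assumption that exactly the first three bytes are unwanted, which is the part that does not generalize; `x`, `bytes` and `cleaned` are illustrative names.

```r
# Example input: invisible U+200C followed by the visible term
x <- "\u200cBesucher\u00fcberblick"

# Drop the first three bytes (e2 80 8c) from the raw vector
bytes <- charToRaw(x)
cleaned <- rawToChar(bytes[-(1:3)])

# rawToChar() does not mark the encoding, so declare it as UTF-8
Encoding(cleaned) <- "UTF-8"

# Bytes of the cleaned string, for comparison with the plain term
charToRaw(cleaned)
```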
Maybe there's a stringr/stringi solution for this?
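One direction that might work is a regex on Unicode general categories rather than on specific bytes. The sketch below uses base R with `perl = TRUE` and assumes the unwanted characters all fall into the "format" category `\p{Cf}`, which includes U+200C. I have not checked it against other hidden characters.

```r
# Example input: invisible U+200C followed by the visible term
x <- "\u200cBesucher\u00fcberblick"

# Remove every character in the Unicode "format" category (Cf),
# such as zero-width joiners and non-joiners
cleaned <- gsub("\\p{Cf}", "", x, perl = TRUE)

# Bytes of the result, for comparison with the plain term
charToRaw(cleaned)
```

Adding `\p{Cc}` to the pattern would also strip control characters, but that category contains tabs and newlines too, so it may remove more than intended. The same pattern should also be usable with `stringi::stri_replace_all_regex(x, "\\p{Cf}", "")`, assuming stringi is installed.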