How to remove empty string ("") in string

Question

I encountered a strange problem while web scraping using rvest.

I scraped the following name: "Abdichter/in EFZ", which at first looked normal. However, when I wrote the data to a CSV file, I found "-" between the letters. In Excel, the word looked like this: Ab-dich-ter/in EFZ.

So I did a str_split(x, "") and found that the string actually looked like this:

c("A", "b", "", "d", "i", "c", "h", "", "t", "e", "r", "/", "i", "n", " ", "E", "F", "Z")

I tried to remove the empty strings from the string but did not succeed. I tried:

my_string <- str_split(my_string , "")

and then

paste0(my_string[my_string != ""])

but this did not help.

Therefore, I wonder:

How did the empty strings get into that string, and
how do I get them out again?

Edit: This is the webpage.

And here is how I got the string:

library(rvest)

read_html("https://berufskunde.com/ausbildungsberufe/ausbildung-abdichter.html", encoding = "UTF-8") %>% 
  html_nodes(".section") %>% 
  html_nodes(".text-rot") %>% 
  html_text()

I think your "" is a different character. You may need v1[trimws(v1) != ""], where 'v1' is the split character vector.
– akrun, Commented Jul 17, 2019 at 13:38
One possible issue could be "" compared to " " (a space between the two quotes). In many cases I need to use " ".
– meh, Commented Jul 17, 2019 at 13:41

Konrad Rudolph · Accepted Answer · 2019-07-17 13:50:23Z

5

The string you’re observing is not the empty string but a SOFT HYPHEN (U+00AD) character. It is supposed to be only displayed when a word is broken across lines, but some editors don’t cope with it correctly, which is why it’s probably shown when you inspect the CSV.

At any rate you probably want to remove it from your string:

str = gsub('\U00AD', '', str)
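As a minimal sketch, here is the same replacement applied to a stand-in for the scraped value, using only base R. The variable names `x` and `clean` and the hard-coded sample string are assumptions for illustration; in practice the input would be whatever `html_text()` returned.

```r
# Stand-in for the scraped string, with soft hyphens (U+00AD) embedded.
# This literal is an assumption; the real value comes from html_text().
x <- "Ab\u00ADdich\u00ADter/in EFZ"

# The soft hyphen counts as a character even though it is usually invisible.
n_before <- nchar(x)

# Remove every soft hyphen; fixed = TRUE treats the pattern as a literal
# string rather than a regular expression.
clean <- gsub("\u00AD", "", x, fixed = TRUE)

n_after <- nchar(clean)
print(c(before = n_before, after = n_after))
print(identical(clean, "Abdichter/in EFZ"))
```

Using a name such as `x` instead of `str` also avoids shadowing the base R function `str()`.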


Konrad Rudolph Over a year ago

@Roccer If you know about the behaviour of this soft hyphen character, your description made it likely that this was the case. And since you posted a reproducible example (excellent!) it was easy to verify. For reference, it may also help to inspect the byte values of a string via charToRaw, but that only helps if you know what to look for.
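A small sketch of that kind of inspection, assuming the same sample string as in the question (the names `x`, `bytes`, and `code_points` are hypothetical). Alongside `charToRaw`, it uses base R's `utf8ToInt`, which returns code points and can be easier to read than raw bytes.

```r
# Stand-in for the scraped string; an assumption for illustration only.
x <- "Ab\u00ADdich\u00ADter/in EFZ"

# Raw bytes of the UTF-8 encoding. The soft hyphen appears as the
# two-byte sequence c2 ad.
bytes <- charToRaw(enc2utf8(x))
print(bytes)

# Unicode code points, one per character. U+00AD is 173 in decimal.
code_points <- utf8ToInt(enc2utf8(x))

# Positions of the soft hyphens within the string.
shy_positions <- which(code_points == 0xAD)
print(shy_positions)
```

The positions reported here correspond to the slots that showed up as "" in the `str_split` output in the question.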
