I'm having trouble with a dataset provided by the Brazilian government (so it is in Portuguese). Here's the code that imports it:
library(tidyverse)

# Read the file without a header row and keep only the municipality of Vinhedo
locais_vot_SP <- read_delim("https://raw.githubusercontent.com/camilagonc/votacao_secao/master/locais_vot_SP.csv",
                            locale = locale(encoding = "ISO-8859-1"),
                            delim = ",",
                            col_names = FALSE) %>%
  filter(X4 == "VINHEDO")

# Give the six columns descriptive names
names(locais_vot_SP) <- c("num_zona",
                          "nome_local",
                          "endereco",
                          "nome_municipio",
                          "secoes",
                          "secoes_esp")
As you can see, the values of the variable secoes are not tidy: several section numbers are aggregated in the same cell.
secoes
196ª; 207ª; 221ª; 231ª;
197ª; 211ª; 230ª; 249ª;
With the following code, I started to fix the problem:
locais_vot_SP <- locais_vot_SP %>%
  # Drop the ordinal marker "ª" and the "Da " prefix; remove ";" from secoes_esp
  mutate(secoes = gsub("ª", "", secoes),
         secoes = gsub("Da ", "", secoes),
         secoes_esp = gsub("ª", "", secoes_esp),
         secoes_esp = gsub(";", "", secoes_esp)) %>%
  # One row per ";"-separated entry
  separate_rows(secoes, sep = ";") %>%
  # Remove the whitespace left around each entry after splitting
  mutate(secoes = trimws(secoes))
And so I got to this:
secoes
32 à 38
100
121
What remains are the entries of the form x à y (in English, x to y), which should be expanded into one row per section number. How can I get the following output?
secoes
32
33
34
35
36
37
38
...
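
For reference, here is a minimal sketch of one direction that might work, not a tested solution for the full dataset. It assumes every range is written exactly as "x à y" with single spaces around "à" and that all other entries are plain integers. The toy tibble exemplo and the helper columns inicio and fim are hypothetical names used only for illustration.

```r
library(tidyverse)

# Toy data standing in for the cleaned column (values taken from the example above)
exemplo <- tibble(secoes = c("32 à 38", "100", "121"))

expandido <- exemplo %>%
  # Split "x à y" into start and end; single values get fim = NA
  separate(secoes, into = c("inicio", "fim"), sep = " à ",
           fill = "right", convert = TRUE) %>%
  # Treat a single value as a range of length one, then expand each range
  mutate(fim = coalesce(fim, inicio),
         secoes = map2(inicio, fim, seq)) %>%
  # One row per section number
  unnest(secoes) %>%
  select(-inicio, -fim)
```

On the real data, the same steps would presumably be applied to locais_vot_SP after separate_rows(), with the other columns carried along unchanged; entries that do not match the assumed pattern would need separate handling.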