
I have the following data frame, in which a single variable (Problemas.habituales) can hold several comma-separated values per row (see below):

> read.csv("http://pastebin.com/raw.php?i=gnWRqJnY")
  Nombre.barrio                             Problemas.habituales
1         Actur Robos con violencia, Agresiones, Otros problemas
2         Actur                                  Ningún problema
3        Centro                  Robos con violencia, Agresiones
4     San Pablo                                  Ningún problema
5     San Pablo                                  Ningún problema
6      Delicias                     Hurtos o robos sin violencia

The data has this structure because I created an online questionnaire that accepts multiple answers to the same question. The way the answers are stored is a problem: I cannot create a barplot showing the common problems within each neighborhood without first reshaping the data frame.

Unfortunately, I do not know how to reshape the data frame so that every row contains a single value for the variable "Problemas.habituales". It has to remain a data frame because I need to use ggplot2 later on, which, as far as I know, does not accept data tables.

  • I've seen that this question has been given a -1 and I am wondering why. I searched first on DuckDuckGo and then on Stack Overflow and didn't find any duplicate. It may be easy to solve if you know how, but I don't think being a newbie is a bad thing. Commented Jun 16, 2015 at 8:49
  • Check this, it should be helpful. Commented Jun 16, 2015 at 10:23

2 Answers

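library(data.table)
# fread returns a data.table and keeps the spaces in the column names
DF <- fread("http://pastebin.com/raw.php?i=gnWRqJnY")
# make.names turns "Nombre barrio" into "Nombre.barrio", as read.csv does
setnames(DF, make.names(names(DF)))
# Split on ", " so the pieces carry no leading space; one row per answer
DF <- DF[, .(Problemas.habituales = unlist(strsplit(Problemas.habituales, ", ", 
                                                    fixed = TRUE))), by = Nombre.barrio]
# Convert the data.table back to a plain data.frame, by reference
setDF(DF)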

(I assume that you don't see encoding problems with your locale.)
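
If the data has already been read with read.csv and should stay a plain data frame throughout, the same split can be sketched in base R. This is only a sketch under assumptions: the data frame below is a hand-typed stand-in for the CSV (it is not downloaded), and the name `long` is made up for illustration.

```r
# Hand-typed stand-in for the CSV, with the column names read.csv would produce.
DF <- data.frame(
  Nombre.barrio = c("Actur", "Actur", "Centro", "San Pablo", "San Pablo", "Delicias"),
  Problemas.habituales = c(
    "Robos con violencia, Agresiones, Otros problemas",
    "Ningún problema",
    "Robos con violencia, Agresiones",
    "Ningún problema",
    "Ningún problema",
    "Hurtos o robos sin violencia"
  ),
  stringsAsFactors = FALSE
)

# Split each cell on a comma followed by optional whitespace.
# as.character() guards against the column having been read as a factor.
parts <- strsplit(as.character(DF$Problemas.habituales), ",\\s*")

# Repeat each neighbourhood once per answer in its row, then flatten the answers.
long <- data.frame(
  Nombre.barrio = rep(DF$Nombre.barrio, lengths(parts)),
  Problemas.habituales = unlist(parts),
  stringsAsFactors = FALSE
)
```

Unlike the grouped data.table call, this keeps the rows in their original order, and `long` is an ordinary data.frame from start to finish.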


5 Comments

Hm... apparently it doesn't work if I use read.csv instead of fread. I'm still wondering why and how to fix it: if I switch to fread, it will break most of the work I've done on other parts of the data frame because of the different column names (read.csv puts a . in place of the spaces between words).
I think I found the problem: fread creates a data table, whereas read.csv creates a data frame, which is what I need. Is there any way to make it work with data frames?
Sure, but why would you want to?
As far as I know, ggplot2 only works with dataframes, not tables, and I need to work with ggplot2.
1) data.table inherits the data.frame class and ggplot2 works with data.tables just fine. 2) The last command turns the data.table into an ordinary data.frame.

You can do this using splitstackshape (assuming DF still has the original column names, with spaces):

library(splitstackshape)
cSplit(DF, "Problemas habituales", ",", direction = "long")

#   Nombre barrio         Problemas habituales
#1:         Actur          Robos con violencia
#2:         Actur                   Agresiones
#3:         Actur              Otros problemas
#4:         Actur              Ningún problema
#5:        Centro          Robos con violencia
#6:        Centro                   Agresiones
#7:     San Pablo              Ningún problema
#8:     San Pablo              Ningún problema
#9:      Delicias Hurtos o robos sin violencia
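
Once the data is in long format, counting answers per neighborhood for the barplot is straightforward. The following is a minimal sketch, not part of the original answers: the `long` data frame is typed in by hand to mirror the output above (with read.csv-style column names), and the names `counts` and `p` are made up for illustration.

```r
# Hand-typed long-format data mirroring the output above: one answer per row.
long <- data.frame(
  Nombre.barrio = c("Actur", "Actur", "Actur", "Actur", "Centro", "Centro",
                    "San Pablo", "San Pablo", "Delicias"),
  Problemas.habituales = c("Robos con violencia", "Agresiones", "Otros problemas",
                           "Ningún problema", "Robos con violencia", "Agresiones",
                           "Ningún problema", "Ningún problema",
                           "Hurtos o robos sin violencia"),
  stringsAsFactors = FALSE
)

# Count answers for every neighbourhood/problem combination (zeros included).
counts <- as.data.frame(table(long$Nombre.barrio, long$Problemas.habituales),
                        stringsAsFactors = FALSE)
names(counts) <- c("Nombre.barrio", "Problemas.habituales", "n")

# Build the plot only if ggplot2 is installed; geom_bar() counts the rows itself.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = Problemas.habituales)) +
    ggplot2::geom_bar() +
    ggplot2::facet_wrap(~ Nombre.barrio) +
    ggplot2::coord_flip()
}
```

Calling `print(p)` draws one panel per neighborhood; `counts` holds the same numbers as a table in case they are needed outside the plot.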
