
I am having some trouble with levels... Running the following:

library(mlogit)

panel.datasm = data.frame(
    cbind( 
        round(runif(100, min=1, max=6)), 
        rep(1:20,each=5), runif(100, min=0, max=1), 
        runif(100, min=0, max=6), 
        runif(100, min=2, max=6) , 
        runif(100, min=0, max=1), 
        runif(100, min=0, max=6), 
        runif(100, min=2, max=6)  ))
names(panel.datasm) = c("choice", "id", "data_1991","data_1992",
  "data_1993", "data2_1991", "data2_1992","data2_1993") 


logit.data <- mlogit.data(panel.datasm, id = "id", choice = "choice", 
    varying= 3:5, shape = "wide", sep = "_")

I keep getting this error: Error in Ops.factor(data[[choice]], alt) : level sets of factors are different

I have also tried assigning levels manually:

panel.datasm$id= factor(
    panel.datasm$id, 
    levels = sort(as.character(unique(panel.datasm$id)))  )

I have tried a number of things and can't figure out what is going wrong. For comparison take a look at :

data("Electricity", package = "mlogit")
head(Electricity)
Electr <- mlogit.data(Electricity, id = "id", choice = "choice", 
    varying = 3:26, shape = "wide", sep = "")

As far as I can tell, this is identical to my data format. What's going on here? I'm at my wits' end.

  • I have never been able to get the automatic reshape of mlogit to work. As a result, I have resorted to manually reshaping my data to create the required long format. Good luck. Commented Nov 10, 2011 at 21:04
  • PS. Thanks for asking this question. I tried to understand mlogit soon after starting to learn R. I couldn't make head or tail of the code. As far as I can tell, the code works and is algorithmically correct, but from a user's point of view isn't particularly robust. Your question prompted me to research mlogit again. Commented Nov 10, 2011 at 21:46
  • Perhaps you also want to distinguish between data and data2 with varying= c(data=3:5, data2=6:8) Commented Nov 10, 2011 at 21:49
  • Thanks for all the help. I will try this all out now! Commented Nov 10, 2011 at 22:09
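
The manual reshape mentioned in the first comment could be sketched in base R roughly as follows. This is only an illustration under assumptions: the data frame wide, the chid column and the seed are made up for the example, the choice column is assumed to hold the same labels as the column suffixes (1991 to 1993), and stats::reshape is used rather than anything from mlogit.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical wide data: 20 choice situations, 3 alternatives (1991-1993)
wide <- data.frame(
  choice     = sample(1991:1993, 20, replace = TRUE),
  id         = rep(1:4, each = 5),
  data_1991  = runif(20), data_1992  = runif(20), data_1993  = runif(20),
  data2_1991 = runif(20), data2_1992 = runif(20), data2_1993 = runif(20)
)

# One identifier per choice situation (one per row of the wide data)
wide$chid <- seq_len(nrow(wide))

# All columns whose names contain the separator are treated as varying
varying_cols <- grep("_", names(wide), value = TRUE)

long <- reshape(wide, direction = "long", varying = varying_cols,
                sep = "_", timevar = "alt", idvar = "chid")

# choice becomes TRUE only on the row of the chosen alternative
long$choice <- long$choice == long$alt
long <- long[order(long$chid, long$alt), ]

head(long)
```

Each choice situation ends up with three rows, one per alternative, and exactly one of them has choice equal to TRUE.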

3 Answers


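I believe I have traced the problem. The values of your choice variable must be the same as the alternative names, which mlogit.data takes from the suffixes of the varying column names (here 1991, 1992 and 1993). Your choice column contains values from 1 to 6 instead, so the two sets of factor levels differ.

If you change the first column of your data.frame to take values in 1991:1993, it will work.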

panel.datasm = data.frame(
    cbind( 
        sample(1991:1993, 100, replace=TRUE), 
        rep(1:20,each=5), runif(100, min=0, max=1), 
        runif(100, min=0, max=6), 
        runif(100, min=2, max=6) , 
        runif(100, min=0, max=1), 
        runif(100, min=0, max=6), 
        runif(100, min=2, max=6)  ))
names(panel.datasm) = c("choice", "id", "data_1991","data_1992",
    "data_1993", "data2_1991", "data2_1992","data2_1993") 


logit.data <- mlogit.data(panel.datasm, id = "id", choice = "choice", 
    varying= 3:5, shape = "wide", sep = "_") 

The results:

head(logit.data)
       choice id  alt       data     data2 chid
1.1991  FALSE  1 1991 0.03540498 0.9726110    1
1.1992  FALSE  1 1992 5.85285278 2.7973798    1
1.1993   TRUE  1 1993 5.80795641 3.7360297    1
2.1991   TRUE  1 1991 0.59255235 0.2564928    2
2.1992  FALSE  1 1992 5.81443351 3.0820215    2
2.1993  FALSE  1 1993 2.11699854 5.4161634    2

If you now compare it with Electricity, the difference is obvious. Notice that choice takes values 1 to 4, and the suffixes of the alternative-specific columns (pf1 to pf4, cl1 to cl4, and so on) also run from 1 to 4.

head(Electricity)
  choice id pf1 pf2 pf3 pf4 cl1 cl2 cl3 cl4 loc1 loc2 loc3 loc4 wk1 wk2 wk3 wk4
1      4  1   7   9   0   0   5   1   0   5    0    1    0    0   1   0   0   1
2      3  1   7   9   0   0   0   5   1   5    0    0    1    0   1   1   0   0
3      4  1   9   7   0   0   5   1   0   0    0    0    0    1   0   1   1   0
4      4  1   0   9   7   0   1   1   0   5    0    0    1    0   1   0   0   1
5      1  1   0   9   0   7   0   1   0   5    1    0    0    0   0   1   0   1
6      4  1   0   9   0   7   0   0   1   5    0    0    1    0   0   0   0   1
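
A quick way to check for this mismatch before calling mlogit.data is sketched below in base R. The helper names alt_labels and choices_match are made up for this illustration, and the sketch assumes that the alternative label is the run of digits at the end of each varying column name, which holds for both data_1991 and pf1 but not for every naming scheme.

```r
# Hypothetical helper: alternative labels = trailing digits of column names
alt_labels <- function(nms) {
  unique(regmatches(nms, regexpr("[0-9]+$", nms)))
}

# Hypothetical helper: TRUE if every observed choice is a known alternative
choices_match <- function(df, choice, varying) {
  alts <- alt_labels(names(df)[varying])
  all(as.character(unique(df[[choice]])) %in% alts)
}

# Toy frame shaped like the question's data: choices 1..6, suffixes 1991..1993
q <- data.frame(choice = c(1, 3, 6), id = 1,
                data_1991 = 0, data_1992 = 0, data_1993 = 0)
choices_match(q, "choice", 3:5)   # FALSE

# Same frame with choices drawn from the column suffixes
a <- transform(q, choice = c(1991, 1993, 1992))
choices_match(a, "choice", 3:5)   # TRUE
```

If the check returns FALSE, either recode the choice column or rename the varying columns so that the two sets of labels agree.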

2 Comments

Thanks guys. This was helpful. My dataset is weird in that variables don't vary by choice. But this clarified what was going on. I think it will work now!
Note that the varying argument in the first example should be 3:8, not 3:5.

The problem is that the row.names created by reshape are not unique and that is causing trouble. Here is a quick fix. You need to add a chid.var that would be unique for each row. I have used the index function from zoo to do that. You can use other ways as well I suppose.

mlogit.data(panel.datasm, choice = 'choice', id = 'id', shape = 'wide', 
 varying = 3:8, sep = "_", chid.var = 1:NROW(index))

        choice id  alt     data      data2
1.1991  FALSE  1 1991 0.4769187 0.97381645
1.1992  FALSE  1 1992 3.2998748 0.70989021
1.1993  FALSE  1 1993 5.6199917 5.53069555
2.1991  FALSE  1 1991 0.3615670 0.02066214
2.1992  FALSE  1 1992 2.0461820 0.41804600
2.1993  FALSE  1 1993 2.2764992 3.93337758

2 Comments

This gets past the first hurdle, but I think will lead to spurious model results. Notice that the value of choice is now always FALSE, whereas it should be TRUE when the respondent choice matches that alternative (i.e. row in the data.frame).
PS. I apologise that I deleted my first comment - that probably leads to confusion. I wrote a comment, then started to doubt whether I was correct. Then checked my assumptions and posted a new comment. Sorry.
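
The concern in the first comment can be tested directly: in valid long-format choice data, every choice situation should have exactly one row with choice equal to TRUE. A minimal base-R sketch, where the helper name chosen_per_situation, the column name chid and the toy data frames are assumptions made for illustration:

```r
# Hypothetical helper: number of chosen alternatives per choice situation
chosen_per_situation <- function(long, chid = "chid", choice = "choice") {
  tapply(as.logical(long[[choice]]), long[[chid]], sum)
}

# Toy long data resembling the output above: choice is FALSE everywhere
bad <- data.frame(chid = rep(1:2, each = 3),
                  alt = rep(1991:1993, 2),
                  choice = FALSE)

# Same layout, but situation 1 chose 1993 and situation 2 chose 1991
good <- bad
good$choice <- good$alt == c(1993, 1991)[good$chid]

all(chosen_per_situation(bad) == 1)    # FALSE
all(chosen_per_situation(good) == 1)   # TRUE
```

A result of FALSE means at least one choice situation has no chosen alternative (or more than one), so a model fitted to that data would not be meaningful.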

The error comes from the reshape package. It is unable to determine the time element when converting the data.

The mlogit help page, ?mlogit.data, provides the solution under the alt.levels argument, which it describes as: "the name of the alternatives: if null, for a wide data.frame, they are guessed from the variable names and the choice variable (both should be the same)".

Since you are not giving the names of the alternatives, reshape has to guess them and cannot determine them. The fix, then, is to provide those names manually. Leaving the data as provided in the question, you can use the following:

logit.data <- mlogit.data(panel.datasm, id = "id", choice = "choice", 
    varying = 3:8, shape = "wide", sep = "_",
    alt.levels = c("data_1991", "data_1992", "data_1993", 
                   "data2_1991", "data2_1992", "data2_1993"))

Note: as @James mentioned, varying should be 3:8, not 3:5.
