Error with levels using mlogit in R

Question

I am having some trouble with levels... Running the following:

library(mlogit)

panel.datasm = data.frame(
    cbind( 
        round(runif(100, min=1, max=6)), 
        rep(1:20,each=5), runif(100, min=0, max=1), 
        runif(100, min=0, max=6), 
        runif(100, min=2, max=6) , 
        runif(100, min=0, max=1), 
        runif(100, min=0, max=6), 
        runif(100, min=2, max=6)  ))
names(panel.datasm) = c("choice", "id", "data_1991","data_1992",
  "data_1993", "data2_1991", "data2_1992","data2_1993") 


logit.data <- mlogit.data(panel.datasm, id = "id", choice = "choice", 
    varying= 3:5, shape = "wide", sep = "_")

I keep getting the error: Error in Ops.factor(data[[choice]], alt) : level sets of factors are different

I have also tried assigning levels manually:

panel.datasm$id= factor(
    panel.datasm$id, 
    levels = sort(as.character(unique(panel.datasm$id)))  )

I have tried a number of things and can't figure out what is going wrong. For comparison, take a look at:

data("Electricity", package = "mlogit")
head(Electricity)
Electr <- mlogit.data(Electricity, id = "id", choice = "choice", 
    varying = 3:26, shape = "wide", sep = "")

As far as I can tell, this is identical to my data format. What's going on here? I'm at my wits' end.

I have never been able to get the automatic reshape of mlogit to work. As a result, I have resorted to manually reshaping my data to create the required long format. Good luck.
– Andrie, Commented Nov 10, 2011 at 21:04
PS. Thanks for asking this question. I tried to understand mlogit soon after starting to learn R. I couldn't make head or tail of the code. As far as I can tell, the code works and is algorithmically correct, but from a user's point of view isn't particularly robust. Your question prompted me to research mlogit again.
– Andrie, Commented Nov 10, 2011 at 21:46
Perhaps you also want to distinguish between data and data2 with varying= c(data=3:5, data2=6:8)
– IRTFM, Commented Nov 10, 2011 at 21:49
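
The manual reshape mentioned in the comments above could look roughly like the sketch below. It uses base R's reshape() on a small made-up data frame laid out like the one in the question. The toy values, the chid column, and the alt column name are assumptions for illustration; this is not mlogit's own code.

```r
# Toy wide data in the same layout as the question (values are made up)
wide <- data.frame(
  choice = c(1991, 1993), id = c(1, 1),
  data_1991 = c(0.1, 0.2), data_1992 = c(0.3, 0.4), data_1993 = c(0.5, 0.6),
  data2_1991 = c(1, 2), data2_1992 = c(3, 4), data2_1993 = c(5, 6)
)

# One identifier per choice situation (row of the wide data)
wide$chid <- seq_len(nrow(wide))

# Stack columns 3:8; the part after "_" becomes the alternative label
long <- reshape(wide, direction = "long", varying = 3:8, sep = "_",
                timevar = "alt", idvar = "chid")

# Turn choice into a logical flag: TRUE on the row of the chosen alternative
long$choice <- long$choice == long$alt
long <- long[order(long$chid, long$alt), ]
print(long)
```

A frame built this way would presumably still be passed to mlogit.data with shape = "long"; check ?mlogit.data for the arguments your installed version expects.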

Andrie · Accepted Answer · 2011-11-10 21:31:44Z


I believe I have traced the problem. Your choice variables and your alternative variables should be the same.
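
The error message itself can be reproduced in base R, which may make the cause easier to see. The sketch below assumes, based on the Ops.factor(data[[choice]], alt) call shown in the error, that the choice column is compared against a factor of alternative labels; the two small factors here are made up to mimic that situation.

```r
# Alternative labels as they would be parsed from data_1991, data_1992, data_1993
alt <- factor(c("1991", "1992", "1993"))

# Choice values like those generated in the question (integers from 1 to 6)
choice <- factor(c(1, 4, 6))

# Comparing factors whose level sets differ raises the error from the question
res <- tryCatch(choice == alt, error = function(e) conditionMessage(e))
print(res)
```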

If you change the first column of your data.frame to take values in 1991:1993, it will work.

panel.datasm = data.frame(
    cbind( 
        sample(1991:1993, 100, replace=TRUE), 
        rep(1:20,each=5), runif(100, min=0, max=1), 
        runif(100, min=0, max=6), 
        runif(100, min=2, max=6) , 
        runif(100, min=0, max=1), 
        runif(100, min=0, max=6), 
        runif(100, min=2, max=6)  ))
names(panel.datasm) = c("choice", "id", "data_1991","data_1992",
    "data_1993", "data2_1991", "data2_1992","data2_1993") 


logit.data <- mlogit.data(panel.datasm, id = "id", choice = "choice", 
    varying= 3:8, shape = "wide", sep = "_")

The results:

head(logit.data)
       choice id  alt       data     data2 chid
1.1991  FALSE  1 1991 0.03540498 0.9726110    1
1.1992  FALSE  1 1992 5.85285278 2.7973798    1
1.1993   TRUE  1 1993 5.80795641 3.7360297    1
2.1991   TRUE  1 1991 0.59255235 0.2564928    2
2.1992  FALSE  1 1992 5.81443351 3.0820215    2
2.1993  FALSE  1 1993 2.11699854 5.4161634    2

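If you now compare your original data with Electricity, the difference is clear. In Electricity the choice column takes the values 1 to 4, and the alternatives encoded in the column names (pf1 to pf4, cl1 to cl4, and so on) also run from 1 to 4.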

head(Electricity)
  choice id pf1 pf2 pf3 pf4 cl1 cl2 cl3 cl4 loc1 loc2 loc3 loc4 wk1 wk2 wk3 wk4
1      4  1   7   9   0   0   5   1   0   5    0    1    0    0   1   0   0   1
2      3  1   7   9   0   0   0   5   1   5    0    0    1    0   1   1   0   0
3      4  1   9   7   0   0   5   1   0   0    0    0    0    1   0   1   1   0
4      4  1   0   9   7   0   1   1   0   5    0    0    1    0   1   0   0   1
5      1  1   0   9   0   7   0   1   0   5    1    0    0    0   0   1   0   1
6      4  1   0   9   0   7   0   0   1   5    0    0    1    0   0   0   0   1
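
Before calling mlogit.data, a quick check along these lines may help confirm that every value in the choice column matches an alternative label parsed from the column names. check_choice_levels is a hypothetical helper written for this illustration (it is not part of mlogit), and it assumes the alternative is the part of the column name after the last separator.

```r
# Hypothetical helper: returns choice values that match no alternative label
check_choice_levels <- function(df, choice, varying, sep = "_") {
  # Strip everything up to the last separator, e.g. "data2_1991" -> "1991"
  alts <- unique(sub(paste0("^.*", sep), "", names(df)[varying]))
  setdiff(as.character(unique(df[[choice]])), alts)
}

# Toy data: the first row has a choice value (1) that is not an alternative
toy <- data.frame(choice = c(1, 1992, 1993), id = 1:3,
                  data_1991 = 0, data_1992 = 0, data_1993 = 0)

bad <- check_choice_levels(toy, "choice", 3:5)
print(bad)  # choice values with no matching alternative
```

An empty result would suggest the choice column and the alternatives line up; anything returned points at values that need recoding.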

answered Nov 10, 2011 at 21:31

Andrie


2 Comments

mmann1123 Over a year ago

Thanks guys. This was helpful. My dataset is weird in that variables don't vary by choice. But this clarified what was going on. I think it will work now!

James Over a year ago

Note that the varying argument in the first example should be 3:8, not 3:5.

Ramnath · Accepted Answer · 2011-11-10 21:17:25Z


The problem is that the row.names created by reshape are not unique, and that is causing trouble. Here is a quick fix: you need to add a chid.var that is unique for each row. I have used the index function from zoo to do that. You can use other ways as well, I suppose.

mlogit.data(panel.datasm, choice = 'choice', id = 'id', shape = 'wide', 
 varying = 3:8, sep = "_", chid.var = 1:NROW(index))

        choice id  alt     data      data2
1.1991  FALSE  1 1991 0.4769187 0.97381645
1.1992  FALSE  1 1992 3.2998748 0.70989021
1.1993  FALSE  1 1993 5.6199917 5.53069555
2.1991  FALSE  1 1991 0.3615670 0.02066214
2.1992  FALSE  1 1992 2.0461820 0.41804600
2.1993  FALSE  1 1993 2.2764992 3.93337758

edited Nov 10, 2011 at 21:17

answered Nov 10, 2011 at 21:11

Ramnath


2 Comments

Andrie Over a year ago

This gets past the first hurdle, but I think it will lead to spurious model results. Notice that the value of choice is now always FALSE, whereas it should be TRUE when the respondent's choice matches that alternative (i.e. that row in the data.frame).

Andrie Over a year ago

PS. I apologise that I deleted my first comment - that probably leads to confusion. I wrote a comment, then started to doubt whether I was correct. Then checked my assumptions and posted a new comment. Sorry.
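
The concern raised in the comments above, that choice ends up FALSE on every row, can be checked mechanically. The sketch below is a minimal base R check on a made-up long-format frame; the column names chid, alt, and choice follow the output shown earlier in the thread, and the values are invented for illustration.

```r
# Toy long-format data: choice situation 2 has no chosen alternative
long <- data.frame(
  chid   = rep(1:2, each = 3),
  alt    = rep(1991:1993, times = 2),
  choice = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Count the TRUE rows per choice situation; each one should have exactly one
n_chosen <- tapply(long$choice, long$chid, sum)
bad <- names(n_chosen)[n_chosen != 1]
print(bad)  # choice situations without exactly one chosen alternative
```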

EDennnis · Accepted Answer · 2018-01-22 16:20:51Z

The error comes from the reshape package. It is unable to determine the time element when converting the data.

The mlogit help guide ?mlogit.data provides the solution to this under the option "alt.levels" stating: "the name of the alternatives: if null, for a wide data.frame, they are guessed from the variable names and the choice variable (both should be the same)".

Since you are not giving the names of the alternatives, reshape has to guess them and cannot determine them. The fix, then, is to provide those names manually. Leaving the data as provided in the question, you can use the following:

logit.data <- mlogit.data(panel.datasm, id = "id", choice = "choice", 
                      varying= 3:8, shape = "wide", sep = "_",
                      alt.levels = c("data_1991","data_1992","data_1993", "data2_1991", "data2_1992", "data2_1993"))

*Note: As @James mentioned, varying should be 3:8, not 3:5.
