
Multinomial Logit Models in R

posted Mar 14, 2016, 10:18 AM by Mia Costa   [ updated Oct 21, 2017, 4:02 PM ]
For cases when the dependent variable has unordered categories, we use multinomial logit. For example, variable cc309 in the 2008 CCES (http://people.umass.edu/schaffne/cces08.dta) asks respondents whether they would rather cut defense spending, cut domestic spending, or raise taxes.

As usual in R, there are several packages to choose from for running a multinomial logit regression. I’m going to use the multinom function from the nnet package. First we need to do some data manipulation: recode the variable with more intuitive labels, convert it into a factor, and make “cut domestic spending” the baseline category.

dat$cc309[dat$cc309=="1"] <- "cut defense" 
dat$cc309[dat$cc309=="2"] <- "cut domestic" 
dat$cc309[dat$cc309=="3"] <- "raise taxes" 

# convert to factor 
dat$cc309 <- factor(dat$cc309) 

# make cut domestic the baseline (reference) category 
dat$cc309 <- relevel(dat$cc309, ref="cut domestic")
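
The three steps above can also be collapsed into a single factor() call. This is only a sketch: cc309_raw is a made-up toy vector standing in for dat$cc309, and it assumes the raw codes are 1, 2, and 3 as in the recoding above.

```r
# Toy stand-in for dat$cc309 (hypothetical values, not the survey data)
cc309_raw <- c(1, 2, 3, 2, 2, 1, NA, 3)

# levels= lists the raw codes and labels= names them in the same order;
# putting code 2 first makes "cut domestic" the reference category
cc309 <- factor(cc309_raw,
                levels = c(2, 1, 3),
                labels = c("cut domestic", "cut defense", "raise taxes"))

levels(cc309)                  # the first level is the baseline multinom() will use
table(cc309, useNA = "ifany")  # missing values stay NA
```

Doing it this way avoids overwriting the original codes with strings one value at a time, and the reference category is set by the order of levels rather than by a separate relevel() step.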

Then, we run the model and get a summary of the results: 

library(nnet) 
mnmodel <- multinom(cc309 ~ ideology + pid + education, data = dat) 
summary(mnmodel) 

Each coefficient and statistical test above is calculated relative to the base category (cut domestic spending). The coefficients are on the log-odds scale, so each one tells you whether a variable significantly changes the odds of choosing that category rather than the base category, but not whether it distinguishes between the two non-base categories.
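
The summary of a multinom model reports coefficients and standard errors but no p-values, so here is a rough sketch of computing Wald z statistics and p-values by hand. The data frame sim is simulated purely for illustration (it is not the CCES, and the effect sizes are arbitrary assumptions); only the variable names mirror the model above.

```r
library(nnet)

# Simulated stand-in data (NOT the CCES); names mirror the model in this post
set.seed(1)
n <- 600
sim <- data.frame(ideology  = sample(1:5, n, replace = TRUE),
                  pid       = sample(1:7, n, replace = TRUE),
                  education = sample(1:6, n, replace = TRUE))

# Arbitrary linear predictors relative to the base category "cut domestic"
eta_def <- 1.5 - 0.6 * sim$ideology
eta_tax <- -0.5 - 0.3 * sim$ideology
denom   <- 1 + exp(eta_def) + exp(eta_tax)
p       <- cbind(1 / denom, exp(eta_def) / denom, exp(eta_tax) / denom)
labs    <- c("cut domestic", "cut defense", "raise taxes")
sim$cc309 <- factor(apply(p, 1, function(pr) sample(labs, 1, prob = pr)),
                    levels = labs)

fit <- multinom(cc309 ~ ideology + pid + education, data = sim, trace = FALSE)
s   <- summary(fit)

z     <- s$coefficients / s$standard.errors  # Wald z statistics
pvals <- 2 * pnorm(-abs(z))                  # two-sided p-values
rrr   <- exp(coef(fit))                      # odds ratios relative to the base category
round(pvals, 3)
```

Each row of these matrices is one non-base category compared with the base. To test “cut defense” against “raise taxes” directly, you would relevel the factor so that one of them is the reference and refit the model.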

As with the logit model, you can again create predicted probabilities and plots to help look at the size of the effects. However, since you now have more than two categories, the effect for any given variable might be different for different categories of the dependent variable.

# recreate dataframe with ideology at all levels and other variables at means 
fdat <- with(dat, data.frame(ideology = 1:5, pid = mean(pid, na.rm = TRUE), education = mean(education, na.rm = TRUE)))

# get predicted probabilities 
predict(mnmodel, newdata = fdat, type = "probs")
We can’t just save these predictions as a single variable like we did in Ch 1 (Logit) & Ch 2 (Ordinal Logit), because there is a separate predicted probability for each category of the dependent variable. The following code separates each outcome into its own variable, stored in a new dataframe alongside ideology.

# make dataframe of just the ideology levels
ideodat <- with(dat, data.frame(ideology = rep(c(1:5), 1)))
# combine the new dataframe and predicted probabilities
pp309 <- cbind(ideodat, predict(mnmodel, newdata = fdat, type = "probs"))

# check it out to make sure
pp309
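
For intuition about where these probabilities come from, here is a minimal sketch that rebuilds them by hand from the coefficients and compares the result with predict(). The data are simulated (not the CCES), the model has a single predictor to keep things short, and the outcome weights are arbitrary assumptions.

```r
library(nnet)

# Simulated stand-in data (NOT the CCES); one predictor keeps the sketch short
set.seed(2)
n <- 500
ideology <- sample(1:5, n, replace = TRUE)
labs <- c("cut domestic", "cut defense", "raise taxes")
y <- factor(sapply(ideology, function(i) {
  sample(labs, 1, prob = c(i, 6 - i, 1))  # arbitrary weights that shift with ideology
}), levels = labs)
fit <- multinom(y ~ ideology, trace = FALSE)

newd <- data.frame(ideology = 1:5)
X    <- cbind(1, newd$ideology)    # design matrix: intercept and ideology
eta  <- X %*% t(coef(fit))         # log-odds of each category vs. the base
num  <- cbind(1, exp(eta))         # the base category has log-odds 0, so exp(0) = 1
by_hand <- num / rowSums(num)      # normalize so each row sums to 1
colnames(by_hand) <- labs

pp <- predict(fit, newdata = newd, type = "probs")
all.equal(unname(by_hand), unname(pp))  # compare hand calculation with predict()
rowSums(pp)                             # probabilities across categories per row
```

Because the three probabilities in each row are tied together by the shared denominator, a change in one predictor moves all of them at once, which is why the effect of a variable can look different for each category.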

Now we plot. This time let’s use R’s built-in generic plotting function, though ggplot would work just as well. Note that I tell R not to erase the previous plot so I can overlay the effects of ideology for each of the three outcomes:

plot(pp309$ideology, pp309[["cut defense"]], xlab="Ideology", ylab="Probability", col="blue", type="l", ylim=c(0.0,1.0))
par(new=TRUE) # keep the previous plot so the next one is drawn on top of it
plot(pp309$ideology, pp309[["cut domestic"]], xlab="", ylab="", axes=FALSE, col="red", type="l", ylim=c(0.0,1.0))
par(new=TRUE)
plot(pp309$ideology, pp309[["raise taxes"]], xlab="", ylab="", axes=FALSE, col="darkgreen", type="l", ylim=c(0.0,1.0))
legend(1, 1, legend=c("Cut Defense Spending","Cut Domestic Spending", "Raise Taxes"), fill=c("blue","red", "darkgreen"), cex=.7, bty="n")

The graphic shows that most of the effect of ideology occurs in choosing between the “cut defense” and “cut domestic” categories. Almost everyone, regardless of ideology, prefers to not raise taxes.
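
If overlaying separate plots feels fiddly, one alternative is matplot(), which draws one line per column on shared axes. This is just a sketch: pp_demo holds made-up numbers standing in for pp309, chosen only so that each row sums to 1, and they are not results from the model.

```r
# Hypothetical probabilities standing in for pp309 (made-up numbers, not model output)
pp_demo <- data.frame(ideology       = 1:5,
                      "cut defense"  = c(0.60, 0.45, 0.30, 0.18, 0.10),
                      "cut domestic" = c(0.25, 0.40, 0.55, 0.68, 0.77),
                      "raise taxes"  = c(0.15, 0.15, 0.15, 0.14, 0.13),
                      check.names = FALSE)  # keep the spaces in the column names
outcomes <- c("cut defense", "cut domestic", "raise taxes")
cols     <- c("blue", "red", "darkgreen")

# One call draws all three lines, so colors and legend entries share the same order
matplot(pp_demo$ideology, as.matrix(pp_demo[outcomes]), type = "l", lty = 1,
        col = cols, xlab = "Ideology", ylab = "Probability", ylim = c(0, 1))
legend("topright", legend = outcomes, col = cols, lty = 1, cex = 0.7, bty = "n")
```

Keeping the outcome names and colors in two parallel vectors makes it harder for the legend to drift out of sync with the lines.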
