Ordinal Logit/Probit Models in R

posted Feb 28, 2016, 9:43 AM by Mia Costa   [ updated Mar 14, 2016, 10:53 AM ]
An ordinal logit model is used when the dependent variable takes on more than two categories and those categories are ordered. Likert scales, common in survey research, are a typical example of an ordered dependent variable.

One good way to estimate an ordinal logit model in R is the polr function from the MASS package. The function name comes from proportional odds logistic regression.

In the 2008 CCES dataset (http://people.umass.edu/schaffne/cces08.dta), there is a question asking respondents their views on Affirmative Action policies (cc313). Respondents can indicate their support for such policies on a four-point scale, ranging from 1 (“Strongly support”) to 4 (“Strongly oppose”). One could collapse these categories into “support” or “oppose,” but why throw away detail about the strength of support?

Since I read in the Stata .dta file from the last chapter without converting to factors, I have to convert the dependent variable back into ordered levels. I could also have just left that convert.factors=FALSE argument out when I read in the Stata file. It's always a good idea anyway to first check the structure of the variable: 

# check whether cc313 is already a factor
is.factor(dat$cc313)
# output tells us "FALSE": cc313 is not a factor variable
# convert to an ordered factor
dat$cc313 <- factor(dat$cc313, ordered = TRUE)
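
If you want the categories to carry readable labels and an explicit order, the conversion can be sketched as below. This is a minimal illustration on a made-up vector standing in for dat$cc313; the labels for codes 2 and 3 are my assumption, since only the endpoints are named above.

```r
# Toy stand-in for dat$cc313: numeric codes 1-4 as read from the .dta file
cc313_raw <- c(1, 4, 2, 3, 3, 1, NA, 2)

# Labels for codes 2 and 3 are hypothetical; only 1 and 4 are named in the text
cc313_ord <- factor(cc313_raw,
                    levels = 1:4,
                    labels = c("Strongly support", "Somewhat support",
                               "Somewhat oppose", "Strongly oppose"),
                    ordered = TRUE)

is.ordered(cc313_ord)              # confirm the factor is ordered
levels(cc313_ord)                  # levels appear in the order given above
table(cc313_ord, useNA = "ifany")  # counts per category, including missing
```

Setting levels explicitly guards against a category going missing from the data and silently shifting the order.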

Once the variable is converted into a factor, estimate an ordinal logit model on this question using the following commands:

library(MASS)  # provides polr
ologmodel <- polr(cc313 ~ ideology + pid + education, data=dat, Hess=TRUE)

We also specify Hess=TRUE so that the model returns the Hessian matrix, which is used to compute standard errors. Each coefficient estimates the effect of a one-unit increase in that independent variable on the log odds of being in a higher category of the dependent variable. As with the binary logit in the previous chapter, these can be converted to odds ratios by exponentiating the coefficients and their confidence intervals:
exp(cbind(coef(ologmodel), confint(ologmodel)))
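
The summary of a polr model reports standard errors and t values but no p-values. One common workaround, sketched here on simulated data rather than the CCES file, is to compute them from a normal approximation. The data frame sim, its made-up coefficients and cutpoints, and the object names fit and ctable are all assumptions for illustration.

```r
library(MASS)

set.seed(42)
n <- 500
# Simulated stand-ins for the CCES predictors (values are made up)
sim <- data.frame(
  ideology  = sample(1:5, n, replace = TRUE),
  pid       = sample(1:7, n, replace = TRUE),
  education = sample(1:6, n, replace = TRUE)
)
# Latent scale with logistic noise, cut into four ordered categories
latent <- 0.6 * sim$ideology + 0.3 * sim$pid - 0.2 * sim$education + rlogis(n)
sim$cc313 <- cut(latent, breaks = c(-Inf, 1.5, 3, 4.5, Inf),
                 labels = 1:4, ordered_result = TRUE)

fit <- polr(cc313 ~ ideology + pid + education, data = sim, Hess = TRUE)

# Coefficient table: estimates, standard errors (from the Hessian), t values
ctable <- coef(summary(fit))

# Two-sided p-values from a normal approximation to the t values
p <- 2 * pnorm(abs(ctable[, "t value"]), lower.tail = FALSE)
ctable <- cbind(ctable, "p value" = p)
round(ctable, 3)
```

The last three rows of the table are the estimated cutpoints between adjacent categories, not effects of the predictors.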

Creating predicted probabilities follows the same logic as for the binary logit: specify “new data” with the conditions for the other variables and pass it, along with the model, to the predict command. The only difference is that the type argument changes to "probs" for the ordinal logit.
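
A minimal sketch of the "new data" approach, again on simulated data rather than the CCES file: ideology varies across its range while the other predictors are held at their means. The names sim, fit, newdat and pp are hypothetical, and holding variables at their means is just one possible choice of conditions.

```r
library(MASS)

set.seed(1)
n <- 500
# Simulated stand-ins for the CCES variables (values are made up)
sim <- data.frame(ideology  = sample(1:5, n, replace = TRUE),
                  pid       = sample(1:7, n, replace = TRUE),
                  education = sample(1:6, n, replace = TRUE))
latent <- 0.6 * sim$ideology + 0.3 * sim$pid - 0.2 * sim$education + rlogis(n)
sim$cc313 <- cut(latent, breaks = c(-Inf, 1.5, 3, 4.5, Inf),
                 labels = 1:4, ordered_result = TRUE)
fit <- polr(cc313 ~ ideology + pid + education, data = sim, Hess = TRUE)

# Vary ideology across its range; hold pid and education at their means
newdat <- data.frame(ideology  = 1:5,
                     pid       = mean(sim$pid),
                     education = mean(sim$education))

# One row per scenario, one column per category of the dependent variable
pp <- predict(fit, newdata = newdat, type = "probs")
cbind(newdat, round(pp, 3))
```

Each row of pp sums to 1, so you can read across a row to see how the probability mass shifts between categories as ideology changes.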

Here, I just get predicted probabilities for cc313 for every observation (that is, every respondent, or row) in the dataset. I print out the first 7 observations so we can see what they look like. The rows are the observations, and the columns are the probabilities that the respondent falls in each category (1-4) of cc313.

# predicted probs
PPolog <- predict(ologmodel, type = "probs")

# print out first 7 observations 
head(PPolog, n=7)
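
The title also mentions probit. polr can fit an ordered probit through its method argument, and the rest of the workflow stays the same. The sketch below uses simulated data rather than the CCES file; sim, logit_fit and probit_fit are hypothetical names.

```r
library(MASS)

set.seed(7)
n <- 500
# Simulated stand-ins for the CCES variables (values are made up)
sim <- data.frame(ideology  = sample(1:5, n, replace = TRUE),
                  pid       = sample(1:7, n, replace = TRUE),
                  education = sample(1:6, n, replace = TRUE))
latent <- 0.6 * sim$ideology + 0.3 * sim$pid - 0.2 * sim$education + rlogis(n)
sim$cc313 <- cut(latent, breaks = c(-Inf, 1.5, 3, 4.5, Inf),
                 labels = 1:4, ordered_result = TRUE)

# Ordered logit (the default method) and ordered probit on the same data
logit_fit  <- polr(cc313 ~ ideology + pid + education, data = sim, Hess = TRUE)
probit_fit <- polr(cc313 ~ ideology + pid + education, data = sim,
                   method = "probit", Hess = TRUE)

# Coefficients are on different scales, so compare predicted probabilities
pp_logit  <- predict(logit_fit,  type = "probs")
pp_probit <- predict(probit_fit, type = "probs")
head(round(pp_logit, 3),  n = 3)
head(round(pp_probit, 3), n = 3)
max(abs(pp_logit - pp_probit))  # largest gap between the two sets of probabilities
```

Note that exponentiating probit coefficients does not give odds ratios; that interpretation applies only to the logit version.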

This post is CHAPTER 2 of an R packet I am creating for a Survey Methods graduate seminar taught by Brian Schaffner at the University of Massachusetts Amherst. The seminar is usually only taught in Stata, so I translated all the exercises, assignments, and examples used in the course for R users. Other chapters include: Logit/Probit Models, Multinomial Logit Models, Count Models, Using Weights, Creating Post-Stratification Weights, Item Scaling, Matching/Balancing.