Psychology Ph.D. Dissertations


Bringing Situational Judgement Tests to the 21st Century: Scoring of Situational Judgement Tests Using Item Response Theory

Date of Award


Document Type


Degree Name

Doctor of Philosophy (Ph.D.)



First Advisor

Michael Zickar (Advisor)

Second Advisor

Hyeyoung Bang (Other)

Third Advisor

Joshua Grubbs (Committee Member)

Fourth Advisor

Samuel McAbee (Committee Member)


Situational judgement tests (SJTs) have become popular selection instruments over the last three decades because of their predictive validity, small subgroup differences, and high face validity. However, although SJTs have made significant progress over the past century, a construct problem remains: it is unclear whether SJTs measure a construct or constitute a measurement method. In addition, almost in parallel with the advancement of SJTs, a new theory of testing and scoring has been developed: item response theory (IRT). IRT offers researchers and practitioners flexible models that fit various types of data. These models can be used to score tests and questionnaires and to examine their psychometric properties. Moreover, some IRT models offer a unique method for scoring multidimensional tests, which assess more than one construct. This study applies different IRT models to a leadership SJT in order to answer two main questions. First, is the SJT a construct or a measurement method? Second, can IRT-based scoring improve validity and reduce subgroup differences compared with classical scoring approaches? These questions were tested on three samples of Israeli soldiers who went through a selection process for officers' training school that included a leadership SJT.

The results of this study suggest that the picture is more complicated than originally thought. IRT appears to add value over classical test theory (CTT) in only some samples, whereas CTT is more valuable in others. Regarding the construct versus measurement method debate, multidimensional IRT models appear to fit the SJT used in this study better, a finding that supports the view of SJTs as a measurement method. Limitations and directions for future research are discussed at the end of the manuscript.
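To illustrate why IRT-based and classical scores can diverge, here is a minimal sketch. It is not the scoring procedure used in this study. It assumes dichotomously scored items under a unidimensional two-parameter logistic (2PL) model with a standard normal prior. The item parameters, the response patterns, and the helper names `p_2pl`, `ctt_score`, and `eap_score` are all hypothetical. Two respondents with the same number-correct (CTT) score receive different expected a posteriori (EAP) ability estimates because the 2PL weights each item by its discrimination.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a keyed response at ability theta
    (a = discrimination, b = difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def ctt_score(responses):
    """Classical number-correct score: count of keyed (1) responses."""
    return sum(responses)

def eap_score(responses, items, n_points=81, lo=-4.0, hi=4.0):
    """EAP ability estimate under a 2PL model with a standard normal prior,
    approximated on an evenly spaced grid of theta values."""
    step = (hi - lo) / (n_points - 1)
    num = den = 0.0
    for k in range(n_points):
        theta = lo + k * step
        weight = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for u, (a, b) in zip(responses, items):
            p = p_2pl(theta, a, b)
            weight *= p if u == 1 else 1.0 - p  # multiply in the item likelihood
        num += theta * weight
        den += weight
    return num / den  # grid spacing and normalizing constants cancel

# Hypothetical item parameters: (discrimination a, difficulty b).
ITEMS = [(2.0, 0.0), (0.5, 0.0), (1.0, -1.0), (1.0, 1.0)]

resp_x = [1, 0, 1, 0]  # keyed response on the highly discriminating item
resp_y = [0, 1, 1, 0]  # keyed response on the weakly discriminating item

for name, resp in [("X", resp_x), ("Y", resp_y)]:
    print(name, ctt_score(resp), round(eap_score(resp, ITEMS), 2))
```

Both respondents score 2 under CTT. Under the 2PL, X receives the higher EAP estimate because the keyed response falls on the more discriminating item. An operational SJT may instead call for polytomous or multidimensional IRT models, which this sketch does not cover.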