For more than a century, Psychologists have struggled to make their discipline a ‘proper science’. From introspection, to behaviorism and then to cognitivism, Psychology has fallen somewhat awkwardly between the biological and social sciences. Suffering existential doubt, and always looking over their shoulders, Psychologists never quite found a place of comfort at the high table of Science. Three issues have contributed to this liminal status: measurement, theory, and paradigm. In this article, I discuss measurement.
Attributes of the physical world are measured quantitatively. Attributes of the psychological world are more ‘sticky’ to deal with. For good reason, psychologists are unable to measure many of the most interesting psychological attributes in any direct and objective manner. Unfortunately, measurement in Psychology is an ‘Emperor’s clothes’ story. The early years as an infant science were spent paddling at the shallow end of the pool with attempts to make psychophysics and ability testing the showcases of a new quantitative science. But it was all downhill from there on. In spite of limited successes, the ‘measurement problem’ has never been satisfactorily resolved. S.S. Stevens’ Handbook of Experimental Psychology (1951) invoked ‘operationism’ as a potential solution and, since that time, Psychologists have assumed as an act of faith that measurement is the assignment of numbers to attributes according to rules. Sadly, Stevens’ solution is a mass delusion, a sleight of mind.
Michell (1997) summarized the situation thus: “…establishing quantitative science involves two research tasks: the scientific one of showing that the relevant attribute is quantitative; and the instrumental one of constructing procedures for numerically estimating magnitudes. From Fechner onwards, the dominant tradition in quantitative Psychology ignored this task. Stevens’ definition rationalized this neglect. The widespread acceptance of this definition within Psychology made this neglect systemic, with the consequence that the implications of contemporary research in measurement theory for undertaking the scientific task are not appreciated…when the ideological support structures of a science sustain serious blind spots like this, then that science is in the grip of some kind of thought disorder.” A ‘kind of thought disorder’: strong terms, but accurate ones.
It is apparent that numbers can be readily allocated to attributes using a non-random rule (the operational definition of measurement) that would generate ‘measurements’ that are not quantitatively meaningful. For example, numerals can be allocated to colours: red = 1, blue = 2, green = 3, etc. The rule used to allocate the numbers is clearly not random, and the allocation therefore counts as measurement, according to Stevens. However, it would be patent nonsense to assert that ‘green is 3 × red’ or that ‘blue is 2 × red’, or that ‘green minus blue equals red’. Intervals and ratios cannot be inferred from a simple ordering of scores along a scale. Yet this is how psychological measurement is usually carried out.
Stevens’ oxymoronic approach aimed to circumvent the requirement that only quantitative attributes can be measured, in spite of the self-evident fact that psychological constructs such as subjective well-being are nothing like physical variables (Michell, 1999, Measurement in Psychology). Positivist psychometricians nonetheless blithely treat qualitative psychological constructs as if they were quantitative in nature, and as amenable to measurement as physical characteristics, without ever demonstrating that this is so. For more than 60 years many psychologists have lived in a make-believe world where ‘measurement’ consists of numbers allocated to stimuli on ordinal or Likert-type scales. This feature alone cuts off at its roots the claim that Psychology is a quantitative science on a par with the natural sciences.
Measurement can be defined as the estimation of the magnitude of a quantitative attribute relative to a unit (Michell, 2003). Before quantification can happen, it is first necessary to obtain evidence that the relevant attribute is quantitative in structure. This has rarely, if ever, been carried out in Psychology. Arguably, the definition of measurement adopted within Psychology since Stevens’ (1951) operationism is incorrect, and Psychologists’ claims to be able to measure psychological attributes are open to question (Michell, 1999, 2002). Contrary to common beliefs within the discipline, psychological attributes may not actually be quantitative at all, and hence may not be amenable to coherent numerical measurement or to statistical analyses that make unwarranted assumptions about the numbers collected as data.
Psychometricians often make the precarious assumption that ordinal scales constitute a valid description of underlying quantitative attributes, that is, that psychological attributes are measurable on interval scales. Otherwise there can be no basis for quantitative measurement across large domains of the discipline. Michell (2012) argued that: “the most plausible hypothesis is that the kinds of attributes psychometricians aspire to measure are merely ordinal attributes with impure differences of degree, a feature logically incompatible with quantitative structure. If so, psychometrics is built upon a myth” (p. 255). This view is supported by Sijtsma (2012), who argues that the real measurement problem in Psychology is the absence of well-developed theories about psychological attributes and a lack of any evidence to support the assumption that psychological attributes are continuous and quantitative in nature.
Using ordinal data as if they were interval or ratio scale measures can lead to incorrect inferences and conclusions. Using totals and averages requires the assumption that data are measured on an interval scale. Performing parametric analyses on ordinal data can produce biased estimates of variances, covariances, and correlations, and also spurious interaction effects.
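A small made-up example may help to show why averaging ordinal scores is risky. The sketch below uses invented responses on a five-point scale; the two groups and the recoding are hypothetical and come from no study. Because an ordinal scale fixes only the order of the categories, any order-preserving recoding is as legitimate as the original numbering, yet here it reverses which group has the higher mean.

```python
from statistics import mean

# Hypothetical responses on a 1-5 ordinal scale (invented for illustration).
group_a = [1, 1, 5, 5]
group_b = [3, 3, 3, 4]

# An order-preserving recoding: the categories keep their rank order,
# but the top category is placed further from the rest.
recode = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}


def recoded(scores):
    """Apply the order-preserving recoding to a list of scores."""
    return [recode[s] for s in scores]


mean_a = mean(group_a)
mean_b = mean(group_b)
mean_a_recoded = mean(recoded(group_a))
mean_b_recoded = mean(recoded(group_b))

# Under the original codes group B has the higher mean;
# under the equally admissible recoding group A does.
print("original codes:", mean_a, mean_b)
print("recoded:       ", mean_a_recoded, mean_b_recoded)
```

A conclusion that depends on an arbitrary choice of numerals in this way cannot be a conclusion about the attribute itself.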
Here I use measures that preserve the requirements of a ratio scale, namely, that there are meaningful ratios between measurements. For example, if you have a cold and have taken three paracetamol tablets today and took four yesterday, you could say that the frequency today was ¾ or .75 of what it was yesterday. Measuring objects against a known scale and comparing the measurements works well for properties for which scales of measurement exist. For properties that lack such scales, Thurstone (1927) used the method of pair comparisons to derive scale values for any set of stimulus objects, using the Law of Comparative Judgement, which relates the difference between the scale values of two stimuli to the proportion of times one is judged greater than the other.
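In its general form, Thurstone's law is usually written as follows. The symbols are the common textbook notation, not notation taken from this article:

$$
S_i - S_j = z_{ij}\,\sqrt{\sigma_i^{2} + \sigma_j^{2} - 2\,r_{ij}\,\sigma_i\,\sigma_j}
$$

Here $S_i$ and $S_j$ are the scale values of stimuli $i$ and $j$, $z_{ij}$ is the standard normal deviate corresponding to the proportion of times $i$ is judged greater than $j$, $\sigma_i$ and $\sigma_j$ are the discriminal dispersions of the two stimuli, and $r_{ij}$ is the correlation between their discriminal processes.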
In his ‘Analytic Hierarchy Process’, Saaty (2008) also uses direct comparisons between pairs of objects to establish measurements for intangible properties that have no scales of measurement. The value derived for each element depends on what other elements are in the set. Relative scales are derived by making pairwise comparisons using numerical judgments from an absolute scale of numbers (e.g. 1-9). According to Saaty, the measurements that represent these comparisons define a cardinal scale of absolute numbers that is stronger than a ratio scale.
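A minimal sketch of how relative priorities can be derived from pairwise judgments is given below. It makes several assumptions: the three items and their judgments are invented, the function name `ahp_priorities` is hypothetical, and the row geometric mean is used as a common approximation to the principal-eigenvector calculation that Saaty describes. For a perfectly consistent matrix such as this one, the two methods give the same result.

```python
import math


def ahp_priorities(matrix):
    """Approximate relative priorities from a reciprocal pairwise-comparison matrix.

    Uses the geometric mean of each row, normalised to sum to 1. This is an
    approximation to the principal eigenvector, not the eigenvector itself.
    """
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgments: A is judged 2x B and 4x C; B is judged 2x C.
# Entry [i][j] says how strongly item i dominates item j; [j][i] is its reciprocal.
comparisons = [
    [1.0, 2.0, 4.0],
    [1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 2, 1.0],
]
priorities = ahp_priorities(comparisons)

# Removing item C changes the values derived for A and B,
# because the scale is relative to the set being compared.
priorities_without_c = ahp_priorities([[1.0, 2.0], [1 / 2, 1.0]])

print(priorities)
print(priorities_without_c)
```

The ratios between A and B stay the same in both runs, but their priority values differ. This shows how the value derived for each element depends on what other elements are in the set.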
Intuitive measurement is something that we take for granted, but the way it is achieved may be far from intuitive. Consider how we are able to estimate and compare magnitudes of objects, even when we have never actually seen these objects. For example, how do we compare the sizes of animals such as lions and hippos and judge which is larger or which is smaller? One theory of this process that appears to be especially accurate is described below.
Estimating and comparing magnitudes
One theory of the estimation and comparison of magnitudes assumes there are implicit minimal and maximal reference points at the extreme ends of the distribution. As a special case of the Law of Comparative Judgement, the theory assumes that stimulus objects are represented by distributions with variances that increase with distance from the reference point contained in the question (Marks, 1972).
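The core assumption can be sketched numerically. The code below is an illustration only, not the model from Marks (1972) or from any later paper. It assumes independent normal distributions, a standard deviation that grows linearly with distance from the reference point, and arbitrary values for the hypothetical parameters `k` and `base` and for the 0-10 magnitude scale. Discriminability is computed as the difference in means divided by the standard deviation of that difference.

```python
import math


def discriminability(m_i, m_j, reference, k=0.2, base=0.1):
    """Discriminability of two magnitudes given a reference point.

    Assumed form (hypothetical): each stimulus has standard deviation
    base + k * |magnitude - reference|, and the two distributions are
    independent, so the variances of the difference simply add.
    """
    sd_i = base + k * abs(m_i - reference)
    sd_j = base + k * abs(m_j - reference)
    return abs(m_i - m_j) / math.sqrt(sd_i ** 2 + sd_j ** 2)


# Hypothetical magnitude scale with implicit reference points at its ends.
SCALE_MIN, SCALE_MAX = 0.0, 10.0
small_pair = (2.0, 3.0)  # two small objects
large_pair = (7.0, 8.0)  # two large objects

# "Which is smaller?" puts the reference point at the minimum;
# "Which is larger?" puts it at the maximum.
small_pair_smaller_q = discriminability(*small_pair, reference=SCALE_MIN)
small_pair_larger_q = discriminability(*small_pair, reference=SCALE_MAX)
large_pair_smaller_q = discriminability(*large_pair, reference=SCALE_MIN)
large_pair_larger_q = discriminability(*large_pair, reference=SCALE_MAX)

print("small pair:", small_pair_smaller_q, small_pair_larger_q)
print("large pair:", large_pair_smaller_q, large_pair_larger_q)
```

Under these assumptions, the small pair is easier to discriminate when the question asks which is smaller, and the large pair is easier when it asks which is larger. This is the pattern known as the congruity effect.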
This photo from 1969 shows the author and ‘subject’ with the basic apparatus and stimuli from Experiments 7 and 8 of the author’s doctoral research at Sheffield University, ‘An Investigation of Subjective Probability Judgements’.
More than 40 years later, in 2014, Reference Point Theory received empirical support from a team at UCLA under the leadership of Keith Holyoak. In Cognitive Psychology, Chen, Lu and Holyoak (2014) present a model of how magnitudes can be acquired and compared based on BARTlet, a simpler version of ‘Bayesian Analogy with Relational Transformations’ (BART, Lu, Chen, & Holyoak, 2012). The authors concluded that Reference Point Theory provided the best fit to their data:
“BARTlet provides a computational realization of a qualitative hypothesis proposed four decades ago by Marks (1972)…The reference-point hypothesis implies that the congruity effect results from differences in the discriminability of magnitudes represented in working memory, rather than a bias in encoding (e.g., Marschark & Paivio, 1979) or a linguistic influence (Banks et al., 1975). BARTlet provides a well-specified mechanism by which reference points can alter discriminability in direct judgments of discriminability (Holyoak & Mah, 1982) as well as speeded tasks (p. 46).”
As well as being a professor of psychology at UCLA, Keith Holyoak is also a poet and translator of classical Chinese poetry. Kudos!