# Paleobiology

- The Paleontological Society

## Abstract

Recently, there has been much interest in detecting and measuring patterns of change in disparity. Although most studies have used one or two measures of disparity to quantify and characterize the occupation of morphospace, multiple measures may be necessary to fully detect changes in patterns of morphospace occupation. Also, the ability to detect morphological trends and occupation patterns within morphospace depends on using the appropriate measure(s) of disparity. In this study, seven measures were used to determine and characterize sensitivity to sample size of the data, number of morphological characters, percentage of missing data, and changes in morphospace occupation pattern. These consist of five distance measures—sum of univariate variances, total range, mean distance, principal coordinate analysis volume, average pairwise dissimilarity—and two non-distance measures—participation ratio and number of unique pairwise character combinations. Evaluation of each measure with respect to sensitivity to sample size, number of morphological characters, and percentage of missing data was accomplished by using both simulated and Ordovician crinoid data. For simulated data, each measure of disparity was evaluated for its response to changes of morphospace occupation pattern, and with respect to simulated random and nonrandom extinction events. Changes in disparity were also measured within the Crinoidea across the Permian extinction event.

Although all measures vary in sensitivity with respect to species sample size, number of morphological characters, and percentage of missing data, the non-distance measures overall produce the lowest estimates of variance (in bootstrap analyses). The non-distance measures appear to be relatively insensitive to changes in morphospace occupation pattern. All measures, except average pairwise dissimilarity, detect changes in occupation pattern in simulated nonrandom extinction events, but all measures, except number of unique pairwise character combinations and principal coordinate analysis volume, are relatively insensitive to changes in pattern in simulated random extinction events. The distance measures report similar changes in disparity over the Permian extinction event, whereas the non-distance measures differ. This study suggests that each measure of disparity is designed for different purposes, and that by using a combination of techniques a clearer picture of disparity should emerge.

## INTRODUCTION

Recently, there has been much interest in detecting and measuring patterns of change in disparity, or the degree of morphological differentiation among taxa within groups (Foote 1993, 1994, 1995, 1996, 1999; Wagner 1995, 1997; Smith and Bunje 1999; Eble 2000; Thomas et al. 2000). For example, measurement of disparity has been central in investigating the pattern of decreasing morphological innovation, which results primarily from developmental processes and ecological interactions among taxa over geological time (Valentine and Erwin 1985; Valentine 1986; Gould 1989; Erwin 1994). Measures of disparity have also been used to determine the role that morphological constraints play in determining the interaction between morphological diversification versus taxonomic diversification (Foote 1993, 1994, 1995, 1996, 1997, 1999; Wagner 1995, 1997; Smith and Bunje 1999; Eble 2000). More generally, disparity measures can be used to characterize changes in pattern of morphospace occupation and trends in disparity.

Detecting morphological trends and morphospace occupation patterns (expansion, bifurcation, or change of shape of the distribution of taxa within a morphospace) requires an appropriate measure of disparity. For example, if one were interested in how selected morphologies change over a given temporal sequence in the fossil record (i.e., trying to detect anagenesis versus cladogenesis within a given taxonomic unit), during a significant geological event (i.e., the splitting or joining of land masses), or throughout a mass extinction event, the ideal measure of disparity would be able to detect small changes in morphospace occupation patterns yet would be robust with respect to sample size, number of characters used, and missing or inapplicable data.

Various measures of disparity, each with different properties and uses, have been proposed and used in studies involving morphological evolution and constraint, e.g., average pairwise character difference (Sneath and Sokal 1973; Foote 1995, 1999; Lupia 1999), total variance (sum of univariate variances) (Van Valen 1974; Smith and Bunje 1999), range of variation (Foote 1991; Smith and Lieberman 1999), number of unique character-state combinations (Thomas and Reif 1993; Foote 1995, 1999), and multiplication of variances or ranges (volume of a hyper-cuboid, or hyper-ellipsoid) (Wills et al. 1994). Both average pairwise dissimilarity and total variance have been the most commonly employed (Foote 1992, 1993, 1994, 1995; Lupia 1999; Eble 2000).

Although average pairwise dissimilarity and total variance may be robust with respect to sample size, they are fairly insensitive to changes in morphospace pattern. (For example, changes in mean and mode can occur without greatly affecting average dissimilarity or variance.) This type of dichotomy complicates the issue of choosing the measure(s) of disparity with which to analyze data. What would be desirable is a knowledge of the advantages and disadvantages of each measure of disparity.

What measure should be used if a study deals with small sample sizes after a mass extinction event? With too small a sample there is danger of not gaining a true sampling of morphospace, but efforts to acquire a large sample can be both time consuming and expensive, and a large sample might simply be impossible to obtain. (If diversity is actually low, then it may not be possible to include a large number of species even with complete sampling.) Another important aspect of disparity analysis deals with how the number of morphological characters selected or number of missing characters affects the total disparity measured. Is disparity dependent upon number of characters used, or upon missing or inapplicable character data, and how does this impact an accurate determination of morphospace occupancy? Furthermore, how sensitive each measure is to changes of morphospace occupation, i.e., expansion or contraction, bifurcation, or change of morphospace shape, should also be considered when undertaking a study of morphological disparity within a particular taxon. Finally, in temporal studies of disparity (Foote 1994, 1995, 1999; Lupia 1999; Smith and Bunje 1999; Eble 2000) the amount of variance, via bootstrap analysis, associated with each datum can determine whether changes of disparity can be resolved between time periods. Therefore it would seem reasonable to choose a method that yields the least amount of variance while providing the most accurate depiction of morphospace occupation.

In this study we examine sensitivity to sample size, number of morphological characters, percentage of missing data, and changes in morphospace occupation pattern. In doing so we use seven measures: sum of univariate variances, total range, mean distance, number of unique pairwise character combinations, principal coordinate analysis (PCO) volume, average pairwise dissimilarity, and participation ratio. In the first part of the study, we used simulated data to evaluate these disparity measures with respect to sample size, character number, and percentage of missing data. Simulated data provide a useful means by which to evaluate the performance of the metrics under known evolutionary details. In the second part of the study, we used empirical data to evaluate the disparity measures with respect to the same factors. For the sake of completeness, we report our findings for all analyses, including those in which the results are highly intuitive (and, perhaps to some, obvious), for example, the increase in disparity with sample size and with number of characters. Likewise, though it may seem obvious that missing data would have little effect if they are randomly distributed, a thorough analysis of each metric required that we investigate the relationship between missing data and disparity. The third part of the study, using simulated data, evaluates the response of each disparity measure to changes in pattern of morphospace occupation, i.e., the expansion, bifurcation, and change in shape of the occupied morphospace. The fourth part evaluates each measure of disparity with respect to random and nonrandom simulated extinction events and follows this analysis with a look at the change in disparity within the Crinoidea over the Permian extinction event. 
It should be noted that although attention has been paid to certain aspects of phylogeny (e.g., using a monophyletic group such as the Crinoidea in the empirical study, and starting the simulated data with a single “ancestor”), some aspects were ignored. For example, we make no distinction between homologous and homoplastic characters. Additionally, even though some have suggested separating homologous from homoplastic similarities (i.e., using patristic dissimilarity instead of phenetic dissimilarity) (Smith 1994), we believe that applying patristic dissimilarity renders studies of positions in morphospace potentially less informative, as patristic dissimilarity cannot allow for the recognition of interesting patterns of convergence. Our goal is to examine the behavior of total disparity, within a group over a given time span. Future studies may take a higher-resolution approach, distinguishing the various sources of morphological similarity and dissimilarity.

## METHODS

In this study, three types of data were used: (1) simulated morphological character data (generated via simulated evolution), (2) Foote's Ordovician crinoid data (Foote 1999), and (3) simulated morphospace occupation patterns.

#### Simulated Evolution

Random walk models were written using the MatLab 5.01 mathematical package and based in part on the work of Raup and Gould (Raup and Gould 1974; Gould et al. 1977; Raup 1977). To simulate evolution, a ten-dimensional random walk was employed, with each dimension of the walk corresponding to one character, and each character having six states. The walker, which represents a species, starts at some initial point. At each successive time step, one character is selected randomly (using what is tantamount to tossing a ten-sided die, each face having a one-in-ten probability of being selected). For the selected character, the random walk can either proceed one step up or down from its current position (i.e., current character state) or remain stationary. For each set of simulations the probability of moving up or down within a character is constant and set at the onset of the random walk.

Three types of random walk were used in the study. The first type of walk simulates ordered character evolution, in which character states change in an ordered fashion over a range of states numbered 0 to 5. For example, if the character state at a given time is 0, and the decision to move a step is selected, then the walk can only proceed up one step to character state 1; if the character state is 5, and the decision to move a step is selected, then the walk can only proceed down one step to character state 4. At each step of the walk there are both a constant probability of origination (speciation), i.e., each walker becomes two separate walkers, and a constant probability of extinction, i.e., a particular walker is discontinued. The walk is continued until the desired species number is reached. The second type of random walk simulates unordered character evolution. In this model, the rules are essentially the same as the ordered model except that jumps within character states are allowed and reflecting boundaries are removed. The third random walk model includes varying dimensionality via the acquisition of key innovations. In this model the walker starts with three characters, each with six states. In this walk, if a “key” is turned on (with constant probability at each step), three new characters become available, each with six character states. There are three keys available; each turns on three new characters. Not only does this model simulate unordered character evolution, it also yields a nested pattern of character acquisition. All three models represent limited-variation systems; i.e., there is a finite number of character states possible in each model. Once we had generated a pool of 100 species, using each type of walk, we performed tests of sensitivity to proportion of sampled species, character number, and percentage of missing character data.
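
The ordered-character model can be sketched in a few lines of Python. This is a minimal illustration of the rules described above, not the original MatLab code: the function names, the probabilities `p_move`, `p_orig`, and `p_ext`, the restart when every lineage dies, and the trimming of any overshoot beyond the target species number are all assumptions made for the example.

```python
import random

N_CHARACTERS = 10  # one walk dimension per character
N_STATES = 6       # ordered states numbered 0..5


def step_ordered(species, p_move, rng):
    """Return a copy of `species` after one time step of the ordered walk."""
    new = list(species)
    c = rng.randrange(N_CHARACTERS)      # the "ten-sided die"
    if rng.random() < p_move:            # otherwise the walker stays put
        if new[c] == 0:
            new[c] = 1                   # reflecting lower boundary
        elif new[c] == N_STATES - 1:
            new[c] = N_STATES - 2        # reflecting upper boundary
        else:
            new[c] += rng.choice((-1, 1))
    return new


def simulate_clade(target=100, p_move=0.5, p_orig=0.10, p_ext=0.05, seed=0):
    """Grow a clade from a single ancestor until `target` species exist."""
    rng = random.Random(seed)
    ancestor = [rng.randrange(N_STATES) for _ in range(N_CHARACTERS)]
    species = [ancestor]
    while len(species) < target:
        survivors = []
        for sp in species:
            sp = step_ordered(sp, p_move, rng)
            if rng.random() < p_ext:
                continue                    # this walker is discontinued
            survivors.append(sp)
            if rng.random() < p_orig:
                survivors.append(list(sp))  # the walker becomes two walkers
        # Assumption: if every lineage dies out, restart from the ancestor.
        species = survivors or [ancestor]
    return species[:target]                 # assumption: trim any overshoot


pool = simulate_clade()
```

The unordered model would differ only in `step_ordered`, where the selected character could jump to any other state rather than to an adjacent one.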

#### Generation of Morphospace Occupation Patterns

Two-dimensional patterns of morphospace expansion, bifurcation, and shape change were created by generating random x,y coordinate number pairs (the x and y coordinates correspond to the character states of two discrete characters). To simulate expansions, we generated coordinates by changing the width of the random number distribution in equal increments for both the x and y coordinate of each ordered pair. A bifurcation pattern was created by causing an increasing bimodal distribution among the coordinate number pairs. The shape of the pattern was changed by increasing the width of the random number distribution in one coordinate, while subsequently decreasing the width of the random number distribution in the other coordinate.
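
The three occupation patterns can be generated along the following lines. The sketch assumes uniform random numbers, continuous coordinates (rounding would give discrete character states), 60 points per time step, and arbitrary widths and increments; none of these values comes from the study itself.

```python
import random


def expansion(n, width, rng):
    """Cloud centred on the origin with the same width in x and y."""
    h = width / 2
    return [(rng.uniform(-h, h), rng.uniform(-h, h)) for _ in range(n)]


def bifurcation(n, separation, width, rng):
    """Two equal-sized clusters whose centres lie `separation` apart in x."""
    h = width / 2
    points = []
    for i in range(n):
        centre = separation / 2 if i % 2 else -separation / 2
        points.append((centre + rng.uniform(-h, h), rng.uniform(-h, h)))
    return points


def shape_change(n, width_x, width_y, rng):
    """Cloud whose x and y widths are set independently."""
    hx, hy = width_x / 2, width_y / 2
    return [(rng.uniform(-hx, hx), rng.uniform(-hy, hy)) for _ in range(n)]


rng = random.Random(0)
steps = range(1, 6)  # five hypothetical time steps

# Expansion: both widths grow by the same increment at each step.
expanding = [expansion(60, 2.0 * t, rng) for t in steps]
# Bifurcation: the two modes move steadily apart.
bifurcating = [bifurcation(60, 1.0 * t, 2.0, rng) for t in steps]
# Shape change: x widens while y narrows by the same amount.
reshaping = [shape_change(60, 2.0 * t, 12.0 - 2.0 * t, rng) for t in steps]
```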

#### Simulation of Random and Nonrandom Extinction Events

A two-dimensional cluster of 60 random coordinate number pairs was created to simulate an occupied morphospace. To simulate a random extinction event, seven data points were removed randomly in successive steps. To simulate a nonrandom extinction event, the seven data points farthest from the center of the entire cluster were removed successively in each time frame, simulating the extinction of extreme morphologies. Although there are numerous approaches that one can use when modeling an extinction, the above method was chosen for the sake of simplicity.
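
A minimal sketch of the two extinction regimes follows. It assumes a Gaussian cluster, five time frames, and a cluster center fixed at the pre-extinction centroid; the text does not specify these details, so they should be read as illustrative choices.

```python
import math
import random


def centroid(points):
    """Arithmetic mean of the x and y coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def random_extinction(points, n_remove, rng):
    """Remove `n_remove` points chosen uniformly at random."""
    doomed = set(rng.sample(range(len(points)), n_remove))
    return [p for i, p in enumerate(points) if i not in doomed]


def peripheral_extinction(points, n_remove, centre):
    """Remove the `n_remove` points farthest from `centre`."""
    ranked = sorted(points, key=lambda p: math.dist(p, centre))
    return ranked[:len(ranked) - n_remove]


rng = random.Random(0)
cloud = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]  # assumed Gaussian
centre = centroid(cloud)  # assumption: fixed at the pre-extinction centroid

random_series, nonrandom_series = [cloud], [cloud]
for _ in range(5):  # five hypothetical time frames, seven losses in each
    random_series.append(random_extinction(random_series[-1], 7, rng))
    nonrandom_series.append(peripheral_extinction(nonrandom_series[-1], 7, centre))
```

Each disparity measure can then be applied to every element of `random_series` and `nonrandom_series` to trace its response through the event.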

#### Empirical Data

Foote's crinoid data set was chosen because it (1) contains a large number of species, (2) spans a large temporal range, and (3) contains a large number of morphological characters (Foote 1999). The data set includes 1195 species sampled throughout the Phanerozoic (29 stratigraphic intervals) with a total of 90 ordered, unordered, or binary morphological characters. The data set represents 752 genera or nearly two-thirds of the approximately 1130 known genera. The morphological characters consist of 14 characters representing the pelma, 40 characters representing the dorsal cup region, 28 characters representing the arms and ambulacral system, and 8 characters representing the anal and tegminal regions.

To test sensitivity to proportion of sampled specimens, character number, and percentage of missing character data, we used genera from the three stratigraphic divisions within the Ordovician (Lower Ordovician, Ordovician-2, and Ordovician-3). The Ordovician was chosen primarily because it contains a large number of genera (111 species). Additionally, by using three consecutive intervals, we could keep large differences in morphological evolution to a minimum (in effect minimizing any large morphological jumps). To evaluate the effect of morphospace contraction and/or expansion on the seven disparity measures, we selected the temporal sequence of Early Permian, middle Permian, Late Permian, and Early Triassic. Occupied morphospace expanded during the Early–mid Permian and then contracted during the Late Permian and Early Triassic.

#### Sampling Strategies

Sensitivity analyses required a series of resamplings of the data with successive changes in sample size, number of characters, and percentage of missing data. To do this, for each variable, we used a jackknife or rarefaction process (Foote 1992, 1995) to reduce the data set from initial values of species, characters, and 0% missing data. After each reduction, disparity was measured using each of the seven measures. For example, in evaluating the effect of sample size, the first data point (5% of total species present) was generated by selecting 5% of the species 50 times, with replacement, from the pool of 100 specimens, unless otherwise stated. The same procedure was used to evaluate the effect of decreasing number of characters. To introduce a given percentage of missing data we used the following procedure: A column from the data matrix was chosen at random, and from this column, a data element was chosen at random and replaced with the average value for the entire column. This process was repeated until the specified percentage of missing data was reached. In each Ordovician crinoid sensitivity analysis, a rarefaction process, similar to the process used for simulated taxa, was performed on the common pool of 111 Ordovician species. For all analyses, error bars were calculated using bootstrap resampling (Efron 1982) of species within the common pool of specimens used in each analysis. Each error bar represents one standard deviation.
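
The resampling and missing-data steps can be sketched as follows. The sum of variances (computed here as a population variance) serves only as a stand-in measure, and several details are assumptions: column means are taken from the intact matrix, each cell is counted as missing at most once, and the example pool is random rather than evolved.

```python
import random
import statistics


def sum_of_variances(matrix):
    """Stand-in disparity measure: variance of each character, summed."""
    return sum(statistics.pvariance(col) for col in zip(*matrix))


def rarefy(pool, fraction, measure, n_reps=50, rng=None):
    """Mean and SD of `measure` over `n_reps` resamples drawn with replacement."""
    rng = rng or random.Random()
    k = max(1, round(fraction * len(pool)))
    values = [measure(rng.choices(pool, k=k)) for _ in range(n_reps)]
    return statistics.mean(values), statistics.stdev(values)


def add_missing(matrix, fraction, rng):
    """Overwrite a fraction of cells with the mean of their column."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    col_means = [statistics.mean(col) for col in zip(*matrix)]  # intact data
    target = round(fraction * n_rows * n_cols)
    result = [list(row) for row in matrix]
    replaced = set()
    while len(replaced) < target:
        c = rng.randrange(n_cols)       # choose a column at random
        r = rng.randrange(n_rows)       # then an element within that column
        if (r, c) not in replaced:      # assumption: count each cell once
            result[r][c] = col_means[c]
            replaced.add((r, c))
    return result, replaced


rng = random.Random(0)
pool = [[rng.randrange(6) for _ in range(10)] for _ in range(100)]
mean_5pct, sd_5pct = rarefy(pool, 0.05, sum_of_variances, rng=rng)
gappy, cells = add_missing(pool, 0.25, rng)
```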

#### Measures of Disparity

A brief description of each measure of disparity is given in Table 1. A simple example will aid in understanding what the participation ratio measures. Imagine a square 2-D morphospace that has been divided into four discrete bins (a 2 × 2 matrix), in which four specimens can occupy any combination of the four bins. Consider the following three cases: (1) If only one bin is occupied by all four data points, then the value of the participation ratio is 1. (2) If each bin is occupied by one data point, then the participation ratio is 4. (3) If one bin is occupied by two points, two bins are occupied by one point each, and one bin remains empty, then the value of the participation ratio is 8/3. For simplicity, the third example is worked out below. Note: D is equal to the participation ratio. In this manner the participation ratio measures the extent of localization or occupation of morphospace. The number of bins, as well as the length of each side of the morphospace, is set at the beginning of the analysis, obviating the need to normalize by the total number of cells.
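
Taking the participation ratio in its common inverse-sum-of-squares form (assumed here; Table 1 gives the definition used in the study), with $p_i$ the fraction of data points falling in bin $i$, case (3) works out as:

$$
D = \frac{1}{\sum_{i} p_i^{2}}, \qquad
p = \left(\tfrac{2}{4}, \tfrac{1}{4}, \tfrac{1}{4}, 0\right)
\;\Rightarrow\;
D = \frac{1}{\tfrac{4}{16} + \tfrac{1}{16} + \tfrac{1}{16} + 0} = \frac{16}{6} = \frac{8}{3}.
$$

The same form gives $D = 1$ for case (1), where $p = (1, 0, 0, 0)$, and $D = 4$ for case (2), where every $p_i = \tfrac{1}{4}$.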

Sum of variances, mean pairwise distance, range, PCO volume, and average pairwise dissimilarity are based upon morphologic distance measurements, whereas participation ratio and number of unique pairwise character combinations are based upon criteria other than distance. Additionally, mean pairwise distance, PCO volume, and average pairwise dissimilarity are normalized measures: mean pairwise distance and average pairwise dissimilarity are divided by the number of possible pairwise combinations, and PCO volume by the number of species.
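
To make the pairwise normalization concrete, two of the measures might be computed as below. The definitions are assumptions for illustration (Euclidean distance for mean pairwise distance; proportion of mismatching characters among those scored in both species for average pairwise dissimilarity, with `None` marking missing data), and Table 1 remains the authority for the forms used here.

```python
import math
from itertools import combinations


def mean_pairwise_distance(matrix):
    """Euclidean distance averaged over all pairs of species."""
    pairs = list(combinations(matrix, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def average_pairwise_dissimilarity(matrix):
    """Mismatching characters / characters compared, averaged over pairs."""
    total, n_pairs = 0.0, 0
    for a, b in combinations(matrix, 2):
        # Compare only characters scored in both species (None = missing).
        compared = [(x, y) for x, y in zip(a, b)
                    if x is not None and y is not None]
        if compared:  # assumption: skip pairs with nothing to compare
            total += sum(x != y for x, y in compared) / len(compared)
            n_pairs += 1
    return total / n_pairs
```

For two species scored `(0, 0)` and `(3, 4)`, `mean_pairwise_distance` returns 5.0; for `[0, 1, 2]` and `[0, 1, 3]`, `average_pairwise_dissimilarity` returns one-third.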

## RESULTS

#### Sensitivity Analysis of Simulated Species Data

All analysis programs were constructed using the mathematical package MatLab 5.01. Five of the ten characters defining the morphology of each species were treated as ordered, the other five as unordered. This scheme was chosen to reflect the fact that many analyses of disparity using real specimens have mixed character types. Figures 1, 2, and 3 show the sensitivity of the seven measures of disparity to proportion of sampled species, number of characters, and percentage of missing data, respectively, among simulated species. Very similar results were also obtained using the unordered character model and the key-innovation evolutionary model.

In Figure 1, all seven measures are relatively insensitive to number of species above a sample size of 20%, although range and unique pairwise character combinations do increase slowly and monotonically as the number of species increases (Thomas and Reif 1993; Foote 1999). The simple derivative curves (Fig. 4) of the disparity curves in Figure 1 support this. Figure 2 shows the dependency of disparity on number of characters. With the exception of mean distance, all measures show a marked increase of disparity as number of characters increases. Range increases the most rapidly out of the seven measures, whereas sum of variances, PCO volume, average pairwise dissimilarity, and unique pairwise character combinations increase less rapidly. All except participation ratio increase monotonically. Although participation ratio does increase overall with increasing character number, it does not increase throughout its entire range.

Sensitivity of disparity to percentage of missing data is shown in Figure 3. All measures are relatively insensitive to an increasing percentage of missing data up to 25%. This value was chosen because data sets are typically more than 75% complete.

In Table 2, the sensitivity of each measure is represented as a low (<7.5%), medium (7.5–15%), or high (>15%) value of the coefficient of variation (CV) for each value calculated. The CV is calculated as follows: if the number of species is 20, for example, a bootstrap analysis of sample size 20 is repeated 50 times, and the resulting mean and standard deviation are used to compute the CV. It is readily apparent that both average pairwise dissimilarity and unique pairwise character combinations have consistently low values of variation for each value calculated.
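
The CV calculation and its three-way classification can be sketched as follows. Total range is used only as a stand-in measure, the example pool is random, and placing a CV of exactly 15% in the "high" class is an assumption, since the thresholds leave that value undefined.

```python
import random
import statistics


def total_range(matrix):
    """Stand-in disparity measure: range of each character, summed."""
    return sum(max(col) - min(col) for col in zip(*matrix))


def bootstrap_cv(pool, sample_size, measure, n_reps=50, rng=None):
    """Coefficient of variation (%) of `measure` over bootstrap replicates."""
    rng = rng or random.Random()
    values = [measure(rng.choices(pool, k=sample_size)) for _ in range(n_reps)]
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def sensitivity_class(cv_percent):
    """Map a CV to the low / medium / high classes."""
    if cv_percent < 7.5:
        return "low"
    if cv_percent < 15.0:
        return "medium"
    return "high"  # assumption: a CV of exactly 15% is placed here


rng = random.Random(0)
pool = [[rng.randrange(6) for _ in range(10)] for _ in range(100)]
cv = bootstrap_cv(pool, 20, total_range, rng=rng)
label = sensitivity_class(cv)
```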

#### Sensitivity Analysis of Crinoid Species Data

The response of the seven measures to increasing proportion of sampled species, number of characters, and percentage of missing data using the Ordovician crinoid data set is shown in Table 3. Unlike the modeled species data, not all measures stabilize at 20% sampled species. Whereas average pairwise dissimilarity and participation ratio remain constant above a sample size of 18%, sum of variances, PCO volume and mean distance remain somewhat constant above a sample size of 27%. Unique pairwise character combinations increases monotonically albeit very slowly above a sample size of 20%, as in the modeled data analysis. Simple derivative curves confirm that disparity remains fairly constant above a sample size of 27% in all measures. The dependence of disparity upon number of morphological characters differs from that of the simulated species data. Sum of variances, range, PCO volume, and mean distance do not continually increase. The reason may be that these measures are all based upon simple distance measurements, which should be unaffected by adding more dimensions. Participation ratio also appears to remain constant above 75 characters. Because this metric gives an indication of the fraction of morphospace occupied at a given level of resolution, adding characters probably has little effect when saturation is finally achieved. Both average pairwise dissimilarity and number of pairwise character combinations increase monotonically. This is not surprising, as both measures are based on the number of pairwise differences that exist between species; hence, as character number increases so does the possibility for new character combinations. Finally, it appears that participation ratio, average pairwise dissimilarity, and unique pairwise character combinations differ slightly from results of the modeled species data in that disparity gradually increases with an increasing percentage of missing data. 
Sum of variances, range, mean distance, and PCO volume behave in a similar manner to the modeled species data.

In Table 4, the value of the CV for each measurement is represented for crinoid species in the same manner as in Table 2. As in the simulation, average pairwise dissimilarity and number of pairwise character combinations both have consistently low values of variation. However, unlike for the simulated species, all other disparity measurements for the crinoid species are generally associated with high values of variation.

#### Sensitivity Analysis of Morphospace Occupation Pattern of Simulated Species

The range of possible patterns of morphospace occupation is large and it would be impossible to investigate them all. Figure 5 shows five commonly encountered patterns (for various examples of morphospace occupation patterns, see Foote 1999: Fig. 22; Lupia 1999: Fig. 2; O'Keefe and Sander 1999: Fig. 4; Eble 2000: Fig. 3). Of these, the examples in Figure 5A (expansion), 5B (bifurcation), and 5C (shape change) were analyzed using the disparity metrics. The example given in Figure 5D is a fragmentation, and the example in Figure 5E is a coordinated movement of all points in morphospace, and thus disparity does not change. That is not to say that this is not an important trend; it is simply beyond the scope of this study to address this particular phenomenon.

Figure 6A shows the response of the measures to an expanding morphospace. For all of the measures except average pairwise dissimilarity and number of unique pairwise character combinations, disparity increases rather uniformly. This is to be expected given the fact that sum of variances, range, PCO volume, and mean pairwise distance are based on simple distance measures and on distances between points shown in Figure 5A. Participation ratio would also be expected to increase because progressively more bins are occupied as the expansion increases. The number of new pairwise character combinations would not appreciably increase, nor would the number of pairwise differences, as is reflected in the figure. Figure 6B shows the response of the measures to a bifurcating morphospace. The behavior of the distance measures, average pairwise dissimilarity, and number of unique character combinations is similar to that of an expanding morphospace as seen in Figure 6A. During the bifurcation, participation ratio behaves quite differently. After an initial increase, due to an increase in the number of bins occupied as the points first separate, the measure remains fairly constant. Figure 6C shows the response of the measures to occupied morphospace, which changes shape over a given time interval. Unlike the other two examples, range, mean pairwise distance, and PCO volume steadily decrease. This is to be expected because the overall distance between points decreases. The sum of variances steadily increases, again a result that is expected, because the overall variance between points increases. The participation ratio decreases as the number of occupied bins decreases during the compression of the initial morphospace. The average pairwise dissimilarity decreases midway through the shape change as the number of differences between the points decreases. 
Finally, the number of unique pairwise combinations begins to increase midway through the shape change and then steadily decreases as the number of unique pairwise combinations decreases.

Figures 7 and 8 show simulated random and nonrandom extinction events. In Figure 7A the extinction event proceeds at random with respect to character combinations. In Figure 7B the extinction event targets character combinations that occur at the periphery of the main cluster of character combinations. Figure 8A shows the results of applying the disparity metrics to the simulated random extinction event, during which both participation ratio and number of unique pairwise character combinations show a marked decrease as standing diversity decreases. PCO volume, conversely, shows a dramatic increase in disparity. The remaining measures are fairly constant over the range in which the extinction progresses. These results contrast greatly with those of the nonrandom event (Fig. 8B), owing to the manner in which the metric is normalized (by number of species present in each time interval). During the nonrandom extinction, all measures show a marked decrease throughout the event, with the exception of average pairwise dissimilarity. (With our version of the metric this exception is to be expected. In the nonrandom extinction case presented here there are only two characters; thus only two different character-state combinations are possible. For a decrease to occur at least some data points would have to lie precisely on the character 1 = character 2 line, which does not occur in this case.)

#### Sensitivity Analysis of Morphospace Occupation Pattern of Permo-Triassic Crinoids

To evaluate the measures of disparity in response to expansion and contraction of the crinoid species morphospace, four time intervals spanning the Permian and Early Triassic were selected. These intervals were chosen because disparity among the crinoid genera reached a high level during the Permian, drastically fell toward the end of the Permian (reflecting the loss of taxa due to the end-Permian mass extinction), and then rose again during the Early Triassic (Foote 1996, 1999). Of course one must keep in mind the matter of resolution. While some crinoid taxa went extinct before the end-Permian extinction, other taxa survived the event. With the proper temporal resolution it would be possible to distinguish changes in disparity due to the extinction from those occurring in the surviving taxa, but we do not attempt this here. Rather we show only the net change in all taxa from the Permian through the Triassic. Figure 9 shows the temporal occupation of morphospace during the four intervals, based on the first two principal coordinates. Figure 10 shows the values of disparity calculated for each time period using each measure of disparity. In each time interval, all species present were used to calculate disparity. The sum of variances, range, mean distance, and PCO volume all show very similar patterns. This is to be expected because each measures some aspect of simple morphologic/Euclidean distance. Average pairwise dissimilarity, participation ratio, and unique pairwise character combinations each show a very different temporal pattern of disparity, because each measures a different aspect of morphospace occupation. Figure 11 shows the values of disparity calculated for each time period using each measure of disparity, differing from Figure 10 in that each value is the mean of a bootstrap performed 50 times with a sample size of ten (the number of species present in the Late Permian; hence a bootstrap was not performed for the Late Permian crinoid data). 
All seven measures show a statistically significant change in morphospace occupation pattern, although not all measures show the same significant changes. Interestingly, Figure 11A–D show a significant change in disparity between the middle Permian and Late Permian and between the Late Permian and Early Triassic, whereas participation ratio and average pairwise dissimilarity show a significant change only between the Late Permian and Early Triassic. Finally, unique pairwise character combinations shows a significant change in morphospace occupation between the middle Permian and Late Permian.

## DISCUSSION

#### The Necessity of Random Sampling

Figure 12 illustrates two sampling regimes. In Figure 12A, sampling is confined to a particular region of morphospace, and as a result the Euclidean and non-Euclidean measures of disparity fail to capture the amount of disparity present. In Figure 12B, sampling is random, and all measures, with the exception of pairwise character combinations, capture a significant amount of the disparity present. (Because pairwise character combinations increases slowly with an increase in sample size, it is apparent that unless all species are sampled, subsampling will not yield an accurate estimate of the number of character combinations present.) Therefore, in order to perform a thorough analysis of disparity within a particular taxon (at any taxonomic level) it is necessary to obtain a random sample of that taxon. Of course, obtaining a random sample can be difficult, on account of bias in the fossil record. For example, the Cambrian fauna would seem to be dominated by trilobites, brachiopods, and archaeocyathids, but Lagerstätten assemblages, such as the Burgess Shale or Chengjiang Fauna, point to a much richer, morphologically diverse fauna. Another possible bias can result during specimen collection. For example, during the late 1800s and early 1900s the focus was on large dinosaurs at the expense of smaller specimens, thus presenting a skewed picture of dinosaur size distribution. These biases can sometimes be overcome, however. Fortunately, Lagerstätten assemblages do exist, the fossil record continues to improve, and the field of taphonomy has provided many insights into preservation bias in the fossil record.

#### Advantages and Disadvantages Associated with Selected Disparity Measures

One of the principal findings of this study is that no one method provides an adequate means to detect all aspects of disparity. Thus, studies that use only one or two methods of disparity analysis may overlook certain patterns of morphological diversification. Below, each measure of disparity is evaluated in light of these results.

1. *Sum of variances.* The sum of variances is easily calculated and provides an estimate of the amount of difference between character states among specimens in morphospace. However, this measure fails to provide an estimate of the total amount of morphospace occupied, or to give an estimate of the absolute range of occupied morphospace. As can be seen in Figure 6C, whereas other distance measures decrease as morphospace undergoes a change of shape, sum of variances increases, a result that is not initially intuitive. Furthermore, when a rarefaction analysis is performed (i.e., sensitivity to sample size is being investigated), sum of variances is associated with high values of standard deviation.

2. *Range.* The range, like the sum of variances, is easily calculated. Although range indicates the magnitude of dissimilarity that exists among species, it gives no indication of the shape of occupied morphospace. Like sum of variances, range is associated with high values of variance in rarefaction analyses.

3. *Mean distance.* The mean distance provides an estimate of the amount of difference between character states among specimens in morphospace. But the measure can obscure certain aspects of morphospace occupation. For example, if three points occupy the morphospace, where two are widely separated and the third is close to either one of the widely separated points, the mean distance will fail to give an accurate estimate of the total range occupied. The mean distance can also obscure the shape of occupied morphospace. On the other hand, the measure is robust to sample size.

4. *PCO volume.* Various measurements of morphospace volume can yield good approximations of the total amount of occupied morphospace. Unfortunately measures such as hypervolume can produce values that are extremely small and variable. Because the hypervolume is calculated by taking the product of univariate variances, any axis or axes with negligible variance will produce a value of hypervolume close to zero. Thus, hypervolume can be very sensitive to variation in a single character. Using the PCO volume instead avoids this issue because only the axes with significant variances are typically chosen to represent the disparity among points in morphospace. However, PCO volume has a medium to high variance (Tables 1 and 2) associated with each measurement generated via rarefaction analysis. PCO volume can also yield somewhat misleading results. For example, in Figure 8A disparity increases while the number of species decreases. (Because the measure is normalized by the number of species squared, during a random extinction the value of the number of species squared decreases much more rapidly than the product of the eigenvalues.)

5. *Average pairwise dissimilarity.* Average pairwise dissimilarity provides a very robust measure of disparity. It is relatively insensitive to sample size and is associated with low levels of variance (Tables 2, 4) during rarefaction analysis. However, average pairwise dissimilarity is relatively insensitive to changes in morphospace occupation patterns (see Figs. 6, 8, and 10) and does not give an adequate estimation of the amount of morphospace occupied.

6. *Participation ratio.* The participation ratio gives an indication of the amount of occupation that taxa exhibit in morphospace and can be obtained at a reasonable sample size, yet of the three non-distance measures it has the highest amount of variance associated with each measured value. In this respect the participation ratio behaves in the same manner as the distance measures: high variance, particularly at low sample sizes. It should be noted that though participation ratio provides an estimation of morphospace occupation, it provides no information about occupation pattern (i.e., many different occupation patterns could all yield the same value for participation ratio).

7. *Number of unique pairwise character combinations.* Number of unique pairwise character combinations yields a very intuitive idea of the amount of character space occupied among taxa. One drawback of this measure is that the number of observed pairwise character-state combinations depends strongly on sample size, necessitating the use of rarefaction to eliminate sample-size bias (Foote 1992, 1995). Like average pairwise dissimilarity, number of unique pairwise character combinations is associated with low values of variance in species rarefaction analyses.
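Some of the behaviors described in this list can be reproduced with a few lines of Python. The following is a minimal sketch, not the code used in this study: the function names are hypothetical, total range is assumed here to be the sum of per-character ranges, and mean distance is assumed to be the mean Euclidean distance over all pairs of taxa. It illustrates the three-point case described under mean distance and the collapse of hypervolume when one axis has negligible variance.

```python
from itertools import combinations

import numpy as np


def sum_of_variances(X):
    """Sum of the univariate (per-character) sample variances."""
    return float(np.var(X, axis=0, ddof=1).sum())


def total_range(X):
    """Sum of per-character ranges (an assumed definition of total range)."""
    return float((X.max(axis=0) - X.min(axis=0)).sum())


def mean_pairwise_distance(X):
    """Mean Euclidean distance over all pairs of taxa (rows of X)."""
    return float(np.mean([np.linalg.norm(a - b) for a, b in combinations(X, 2)]))


def hypervolume(X):
    """Product of the univariate sample variances."""
    return float(np.prod(np.var(X, axis=0, ddof=1)))


# Three taxa on one axis: two widely separated, the third close to one of them.
three_points = np.array([[0.0], [0.1], [10.0]])
print("total range:", total_range(three_points))
print("mean pairwise distance:", mean_pairwise_distance(three_points))

# Two characters, the second nearly invariant: the product of variances
# collapses toward zero even though the first axis varies widely.
flat = np.array([[0.0, 1.000], [5.0, 1.001], [10.0, 0.999]])
print("sum of variances:", sum_of_variances(flat))
print("hypervolume:", hypervolume(flat))
```

In the three-point case the mean pairwise distance is noticeably smaller than the range, which is the sense in which mean distance understates the total range occupied.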

All measures show dependency upon the number of characters used, with the simple morphologic distance measures remaining constant once a substantial number of characters is reached (Table 3). Finally, all measures appear to be fairly insensitive to missing data (Tables 2, 4).
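The sample-size dependence noted for the number of unique pairwise character combinations can be explored with a simple rarefaction. The sketch below is illustrative only and rests on assumptions: the measure is taken to be the count of distinct observed state pairs summed over all pairs of characters, the data are simulated discrete characters with no missing values, and the names `unique_pairwise_combinations` and `rarefy` are hypothetical.

```python
from itertools import combinations

import numpy as np


def unique_pairwise_combinations(M):
    """Count distinct observed state pairs, summed over all character pairs.

    M is a taxa-by-characters matrix of discrete states (no missing data).
    """
    total = 0
    for i, j in combinations(range(M.shape[1]), 2):
        total += len({(a, b) for a, b in zip(M[:, i], M[:, j])})
    return total


def rarefy(M, measure, sample_size, n_reps=200, seed=0):
    """Mean and standard deviation of a measure over random subsamples of taxa."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_reps):
        idx = rng.choice(M.shape[0], size=sample_size, replace=False)
        values.append(measure(M[idx]))
    return float(np.mean(values)), float(np.std(values, ddof=1))


# Simulated data set: 60 taxa, 10 characters, 3 states per character.
data = np.random.default_rng(42).integers(0, 3, size=(60, 10))
full_value = unique_pairwise_combinations(data)
for n in (10, 20, 40, 60):
    mean, sd = rarefy(data, unique_pairwise_combinations, n)
    print(n, round(mean, 1), round(sd, 2), "of", full_value)
```

Because every subsample can only contain combinations present in the full data set, the rarefied mean approaches the full value from below as sample size increases.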

#### Temporal Patterns, Character Choice, Mass Extinctions, and Measures of Disparity

Measurements of disparity have been used extensively to determine particular patterns of morphological evolution (Foote 1993, 1994, 1995, 1996, 1999; Wills et al. 1994; Wagner 1995, 1997; Lupia 1999; Smith and Bunje 1999; Eble 2000). Ideally, we would want measures of disparity that can detect changes in morphospace occupation while generating values that are associated with low levels of variance (upon bootstrap analysis). But two main points should be kept in mind when performing a disparity analysis: (1) Different measures capture different aspects of disparity, and (2) different measures have different levels of robustness. Thus, although some measures have the ability to detect certain changes in pattern, not all have the ability to provide the resolution (owing to variance upon bootstrap analysis) necessary to detect those particular patterns. Perhaps the best way to view this dichotomy is as follows. Each measure of disparity is designed for different purposes, in which case an insensitivity or an oversensitivity of one kind or another isn't really a “failure” of the measure, just a particular property.

Relevant to the issue of pattern detection is the choice of, and hence number of, characters used to characterize morphology. As is evident in Figure 2 and Table 3, the number of characters used affects the value of disparity measured. Using a set number of characters selected from a taxon during an initial time period and subsequently ignoring characters that appear in that taxon at a later time period may give an incomplete view of the amount of disparity present at the later time period. For example, if one were to use discrete characters such as number of vertebrae and skull elements in early vertebrates (before the evolution of tetrapods), and then ignore number of metatarsals, metacarpals, and digits on limbs that evolved during the early tetrapod radiation, one would fail to detect an important morphological event in the history of the Vertebrata, and hence the amount of disparity present during and after the initial tetrapod radiation. Alternatively, simply splitting (adding) characters, say adding as a character number of phalanges, during later time periods may inflate the amount of disparity present. Using a measure such as the number of unique pairwise character combinations may be quite useful in determining the amount of character space occupied by a particular taxon (Thomas and Reif 1993; Foote 1999).

A very interesting consequence of this study involves the detection of changes in morphological pattern over a mass extinction event. As shown in Figure 8, measurement behaviors are dependent upon the type of ongoing extinction event. If, for example, one were to detect a decreasing amount of disparity using range, mean distance, PCO volume, participation ratio, or number of unique pairwise character combinations during a mass extinction (as in Fig. 8B), one would conclude that morphologically diverse taxa (i.e., taxa with extreme morphologies) were preferentially disfavored during the event. Figures 9–11 illustrate this in real data by showing a contraction of morphospace, with concurrently decreasing values of disparity (particularly among the distance measures) during the end-Permian mass extinction event. The two caveats here are that the taxa were selected randomly for use in the study (random sampling allows for relatively small sample sizes, which may be unavoidable in postextinction samples) and that the sample size was large enough to yield a significant result. This result may seem obvious, yet trying to detect whether extinctions are simply random with respect to the taxa involved or due to the lack or possession of particular morphologies is quite a difficult task.
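The contrast between random and nonrandom extinction can be illustrated with a toy simulation. This is a sketch under stated assumptions rather than the simulation protocol of this study: morphospace is a two-dimensional Gaussian cloud, nonrandom extinction is modeled (hypothetically) as removal of the taxa farthest from the centroid, total range is taken as the sum of per-character ranges, and all function names are illustrative.

```python
import numpy as np


def sum_of_variances(X):
    """Sum of the univariate sample variances."""
    return float(np.var(X, axis=0, ddof=1).sum())


def total_range(X):
    """Sum of per-character ranges (an assumed definition of total range)."""
    return float((X.max(axis=0) - X.min(axis=0)).sum())


def random_extinction(X, n_survivors, rng):
    """Survivors are drawn at random, without regard to morphology."""
    keep = rng.choice(X.shape[0], size=n_survivors, replace=False)
    return X[keep]


def peripheral_extinction(X, n_survivors):
    """Survivors are the taxa closest to the centroid (extreme forms are lost)."""
    dist = np.linalg.norm(X - X.mean(axis=0), axis=1)
    keep = np.argsort(dist)[:n_survivors]
    return X[keep]


rng = np.random.default_rng(1)
before = rng.normal(size=(200, 2))  # 200 simulated taxa, 2 characters
after_random = random_extinction(before, 100, rng)
after_selective = peripheral_extinction(before, 100)

for label, X in [("before", before),
                 ("random extinction", after_random),
                 ("selective extinction", after_selective)]:
    print(label, round(sum_of_variances(X), 3), round(total_range(X), 3))
```

In this toy setting both measures contract sharply when extreme morphologies are removed, whereas the sum of variances typically changes little under random removal of the same number of taxa.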

## CONCLUSION

It seems clear that no one method of disparity measurement is sufficient for all purposes. Ideally one would have unlimited resources with which to obtain and measure numerous specimens of each taxon. Unfortunately, owing to gaps in the fossil record, prohibitive costs, and large amounts of time necessary for large studies, this is seldom the case. Therefore, methods that can yield a thorough characterization of disparity at low sample sizes and with low variance are highly desired. For the above reasons a combination of distance and non-distance disparity measures is the approach most likely to be of use. A measure of disparity that is insensitive to sample size, such as the average pairwise dissimilarity, is ideal for a first approximation of disparity among taxa. Unique pairwise character combinations or participation ratio could then be used to gain an idea of the amount of character space occupied. Finally, a simple morphologic distance measure, such as mean pairwise distance or PCO volume, can give an idea of the change of space-filling pattern exhibited by measured taxa. Using a combination of techniques should allow a clearer picture of disparity to emerge.

## Appendix

### PARTICIPATION RATIO

The participation ratio is a tool used by the solid-state physics community to describe electronic wave functions (Economou 1983). The concept is not limited to quantum physics, however. Rather, participation ratio is a tool used to describe the extent of localization of a probability distribution. Consider *N* cells and let *p*(*n*) be the probability that a realization falls in cell *n.* The idea behind participation ratio is to define a number that estimates the number of cells that have a “substantial” probability. That number should be between 1 and *N,* with 1 describing a case where only one cell contributes, and *N* the extreme opposite of all cells contributing equally. Mathematically, the participation ratio is computed as follows:

$$P = \frac{1}{\sum_{n=1}^{N} p(n)^{2}}$$

To grasp how the participation ratio works, consider first a case in which a single cell contributes. The probability of the occupied cell is 1 (*p* = 1) and that of all other cells is zero (*p* = 0). Substituting in the definition, we find that the participation ratio is *P* = 1 as desired. Consider now the other extreme case of all cells equally occupied. Because there are *N* cells, each cell has probability *p* = 1/*N.* Substituting in the definition, we find *P* = *N.*

It should be noted that *P* measures the extent of localization but provides no information about location. It should also be noted that the value of *P* depends on the number of cells *N* used in binning the space of possible outcomes.
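The two limiting cases can be checked numerically. The following sketch assumes the inverse-sum-of-squares form of the participation ratio, *P* = 1/Σ *p*(*n*)², which reproduces both limits described above; the function name `participation_ratio` is illustrative, and the input is normalized so that raw cell counts may be passed as well as probabilities.

```python
import numpy as np


def participation_ratio(p):
    """Participation ratio P = 1 / sum(p(n)^2) for cell probabilities p(n).

    The input is normalized to sum to 1, so raw occupancy counts also work.
    """
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return float(1.0 / np.sum(p ** 2))


# One occupied cell out of four: P equals 1.
print(participation_ratio([1, 0, 0, 0]))

# Four equally occupied cells: P equals N = 4.
print(participation_ratio([0.25, 0.25, 0.25, 0.25]))

# An intermediate case: two cells dominate, so P lies between 1 and N.
print(participation_ratio([0.45, 0.45, 0.05, 0.05]))
```

Permuting the cell probabilities leaves the result unchanged, consistent with the remark that *P* measures the extent of localization but not location.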

## Acknowledgments

We thank M. Foote for kindly letting us use his crinoid data. We would also like to thank D. H. Erwin, W. G. Wilson, J. Mercer, G. L. Byars, P. Novack-Gottshall, M. Foote, and P. Wagner for their insightful comments and discussion.

- Accepted 24 April 2001.