- The Paleontological Society
Biological veracity of the sharp diversity increase observed in many analyses of the post-Paleozoic marine fossil record has been debated vigorously in recent years. To assess this question for sample-level (“alpha”) diversity, we used bulk samples of shelly invertebrates, representing three major fossil groups (brachiopods, bivalves, and gastropods), to compare the Jurassic and late Cenozoic sample-level diversity of marine benthos. After restricting the data set to single-bed, whole-fauna, bulk samples (n ≥ 30 specimens) from comparable open marine siliciclastic facies, we were able to retain 427 samples (255 Jurassic and 172 late Cenozoic), with most of those samples originating from our own empirical work.
Regardless of the diversity metric applied, the initial results suggest that standardized sample-level species (or genus) diversity, driven by evenness and/or richness of the most common taxa, increased between the Jurassic and late Cenozoic by at least a factor of 1.6. When the data are partitioned into the three dominant higher taxa, it becomes clear that (1) the bivalves, which dominated the samples for both time intervals, increased in sample-level diversity between the Jurassic and the late Cenozoic by a much smaller factor than the total fauna; (2) the removal of brachiopods, which were a noticeable component of the Jurassic samples, did not significantly affect standardized sample-level diversity estimates; and (3) the gastropods, which were rare in the Jurassic but common in many late Cenozoic samples, contributed notably to the increase in sample-level diversity observed between the two time intervals. Parallel to these changes, the samples revealed secular trends in ecological structure, including Jurassic to late Cenozoic increases in proportion of (1) infauna, (2) mobile forms, and (3) non-suspension-feeding organisms. These trends mostly persist when data are restricted to bivalves.
Supplementary analyses indicate that these patterns cannot be attributed to sampling heterogeneities in paleolatitudinal range, lithology, or paleoenvironment of deposition. Likewise, when data are restricted to samples dominated by species with originally aragonitic shells, the observed temporal changes persist at a comparable magnitude, suggesting that the pervasive loss of aragonite in the older fossil record is unlikely to have been the primary cause of the observed patterns. The comparable ratio of identified to unidentified species and genera, observed when comparing the Jurassic and late Cenozoic samples, indicates that the relatively poorer (mold/cast) preservation of Jurassic aragonite species also is unlikely to have been responsible for the observed patterns. However, the diagenesis-related taphonomic and methodological artifacts cannot be ruled out as an at least partial contributor to the observed post-Paleozoic changes in diversity, taxonomic composition, and ecology (the outcomes of the three tests of the diagenetic bias available to us are incongruent).
The study demonstrates that the post-Paleozoic trends in the sample-level diversity, ecology, and taxonomic structure of common taxa can be replicated across multiple studies. However, the diversity increase estimated here is much less prominent than suggested by many previous analyses. The results also narrow the list of causative explanations down to two testable hypotheses. The first is diagenetic bias: a spurious trend driven by either (a) increasing taphonomic loss of small specimens in the older fossil record or (b) a shift in sampling procedures between predominantly lithified rocks of the Mesozoic and predominantly unlithified, and therefore sievable, sediments of the late Cenozoic. The second hypothesis is genuine biological change: macroevolutionary trends in the structure of marine benthic associations through time, consistent with predictions of several related models such as evolutionary escalation, increased ecospace utilization, and the Mesozoic marine revolution. Future studies should focus on testing these two rival models, a key remaining challenge for identifying the primary causative mechanism for the long-term changes in sample-level diversity, ecology, and taxonomic structure observed in the Phanerozoic marine fossil record.
The Phanerozoic history of marine biodiversity has been among the most vigorously researched and contentious themes of paleobiology (e.g., Valentine 1969; Raup 1972, 1976; Sepkoski et al. 1981; Benton 1995; Alroy et al. 2001; Peters and Foote 2001; Jablonski et al. 2003; Bambach et al. 2004). In particular, the biological veracity of the post-Paleozoic (Triassic to Recent) increase in biodiversity that is clearly observed in many synoptic curves (e.g., Sepkoski 1981; Bambach 1999), but can be removed partly or entirely by using various correcting factors (e.g., Alroy et al. 2001; see also Bush et al. 2004), has remained among the key controversies.
In recent years, building on the pioneering work of Bambach (1977), increasing focus has been placed on reconstructing sample-level diversity by using time series of controlled bulk samples with a known species abundance structure (Powell and Kowalewski 2002; Bush and Bambach 2004; Peters 2004; Finnegan and Droser 2005; Kosnik 2005). Such estimates are not hampered by sampling standardization problems that may limit the utility of range and occurrence data (but see Alroy et al. 2001) and can potentially provide a controlled way of evaluating the biological veracity of long-term trends in at least one component of biodiversity: the sample-level diversity (often referred to as “alpha diversity”), which is driven by evenness and/or richness of the most common taxa.
In this study, sample-level diversity patterns, based on fossil assemblages dominated by three major fossil groups of benthic invertebrates (brachiopods, bivalves, and gastropods), are compared between two critical time intervals: (1) the Jurassic, which corresponds to the early phase of the contentious post-Paleozoic diversity rise observed in many synoptic curves; and (2) the late Cenozoic, which represents the most elevated segment of those curves. Because many synoptic curves predict that these two time intervals should differ greatly in their sample-level (e.g., Bambach 1977) and global (e.g., Sepkoski et al. 1981) diversity, such a comparison offers an ideal target for testing veracity of the secular diversity trend. Moreover, the approach used here should allow us to augment previous research in several critical ways:
The previous standardized sample-diversity analyses were either focused on comparisons between the Paleozoic and Neogene (Powell and Kowalewski 2002; Bush and Bambach 2004) or confined to narrower time frames (Peters 2004; Finnegan and Droser 2005; Kosnik 2005). To our knowledge this is the first large-scale study of marine benthic macro-invertebrates that compares Mesozoic and late Cenozoic sample-level diversity patterns in a controlled and comprehensive way.
This study is based on a notably larger number of bulk samples (427 samples retained in final analyses presented below) than the previous standardized comparisons of sample-level diversity across the large segments of the Phanerozoic (Powell and Kowalewski 2002 [88 samples]; Bush and Bambach 2004 [126 samples]). This increase in sample size not only enhances the statistical power of the data, but also allows us to maintain reasonable sample sizes after grouping data by paleolatitude, lithology, and other diversity-relevant grouping variables.
Unlike the three previous large-scale comparisons of Bambach (1977), Powell and Kowalewski (2002), and Bush and Bambach (2004), this study is based nearly exclusively on samples collected by the authors themselves (over 90% of samples used in this study come from our own research collections; Table 1). Thus, the resulting data set is well understood in terms of field sampling protocols and specimen-processing methods, and is also highly consistent internally in terms of taxonomic nomenclature.
Nearly all samples compiled in this analysis include information about paleolatitude, paleolongitude, lithology, degree of lithification, inferred depositional environment, and several other parameters that can be used to evaluate various spurious effects that may affect diversity patterns recorded in our data. Whereas previous studies considered some of those parameters, none could control for all of them, because of either lack of data or an inadequate number of samples.
Finally, by including numerous ecologically and taphonomically relevant parameters coded for all specimens at species level, we can compute many sample-level parameters regarding relative abundance of specimens across mineralogical groups, dwelling position, feeding mode, mobility, and higher taxonomic assignment (family, order, class) of genera and species. These supplementary variables should allow us to consider explicitly many presumed causative mechanisms for observed diversity trends.
In summary, the data attributes should make it possible to evaluate the sample-level diversity patterns while controlling for a multitude of potentially confounding factors.
In this paper, we refer to the sample-level diversity as “sample diversity” rather than “alpha diversity” or “within-habitat diversity” (see also Kowalewski et al. 1998; Scarponi and Kowalewski in press). The latter terms, although frequently used in the paleontological literature (including also previous publications of the present authors), should be avoided because of their explicit ecological connotation. Diversity estimates based on time-averaged hard-part-restricted fossil samples are not analogous to point diversity estimates used by ecologists to assess alpha diversity of local communities.
Unless otherwise indicated, the terms “sample diversity” and “diversity” denote here standardized sample-level diversity that pertains to species richness of most common species (those that are likely to have been sampled at small sample size). Given this definition, the term “diversity” as used here is nearly synonymous with evenness/equitability (relative abundance structure) of common species (see also Olszewski 2004; Scarponi and Kowalewski in press), and should not be equated with a total taxonomic richness at a given site, in a given region, or globally. However, previous studies that offered quasi-standardized (e.g., Bambach 1977) or standardized (e.g., Powell and Kowalewski 2002; Bush and Bambach 2004; Peters 2004; Finnegan and Droser 2005; Kosnik 2005) sample-level data also dealt with diversity estimates that potentially combine information about both the richness of the common taxa and the sample-level evenness structure. By definition, such sample-standardized studies cannot yield quantitative estimates of total taxonomic richness (but see Bush and Bambach  for a qualitative assessment). They also are fundamentally different from synoptic literature compilations of range or occurrence data (Sepkoski 1981; Bambach 1999; Alroy et al. 2001) that may capture information about less common taxa (see also later in the text).
The data consist of Jurassic and late Cenozoic (Miocene–Pleistocene) bulk samples collected primarily by the authors. Most of the samples represent published data and have been integrated into the Paleobiology Database (http://paleodb.org).
The initial data set consisted of over 1000 bulk samples. However, to maximize the internal coherence of the final data set, the data were restricted as follows:
Only localized samples tabulated from a single bed or extracted from a single sedimentary layer were included. Samples pooled across multiple horizons or multiple sampling sites have been excluded.
Only open marine siliciclastic facies were included. We excluded samples from carbonate facies because sample-level diversity can vary notably between carbonates and siliciclastics (see Powell and Kowalewski 2002, for an Ordovician example) and because carbonate facies are nearly absent in our late Cenozoic data set. In addition, we excluded samples believed to have come from marginal marine environments, which tend to have more variable, often lower (e.g., Bambach 1977; Kowalewski et al. 2002), sample diversity. Finally, we excluded samples from oxygen-limited settings or otherwise unusual marine biofacies (e.g., the Middle Jurassic Opalinus clay of Switzerland).
Only samples with at least 30 specimens and three or more species have been retained.
After these restrictions were applied to our data, a total of 427 samples have been retained for the final analysis.
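The restrictions listed above amount to a simple per-sample filter. The following Python sketch illustrates the idea only; the record layout (the fields `single_bed`, `facies`, and `counts`) and the facies label are hypothetical and are not taken from the authors' database.

```python
def is_retained(sample, min_specimens=30, min_species=3):
    """Return True if a sample passes the three restrictions.

    `sample` is a hypothetical record: a dict with a boolean
    `single_bed`, a string `facies`, and `counts`, a dict mapping
    species name -> number of individuals.
    """
    # Ignore species listed with zero individuals.
    counts = [c for c in sample["counts"].values() if c > 0]
    return (
        sample["single_bed"]                                # single bed or layer only
        and sample["facies"] == "open marine siliciclastic"  # assumed label
        and sum(counts) >= min_specimens                    # at least 30 specimens
        and len(counts) >= min_species                      # three or more species
    )
```

Applied to the initial pool of bulk samples, a filter of this kind would yield the subset carried into the final analysis.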
Although spatial resolution and most of the sample processing aspects are uniform in our data set, there is one important methodological difference between the two targeted time intervals. Because Jurassic data came primarily from lithified rocks and nearly all late Cenozoic data were derived from unconsolidated sediments, the sample-acquisition methodology differs between the two time intervals. In the Jurassic, samples were generated by exhaustive counts of all identifiable specimens visible on the bedding planes or by exhaustive extractions of specimens from in situ derived slabs disintegrated mechanically in the laboratory (e.g., Fürsich 1977; Aberhan 1993). In the late Cenozoic, most samples were obtained by wet sieving of bulk sediment with fine mesh (for details see Kowalewski et al. 2002; Scarponi and Kowalewski 2004), with >95% of the late Cenozoic samples included here sieved with screens with mesh size of 1 mm or finer. The specimen-acquisition methods are the only major difference in the sampling protocols applied for the two compared time intervals. The possible effects of these differences are considered explicitly (and tested in multiple ways) in the discussion section below.
All samples have been coded using a uniform set of values. These included the following:
Paleolatitude, calculated with the rotation file of Chris Scotese (personal communication 2001) using present-day coordinates and numeric ages.
Lithology, with samples grouped into three broad categories of grain size: (a) “sandstone” (e.g., sand and sandstone of various fractions, silty and muddy sandstone); (b) “siltstone” (silt-dominated rocks and sediments); and (c) “very fine grained” (e.g., rocks and sediments dominated by finest sediment fractions, including clay/mud, claystone/mudstone, shale, and silty mud).
Paleoenvironment, with samples grouped into two broad categories: (a) “offshore” (all environments below the storm wave base); and (b) “onshore” (proximal open marine environments located above the storm wave base).
Degree of diagenesis, with samples grouped into three categories: (a) “unlithified” (unconsolidated, sievable, sediments); (b) “poorly lithified” (units affected by incipient cementation only, with rocks easily disintegrable with H2O2); and (c) “lithified” (fully lithified, unsievable rocks).
Stratigraphic age of a sample, including (a) chronostratigraphic age reported using standard stratigraphic stages; and (b) absolute age reported in millions of years (based on the timescale of Gradstein et al. ).
Additional variables such as country, purpose of collection, paleolongitude. These were not included in the analyses below and therefore are not discussed in detail here.
It should be noted here that, even with over 400 samples available for analysis, the above grouping variables cannot be analyzed at finer resolution because that would result in splitting samples into groups with unsatisfactorily low numbers of samples (e.g., our data allow us to subdivide paleoenvironments into over ten categories, but then numbers of samples per paleoenvironment would be too small for any meaningful statistical analyses).
For each sample, each row of data represents a species. For each species, the higher taxonomic ranks include genus, family, order, and class. Unidentified species were marked as “sp.” (or “sp1,” “sp2,” etc. in case of multiple congeneric species). Unidentified genera were marked as “indet.” (or “indet1,” “indet2” in case of multiple genera from the same family).
For each species, additional variables recorded included count (number of individuals in a sample) as well as several categorical variables describing ecology and mineralogy of a given species: (1) mode of life: infauna (including both shallow and deep infauna), semi-infauna (including organisms with a partly buried mode of life), and epifauna (organisms residing on the surface); (2) mobility: sessile (forms entirely stationary; e.g., fixed epibyssate forms), limited mobile (forms capable of slow movement, often stationary for prolonged periods of time), mobile (forms actively mobile for prolonged periods of time); (3) feeding (suspension feeders, deposit feeders/ grazers/herbivores, predators, and other); and (4) original shell mineralogy (calcitic, bimineral, aragonitic). In addition, for a subset of data, maximum shell size was measured for all individuals in the sample (these limited shell size data were used in some secondary analyses only).
All analyses were restricted to three higher taxa (classes): bivalves, gastropods, and articulate brachiopods. Other taxa were too infrequent (crustaceans, echinoids) or difficult to quantify in terms of number of individuals (serpulids, crinoids, bryozoans). They were therefore all excluded from the raw data, prior to any statistical analysis. However, for the overwhelming majority of samples (>95%), these additional higher taxa represented a minor fraction of specimens (<5%), so their exclusion should not generate any substantial difference in the patterns reported below.
In all diversity analyses, we used sample standardization by rarefaction. The expected number of taxa (species and genera, respectively) in a given subsample of n individuals was calculated by using the rarefaction algorithm as given in Krebs (1999: p. 414). This algorithm estimates the average number of taxa one would obtain if the random draw of n individuals from the total pool of individuals were repeated indefinitely. The calculation of large-sample variance also followed Krebs (1999: p. 415). We carried out rarefaction down to 30 and 100 individuals per sample. In addition, an array of standard evenness metrics (see Table 2 later in the text) was computed (for references and up-to-date discussion of those and other metrics see Washington 1984; Smith and Wilson 1996; Hubalek 2000; Olszewski 2004) and compared in terms of their performance.
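As an illustration of the rarefaction step, the expected number of taxa in a random subsample of n individuals can be written in the standard hypergeometric form, E(S_n) = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N is the total number of individuals and N_i the count of taxon i. The Python sketch below assumes that this is the form of the algorithm referred to above; the function name is ours, and the large-sample variance is not reproduced.

```python
from math import comb

def rarefied_richness(counts, n):
    """Expected number of taxa in a random subsample of n individuals.

    `counts` holds the number of individuals per taxon in one sample.
    Uses the hypergeometric rarefaction formula
    E(S_n) = sum_i [1 - C(N - N_i, n) / C(N, n)].
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if n > total:
        raise ValueError("subsample size exceeds sample size")
    denom = comb(total, n)
    # comb(k, n) is 0 when k < n, so a taxon too abundant to be
    # missed in the subsample contributes exactly 1.
    return sum(1 - comb(total - c, n) / denom for c in counts)
```

For a sample with counts `[28, 1, 1]`, `rarefied_richness(counts, 10)` is lower than for an even sample `[10, 10, 10]` with the same three taxa, which shows how the metric responds to evenness as well as to richness.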
Ecological and mineralogical sample-level parameters (e.g., percent epifauna or proportion of individuals with original aragonitic shell mineralogy in a given sample) were computed as per-sample proportions for all categorical variables by averaging categorical scores weighted by counts of individuals. Note that proportions were computed from raw data, and not sample-standardized rarefied data. This is justified given that, as would be expected theoretically, binary proportions do not show any significant or notable correlations with sample size (r ≪ 0.1, p ≫ 0.05 in all cases). It is worth emphasizing here that proportions are estimators independent of sample size (although their precision obviously improves as the sample size increases), and, unlike some diversity metrics, do not need to be sample-standardized via rarefaction or other related procedures.
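The count-weighted proportions described above can be sketched as follows. The row layout (one dict per species, with a `count` field plus categorical fields such as `life_mode`) is a hypothetical stand-in for the authors' data tables.

```python
def weighted_proportion(rows, variable, category):
    """Proportion of individuals in a sample that fall into `category`.

    `rows` is a list of per-species records for one sample; each record
    is a dict with an integer `count` and categorical fields
    (field names here are illustrative assumptions).
    """
    total = sum(r["count"] for r in rows)
    hits = sum(r["count"] for r in rows if r[variable] == category)
    return hits / total
```

For example, a sample with 30 epifaunal and 10 infaunal individuals gives `weighted_proportion(rows, "life_mode", "epifauna")` equal to 0.75, regardless of how many species those individuals are spread across.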
Statistical decisions primarily were based on non-parametric rank tests. These tests require fewer assumptions than parametric tests and often offer nearly as much power as their parametric counterparts or bootstrapping. Parametric tests were also reported for comparison. Bootstrap randomizations were carried out using balanced bootstrap design (see also Hall 1992). Bootstrap estimates of significance and bootstrap confidence intervals were derived using a percentile approach (so called “naïve bootstrap” sensu Efron 1981). Each bootstrap analysis was based on 1000 independent iterations (estimators stabilized well below 1000 iterations in all cases reported here so using a larger number of iterations was deemed superfluous). Bootstrap analyses were written in SAS/IML and statistical tests were performed using SAS/STAT procedures. Rarefaction calculations were performed using an algorithm developed for dBase SE language. Throughout all analyses, we have assumed the significance level of alpha = 0.05.
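A balanced bootstrap with percentile confidence limits can be sketched in Python as shown below. This is an illustration of the general technique only, not a reconstruction of the SAS/IML code: the choice of the mean as the bootstrapped statistic, the function names, and the seed are assumptions made for the example.

```python
import random

def balanced_bootstrap_means(values, iterations=1000, seed=0):
    """Bootstrap means from a balanced design.

    The data are replicated `iterations` times, shuffled once, and cut
    into `iterations` resamples of the original size, so that every
    observation is used exactly `iterations` times overall.
    """
    rng = random.Random(seed)
    n = len(values)
    pool = list(values) * iterations
    rng.shuffle(pool)
    return [sum(pool[i * n:(i + 1) * n]) / n for i in range(iterations)]

def percentile_interval(estimates, alpha=0.05):
    """Percentile confidence limits read directly from sorted estimates."""
    ordered = sorted(estimates)
    k = len(ordered)
    lower = ordered[int((alpha / 2) * k)]
    upper = ordered[min(k - 1, int((1 - alpha / 2) * k))]
    return lower, upper
```

One consequence of the balanced design is that the grand mean of the bootstrap means equals the observed sample mean, which removes one source of simulation error from the estimates.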
Two caveats related to inherent properties of numerical data derived from bulk samples need to be emphasized. First, not only do paleontological samples preserve just a subset of all organisms that lived, but also many groups of organisms that are preserved tend to be excluded from numerical tallies because they are difficult to count in a meaningful objective manner (e.g., crinoids, echinoids, bryozoans, corals). Thus, the patterns and conclusions presented below should be considered as applicable to the three common groups of shelly fauna only. However, given the scarcity of other higher taxa in the samples, the patterns derived for those three groups provide a nearly exhaustive representation of the quantifiable fossil record that can be extracted from our samples. Admittedly, this does not necessarily mean that such data have to approximate faithfully the overall patterns for the entire marine benthos, or even for its skeletonized component. Arguably this caveat applies to all quantitative studies based on data derived from the fossil record.
Second, the patterns described here pertain to species that are abundant enough to be detected at small sample sizes. In other words, the results are irrelevant in terms of rare species hiding in the tail of the species abundance distribution, as this tail is undetectable in the case of our data. Thus, as is the case in all studies on standardized sample-level diversity (e.g., Powell and Kowalewski 2002; Bush and Bambach 2004; Peters 2004; Kosnik 2005), we deal only with the tip of the diversity iceberg. The data and results cannot and should not be extrapolated to total diversity patterns (e.g., total species richness). For example, it would be absurd to argue that since the removal of brachiopods did not lower the average sample-level diversity (see below), the loss of brachiopods could not have represented a loss for global diversity. Even more importantly, the changes in proportion of higher taxa, ecological components, and diversity of bivalves and gastropods should not be equated with total point (site-level) estimates that would be achieved by exhaustive sampling. The second caveat is, however, also the strength of the sample-level approach. First, from the ecological perspective, the diversity structure of abundant species is arguably much more relevant than the total number of rare forms that cannot be captured at small sample size (see also the ecological literature on diversity; e.g., Tilman et al. 2005; and references therein). For a paleoecologist, the tip of the diversity iceberg is its most interesting part. Second, the focus on the most abundant species complements the results of occurrence-based and range-based analyses, which offer fundamentally different estimates of diversity. Congruencies and discordances between the two types of diversity proxies may themselves reassure us as to the veracity of the observed patterns (e.g., Sepkoski et al. 1981) or alert us to possible pitfalls and caveats undermining our current knowledge of the history of biodiversity (e.g., Kosnik 2005).
Comparison of Diversity Metrics
The database (Table 1) includes 427 samples with at least 30 specimens (255 Jurassic and 172 late Cenozoic samples). The sample size is variable (standard deviations of 112.8 individuals for the Jurassic and 136.8 individuals for the late Cenozoic, respectively) and the mean sample size differs notably between the two time intervals: Jurassic = 146.3 individuals, and late Cenozoic = 182.4 individuals. Consequently, sample diversity estimates need to be standardized by applying subsampling and/or sample-independent diversity metrics (see above for methodological details).
All diversity and evenness metrics, whether computed at genus or species level, show high correlations with one another: the average absolute Pearson correlation coefficient rMEAN (average absolute value of r for a given variable) of any given metric with all other metrics ranges between 0.79 and 0.91 (Table 2). As a general approximation, any of these metrics could thus be applied in subsequent analyses. We decided to use Div30SP (sample-standardized species diversity at n = 30) because Div30SP (1) correlates well with all other metrics (rMEAN = 0.87); (2) is not correlated significantly with sample size (r = 0.06, p = 0.20); and (3) allows us to use all samples (subsampling at n = 100 results in a loss of 35% of samples). The analyses were also performed at n = 100. All results at n = 100 were highly consistent with the analyses at n = 30. Also, despite the smaller number of samples included at n = 100, all p values significant at n = 30 remained significant at n = 100. We chose species rather than genera because (1) at the level of individual samples the great majority of genera (>95%) are monospecific, so all outcomes are nearly identical for genus and for species; and (2) the sample-level ratio of species to genera is comparable for the Jurassic (1.05) and late Cenozoic (1.07) data sets (computed for raw, non-standardized data). Although the results below are restricted primarily to Div30SP, we occasionally report results for genera and for n = 100 to highlight the high consistency across metrics regardless of how our data set is partitioned or restricted.
Other metrics reported in Table 2 (H, J, and Hurlbert PIE) are evenness/diversity metrics expressed as nondimensional values and are less intuitive than Div30SP expressed as the number of species attained at 30 specimens. Moreover, some of those other metrics (H and J) show a significant (albeit low) positive correlation with sample size and should therefore be avoided.
Note here that diversity metrics presented in Table 2 provide proxies for both richness and equitability of the most common species that can be captured at low sample sizes. The diversity of rare species and genera cannot be evaluated from our data (see also “Caveats” above). In particular, Div30SP is a simultaneous richness and evenness metric: Div30SP can be thought of as a measure of the initial slope of the rarefaction curve at a preset n value (30 in this case). In addition, it is noteworthy that Div30SP (and all other metrics reported in Table 2) can be also approximated well by RAB1, which represents the relative abundance of the most common species in a sample and is negatively correlated with all other diversity metrics (average r = −0.79) (RAB1 is equivalent to the Berger-Parker index; see May 1976; Magurran 1988, 2004). For further discussion of the relation between evenness and diversity metrics please refer to Olszewski (2004, and references therein).
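For readers unfamiliar with the non-rarefaction metrics, the following Python sketch computes them for a single sample. It assumes that H denotes the Shannon index (natural logarithm), J denotes Pielou's evenness, and PIE is Hurlbert's probability of interspecific encounter with the N/(N − 1) correction; these readings are standard but are our assumption, since the definitions are given only in Table 2. RAB1 is computed as the Berger-Parker index, as stated above.

```python
from math import log

def evenness_metrics(counts):
    """Return H, J, PIE, and RAB1 for one sample of per-species counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    props = [c / total for c in counts]
    # Shannon index (assumed meaning of H), natural logarithm.
    shannon_h = -sum(p * log(p) for p in props)
    # Pielou's evenness (assumed meaning of J); undefined for one species.
    pielou_j = shannon_h / log(len(counts)) if len(counts) > 1 else 0.0
    # Hurlbert's PIE: chance that two individuals drawn without
    # replacement belong to different species.
    pie = (total / (total - 1)) * (1 - sum(p * p for p in props))
    # RAB1: relative abundance of the most common species.
    rab1 = max(props)
    return {"H": shannon_h, "J": pielou_j, "PIE": pie, "RAB1": rab1}
```

In a perfectly even sample J equals 1 and RAB1 equals 1/S, whereas strong dominance by a single species pushes RAB1 toward 1 and the other three metrics toward 0, which is consistent with the negative correlation reported for RAB1.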
Div30SP offers a measure of diversity that is congruent with various metrics used in previous studies of sample-level diversity. Whether called “richness” (Bambach 1977) or “evenness” (Powell and Kowalewski 2002), all estimators used in previous sample-level studies also dealt with common taxa and reflected simultaneously evenness and richness of common taxa.
The comparison of the two data sets reveals statistically significant differences in standardized sample diversity (Table 3, Figs. 1, 2): late Cenozoic samples are on average much more diverse than Jurassic ones, with mean Div30SP of 11.7 species for the late Cenozoic and 7.1 species for the Jurassic, respectively. This difference represents an increase in mean sample diversity by a factor of 1.65 (the increase factor is comparable for species diversity at n = 100, but slightly higher for genus level analyses; see Table 3). Thus, when considered at face value, the sample-level diversity increased notably between the Jurassic and the late Cenozoic. The magnitude of this increase is lower than that obtained in previous studies. For non-standardized data from open marine habitats, the diversity increased from the Jurassic to late Cenozoic by a factor of three (Bambach 1977: Fig. 5), but this estimate is based on a small number of samples. The sample-standardized comparisons of Powell and Kowalewski (2002) and Bush and Bambach (2004) suggested that sample-level diversity increased between the early-mid Paleozoic and late Cenozoic by a factor of 2.5 or more. Thus, the increase by a factor of 1.65 observed here suggests that average sample-level diversity of Jurassic samples is intermediate between the early-mid Paleozoic and the late Cenozoic estimates. Mean species diversity at n = 100 for the Paleozoic was estimated by Bush and Bambach (2004: Table 2) at 9.99, an estimate substantially lower than the value of 11.56 reported here for the Jurassic (Table 3).
As clearly illustrated by Figure 2, the observed difference in mean sample-level diversity reflects the fact that a notable proportion of late Cenozoic samples (19.2%) yielded Div30SP estimates exceeding 15 species, whereas only four Jurassic samples (1.6%) attain such diversity levels. Conversely, over one-third of Jurassic samples (38.4%) yielded Div30SP estimates below six species, whereas only six late Cenozoic samples (3.5%) record such low diversity values.
Diversity Partitioning across Higher Taxa
The Jurassic and late Cenozoic samples reveal striking similarities as well as notable differences in their higher-level taxonomic composition (Fig. 3A–F). The Jurassic data set (Fig. 3A–C) and the late Cenozoic data set (Fig. 3D–F) are similar in that both are dominated by samples consisting mostly of bivalves. In terms of mean percentages, the two time intervals are nearly identical. On average, bivalves represent 75.2% of individuals in Jurassic samples and 71% in the late Cenozoic. Moreover, in both time intervals, median percentages are very high: the median for the Jurassic samples is 95.1% (i.e., in half of the samples, 95% or more individuals are bivalves) compared with 85.7% for the late Cenozoic samples.
However, the two time intervals differ dramatically in the proportion of other higher taxa. Whereas brachiopods are completely absent in the late Cenozoic samples (all samples fall into the bin of 0% brachiopod specimens; see Fig. 3F), they are an important component in a respectable fraction of the Jurassic samples (Fig. 3C), with brachiopods accounting for 22.2% of individuals in an average sample and about one-fifth of the samples being dominated by brachiopods (i.e., having over 50% brachiopod individuals per sample). Conversely, gastropods are very scarce in the Jurassic samples (Fig. 3B) but quite common in many of the late Cenozoic samples (Fig. 3E). Gastropods represent only 2.6% of individuals in an average Jurassic sample, compared with 28.4% in an average late Cenozoic sample. Moreover, only 15 Jurassic samples (a mere 6%) include any notable (>10% individuals) fraction of gastropods, compared with over half (56%) of the late Cenozoic samples. Finally, close to one-quarter of the late Cenozoic samples (24%) are dominated (i.e., >50% individuals per sample) by gastropods (Fig. 3E), compared with only one sample (0.4%) in the Jurassic (Fig. 3B).
Given the observed temporal shift in relative importance of the higher taxa, three mechanisms can be theoretically postulated to account for the observed increase in sample-level evenness (and sample-standardized diversity thereof) through time: (1) an increase in evenness due to loss of brachiopods (this would be possible if brachiopods were typically high in dominance so their presence suppressed the evenness of bivalve fauna); (2) an increase in evenness due to addition of gastropods (this would be possible if gastropods were sufficiently equitable to increase the whole-fauna sample evenness); and (3) an increase in evenness of bivalve fauna, which is dominant in both time intervals. Obviously, the observed increase also may represent some combination of these three mechanisms.
To assess this issue we reanalyzed diversity patterns for the bivalve component of the fauna only. The estimates of diversity for Jurassic bivalves alone are very similar to those obtained for the entire Jurassic biota: when comparing the estimates restricted to the bivalve component of the fauna with those obtained in whole-fauna analyses (Table 4), the mean sample diversity of bivalves tracks closely the mean sample diversity of all fauna. For both genera and species, and for both n = 30 and n = 100, this difference is always very minor (below 3%) and the sign of the difference varies across the four diversity metrics. This indicates that the removal of sometimes common brachiopods and usually sporadic gastropods from bivalve-rich samples—as well as the removal of samples where brachiopods and/or gastropods dominate (note that when n < 30 for bivalves the sample had to be excluded from the analysis)—does not affect notably the mean sample diversity. In contrast, the late Cenozoic diversity estimates for bivalves (Table 4) are notably lower than the late Cenozoic diversity of all fauna. Again this difference is consistently manifested in all four metrics reported in Table 4. Despite this decrease, the mean sample diversity of the late Cenozoic bivalves is still significantly higher than the mean diversity of the Jurassic bivalves. However, depending on the metric used, the diversity increase (expressed as a simple increase factor) ranges from 1.29 to 1.35. These values are substantially lower than the estimates obtained for the entire fauna (from 1.61 to 1.90).
A somewhat different approach is to restrict the data to samples that are overwhelmingly dominated by bivalves (>90% individuals). This strategy has the advantage of not artificially removing any fauna, while still focusing on a subset of data that is effectively a bivalve data set. The results are very similar to those of the previous analysis, with no noticeable changes in the Jurassic and a substantial drop in the late Cenozoic. The increase factor ranges across the four metrics from 1.42 to 1.49, again a range of values that explains a considerable amount of the diversity increase but is still substantially lower than the range of values estimated for the entire fauna (from 1.61 to 1.90).
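The two ways of isolating the bivalve signal differ in what they discard, which a small sketch may make concrete. The data layout (a sample as a list of `(taxon, group)` tuples), the function names, and the toy sample are all assumptions made for illustration; only the 30-specimen minimum and the >90% cutoff come from the text.

```python
from collections import Counter

MIN_SPECIMENS = 30        # minimum sample size used in the study
DOMINANCE_CUTOFF = 0.90   # ">90% individuals" threshold from the text

def bivalve_share(sample):
    """Fraction of specimens in a sample that are bivalves."""
    groups = Counter(group for _, group in sample)
    return groups["bivalve"] / len(sample)

def restrict_to_bivalves(sample):
    """Approach 1: drop non-bivalve specimens; reject samples left with < 30."""
    kept = [(taxon, group) for taxon, group in sample if group == "bivalve"]
    return kept if len(kept) >= MIN_SPECIMENS else None

def keep_if_bivalve_dominated(sample):
    """Approach 2: keep the whole sample, untouched, only if >90% are bivalves."""
    return sample if bivalve_share(sample) > DOMINANCE_CUTOFF else None

# Hypothetical sample: 40 bivalves, 8 brachiopods, 2 gastropods.
toy = ([("Bivalve sp. A", "bivalve")] * 25
       + [("Bivalve sp. B", "bivalve")] * 15
       + [("Brachiopod sp.", "brachiopod")] * 8
       + [("Gastropod sp.", "gastropod")] * 2)

approach_1 = restrict_to_bivalves(toy)       # the 40 bivalve specimens survive
approach_2 = keep_if_bivalve_dominated(toy)  # None: bivalves are only 80% of the sample
```

In this toy case the first approach retains the sample in trimmed form, whereas the second rejects it outright, which is why the two analyses rest on different subsets of samples.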
Both analyses consistently suggest that the post-Paleozoic diversity increase observed in the samples is a combined effect of an increase in evenness of the bivalves augmented by the addition of gastropods.
The two intervals differ dramatically in the ecological composition of samples (Table 5, Figs. 4, 5). The Jurassic samples are dominated by sessile epifaunal and sessile semi-infaunal organisms (these two mode-of-life categories are grouped together throughout the analysis), whereas samples with notable presence of mobile or limited mobile organisms (also grouped together throughout the analysis) and fully infaunal forms are far less common (Fig. 4A). In contrast, the late Cenozoic samples are rarely dominated by sessile epifauna and semi-infauna, with most samples occupying the lower right corner of the graph (Fig. 4A), a region that represents samples overwhelmingly dominated by mobile (or limited mobile) infauna. Similar differences are observed for the dominant feeding mode, with the great majority of Jurassic samples being dominated by epifaunal suspension feeders, but with a greater spread of feeding modes in the Cenozoic (Fig. 5A). These differences are highly significant, as demonstrated by both nonparametric and parametric tests (Table 5).
The ecological differences mostly persist when data are restricted to bivalves (Table 5, Figs. 4B, 5B). In particular, the shift from the sessile epifauna and semi-infauna of the Jurassic to more mobile, fully infaunal organisms of the late Cenozoic is virtually identical when data are restricted to bivalves (Fig. 4), although mean and median sample proportions of infaunal individuals are somewhat higher for both time intervals when data are restricted to bivalves (Table 5). Most likely, this increase reflects the removal of epifaunal brachiopods from the Jurassic samples and the removal of gastropods (abundant gastropod species often represent grazing epifaunal forms) from the late Cenozoic data set.
The only ecological parameter that changes its pattern, when data are restricted to bivalves, is the proportion of suspension-feeding organisms (Table 5, Fig. 5). In both time intervals, bivalves are dominated by suspension-feeding forms, and only for the samples dominated by infauna can the presence of non-suspension feeders (mostly deposit-feeding forms) be more notable—note distinct triangular distribution of samples for both the Jurassic and late Cenozoic (Fig. 5B). The median and mean proportions of suspension feeders are similar for both time intervals and statistically indistinguishable, using both parametric and nonparametric tests. This pattern contrasts with whole-fauna analyses (Table 5, Fig. 5A), where non-suspension-feeding organisms—mostly represented by various feeding modes among gastropods—are an important component in many late Cenozoic samples.
Evaluation of Possible Causative Mechanisms
Taken at face value, the results suggest that (1) average sample-level standardized diversity of most common species of marine benthic invertebrates increased from the Jurassic to the late Cenozoic; (2) higher taxonomic composition changed from bivalve-brachiopod to bivalve-gastropod associations; (3) the observed increase in diversity is a combination of an increase in bivalve diversity/evenness and increased presence of diverse/equitable gastropod fauna; and (4) the changes in higher taxonomic composition and sample-level diversity were paralleled by notable changes in ecological characteristics of faunas (from a predominately sessile suspension-feeding epifauna and semi-infauna of Jurassic associations to predominately mobile infauna with various feeding modes in the late Cenozoic).
These results, consistent with numerous previous studies, may reflect a macroevolutionary trend in diversity coupled with a long-term macroecological shift in the dominant mode of life (e.g., Vermeij 1977, 1987, 1995; Bambach 1983, 1999). The observed trends also may reflect fundamental differences between Mesozoic marine ecosystems frequently affected by anoxic/dysoxic events and the late Cenozoic oceans where such events were rare (Sageman and Bina 1997; Jacobs and Lindberg 1998). However, four non-biological causative explanations can also be postulated a priori, with each of those explanations accounting for some (or even all) of the trends observed in our data:
Idiosyncratic Sampling Heterogeneities.—An artificial trend (specific to our data) generated by a temporal shift in latitudinal/climatic and environmental coverage of the samples—a shift toward more tropical environments or higher-diversity environmental settings through time could result in apparent changes in diversity and also account for changes in higher taxa and dominant ecology.
Aragonite Bias I.—A taphonomic bias due to an increased loss of species with original aragonitic shell in the older fossil record (see also Cherns and Wright 2000; Wright et al. 2003; Kidwell 2005)—a taphonomic loss of predominantly aragonitic gastropods as well as many aragonitic bivalve species in the Jurassic could artificially suppress diversity, alter higher taxonomic structure of samples, and depress the abundance of mobile infauna.
Aragonite Bias II.—A taphonomic bias due to poorer (mold/cast) preservation, and therefore lowered taxonomic resolution, of species with original aragonitic shells in the older fossil record—cast and mold preservation could lower taxonomic resolution of species with original aragonitic morphology and suppress sample-level diversity and evenness of Jurassic samples (this bias is unlikely to have generated apparent ecological trends as mode of life can often be inferred even if specimens are identified only at the genus or family level).
Diagenetic Bias.—Whereas our late Cenozoic fossils are derived, with very few exceptions, from unconsolidated and sievable sediments, our Jurassic samples came from lithified rocks. These differences have both a taphonomic ramification and a methodological corollary (Cooper et al. 2006). First, because of diagenetic processes such as dissolution and compaction, small and/or thin-shelled specimens and species may be lost. Second, the sampling methodology differs notably between lithified and unlithified deposits. Indeed, in our specific case, the Jurassic samples were derived by mechanically breaking up lithified rocks and, less frequently, by systematic surveys of bedding planes in the outcrop. In contrast, the late Cenozoic data were generated via exhaustive sieving of unconsolidated sediments. The latter method is likely to allow for capturing smaller specimens (and species). Because the methodological filter and the taphonomic filter have the same causative root and because their predicted consequences are comparable, it is difficult to evaluate them separately. Fortunately, given their congruent prediction (a bias against small and/or thin-shelled specimens), they can be evaluated jointly as exemplified below.
These four potential artifacts are evaluated one by one in the four subsequent sections.
Idiosyncratic (Paleolatitudinal and Environmental) Heterogeneities
The diversity increase observed in initial results may be due, partly or entirely, to idiosyncratic heterogeneities specific to our data sets. If Jurassic and late Cenozoic samples differ notably in their paleolatitudinal or environmental coverage, the observed differences may be a spurious byproduct of differences in sampling coverage rather than a meaningful reflection of real trends through time.
Most Jurassic samples and all late Cenozoic samples represent a comparable range of absolute paleolatitude (30° to 55°, with the Jurassic samples representing both hemispheres and the Cenozoic samples restricted mainly to the Northern Hemisphere), although the paleolatitudinal coverage differs subtly between the two time intervals when examined at a finer geographic resolution (Fig. 6A versus 6D). The Jurassic samples are slightly more equatorial, but this difference is subtle (only a few samples come from sites below 30° and the mean absolute paleolatitude of the two data sets differs by only about 3°: Jurassic = 36.0°, late Cenozoic = 39.3°). Nevertheless, comparable paleolatitudinal coverage does not automatically guarantee comparable climatic coverage of the two data sets, because the greenhouse Earth of the Jurassic and the icehouse Earth of the late Cenozoic may have had different latitudinal climatic gradients. It is, therefore, reassuring that the comparison of mean diversity of Jurassic and late Cenozoic samples binned by absolute paleolatitude (Fig. 7) shows that, in both the Jurassic and the late Cenozoic, the variation in diversity across paleolatitudes is minor relative to the differences observed through time. Moreover, at any given paleolatitude, the offset between the mean diversity of the Jurassic samples versus that of the late Cenozoic samples is comparable to that observed for the pooled data and remains statistically significant (note narrow standard errors) despite the drop in sample size.
The reverse paleolatitudinal trend (increase in diversity toward higher paleolatitudes) observed in the late Cenozoic (Fig. 7) should not be interpreted literally. When binned by paleolatitude the resulting subsets of samples vary notably in their environmental and lithologic coverage so the observed trends are, at least partly, an artifact of environmental heterogeneities in our data. For example, the late Cenozoic samples with absolute paleolatitude >45° include a much higher proportion of offshore samples (42%) than the samples with absolute paleolatitude below 45° (29%). Because offshore samples tend to be more diverse in the case of our data (Fig. 7), the reverse paleolatitudinal trend may be due to variable proportion of offshore samples across paleolatitudinal bins. Binning simultaneously by paleolatitude, environment and lithology makes within-bin sample sizes too small for any meaningful analysis, and therefore, the exact causes for subtle spatial variation in diversity cannot be dissected unambiguously. However, the primary goal here was to show that—regardless of how data are binned—the combined effect of spatial variation is minor relative to differences through time.
The environmental coverage is nearly identical for the two time intervals (Fig. 6B versus 6E); in both cases onshore samples constitute about two-thirds of all samples (Jurassic, 68.3%; Cenozoic, 73.3%) (Log Likelihood G = 0.91, p = 0.34). When samples are binned by environments, the difference between the Jurassic and the Cenozoic remains comparable to the results for pooled data (Fig. 7). The lithological coverage is dominated by coarse clastics (sands and sandstones) for both time intervals (Fig. 6C,F), but the dominance of sandstones is significantly higher in the Cenozoic (72% versus 52%) (Log Likelihood G = 14.4, p = 0.0007; whether this interesting difference reflects some real shifts in facies through time or is an accidental outcome of differential sampling cannot be evaluated at this time). The average sample diversity varies across lithologies. However, as in other cases, the difference between the Jurassic and the Cenozoic remains significant when data are grouped by lithology (Fig. 7). Moreover, the sandstones, which display lowest sample diversity on average, are more common in the Cenozoic, whereas the siltstones having highest mean diversity values are more common in the Jurassic. Consequently, if anything, lithological variations suppress the diversity increase observed in the pooled data, although it should be noted that the temporal diversity increase of a similar magnitude is observed when each of the two clastic lithologies is analyzed separately (Fig. 7).
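The two p-values quoted above can be recovered from the reported G statistics, since G is referred to a chi-square distribution. The sketch below is a consistency check only: the degrees of freedom are not stated in the text and are assumed here (1 for the onshore/offshore comparison, 2 for a three-class lithology comparison), and the helper `chi2_sf` is a hypothetical name that uses closed-form tail probabilities valid only for those two cases.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (closed forms for df = 1, 2)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")

# Degrees of freedom are assumptions inferred from the categories being compared.
p_environment = chi2_sf(0.91, df=1)  # onshore vs. offshore, two time intervals
p_lithology = chi2_sf(14.4, df=2)    # lithological classes, two time intervals

print(round(p_environment, 2))  # 0.34
print(round(p_lithology, 4))    # 0.0007
```

Under these assumed degrees of freedom both values agree with those reported, which suggests that the lithology test compared three classes.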
In sum, the comparison of Jurassic and Cenozoic data indicates that (1) the two data sets do not differ dramatically in their paleolatitudinal, environmental, and lithological coverage (Fig. 6); and (2) diversity fluctuations observed across paleolatitudinal, environmental, and lithological gradients within each of the two compared time intervals are very minor relative to the differences observed through time (Fig. 7).
Aragonite Bias I: Complete Loss of Specimens
Given that Jurassic samples predominantly come from lithified units and are typically affected by loss of aragonite (with originally aragonitic shells often preserved as molds or casts), the observed change in diversity may simply reflect the taphonomic loss of aragonitic taxa (see also Cherns and Wright 2000; Wright et al. 2003; Bush and Bambach 2004).
Replotting the sample diversity data (Fig. 1A) as a function of the per-sample percentage of specimens belonging to aragonitic species (Fig. 8) indicates that most Cenozoic samples, which nearly all preserve original aragonitic shells, are dominated by aragonitic specimens. In contrast, Jurassic samples are spread relatively evenly with respect to the percentage of originally aragonitic specimens (Fig. 8). This difference between the Jurassic and Cenozoic may represent either (1) a real difference in mineralogical composition of dominant species, with calcitic and bimineralic species being relatively more common in the Jurassic, or (2) a taphonomic signature, where numerous Jurassic samples have lost a large fraction of their aragonitic specimens.
In terms of diversity, in the Cenozoic, the samples strongly dominated by aragonitic specimens (samples with >85% aragonitic specimens) do not differ substantially in their average sample diversity from samples that include a notable proportion of bimineralic and calcitic taxa (Table 6, Fig. 8). In contrast, the Jurassic samples are distributed along a broad arch (Fig. 8), with the intermediate samples, in which both aragonitic and calcitic specimens are quite common, having elevated sample diversity. This arch may preserve a remnant of an original ecological signal reflecting the fact that higher-diversity fossil assemblages were generated from time-averaging of Jurassic communities in which both aragonitic and calcitic forms were common. It is difficult to attribute this pattern to loss of aragonitic specimens without invoking more-complex taphonomic scenarios. If loss of aragonite removed the most abundant species and thus increased the evenness (and effective diversity captured at small sample sizes), the standardized diversity could have spuriously increased (e.g., if an aragonitic taxon made up 95% of a fossil assemblage, an average sample of 100 specimens taken from such an assemblage would have had six species or fewer, but if this dominant aragonitic species were lost during fossilization then the sample of 100 specimens could have included many more species). This scenario does not explain, however, why diversity drops again among samples dominated by calcitic species. Perhaps this is because these are brachiopod-dominated samples. On the other hand, the artificial removal of brachiopods discussed above (see Table 4) does not result in any substantial increase in diversity of Jurassic bivalves. Thus, an intriguing pattern suggestive of slightly higher diversity in mineralogically mixed Jurassic samples remains unexplained at this time.
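The dominance scenario described above can be illustrated numerically. The sketch below assumes that standardized diversity is measured as expected richness under hypergeometric rarefaction (one common choice, not necessarily the metric behind every value reported here), and it uses an entirely hypothetical assemblage: one aragonitic species with 9,500 specimens (95%) plus 50 rarer species with 10 specimens each.

```python
from math import comb

def rarefied_richness(abundances, n):
    """Expected number of species in a random subsample of n specimens."""
    total = sum(abundances)
    if n > total:
        raise ValueError("subsample is larger than the assemblage")
    denom = comb(total, n)
    # A species is missed when all n specimens are drawn from the other species.
    return sum(1 - comb(total - a, n) / denom for a in abundances)

# Hypothetical assemblage: one dominant species (95%) and 50 rare species.
rare_species = [10] * 50
with_dominant = rarefied_richness([9500] + rare_species, n=100)
without_dominant = rarefied_richness(rare_species, n=100)

print(f"{with_dominant:.1f} species expected at n = 100 with the dominant taxon")
print(f"{without_dominant:.1f} species expected at n = 100 after its removal")
```

With the dominant species present the expected richness stays below six, and it rises several-fold once that species is removed, which is the spurious increase the scenario envisages.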
However, regardless of the ultimate reasons for the arch-like pattern, the aragonite loss can be tested directly from our data by restricting our Jurassic and late Cenozoic data sets to only those samples that are strongly dominated by aragonitic forms. We have arbitrarily chosen the value of 85% specimens as a threshold level. This cutoff value is low enough to maintain a reasonable number of Jurassic samples and high enough to ensure that the restricted samples are not affected by aragonite loss (arguably if 85% of specimens came from aragonitic fauna, the taphonomic loss could not have been very significant). The resulting comparison (Table 6, Fig. 8, inset plot) indicates that the diversity difference between the Jurassic samples and the late Cenozoic samples is maintained; in fact, at the species level, the difference increases from 165% (i.e., an increase by a factor of 1.65) observed for all data to 182% observed for aragonite-dominated samples. This difference persists when data are grouped into bins of 85–90%, 90–95%, and >95% aragonitic specimens (Fig. 8, inset). The 5% bins below 85% are not plotted on the inset because the number of late Cenozoic samples in those bins becomes too small to allow for meaningful comparisons.
The observed difference also persists when samples are restricted to those with less than 85% aragonitic specimens (Table 6), although the increase is slightly lower than for pooled data (152%). Moreover, even when the data are restricted to the Jurassic “arch region” (40–85% range), which corresponds to a zone of most-diverse Jurassic samples and least-diverse Cenozoic samples (Fig. 8), a discernible difference is still observed. In this case, the diversity increase between the Jurassic and the late Cenozoic becomes notably smaller (120% for species and 118% for genera), but this diminished offset still remains statistically significant (Div30SP: Z = 2.95, p = 0.004; Div30GEN: Z = 2.70, p = 0.007; Wilcoxon two-sample two-tailed test with normal approximation).
Aragonite Bias II: Mold and Cast Preservation
Although the loss of aragonite can be ruled out as an important driver of the observed diversity increase, such loss may have indirectly lowered the diversity of Jurassic samples. That is, the aragonite-dominated samples, which typically consist of molds and external casts, may have had many more species and genera lumped into a single “unidentified” species (or genus). However, whether diversity loss can indeed be induced by cast/mold preservation is open to debate, especially in the case of external molds, from which latex casts recording many key morphological characters of invertebrate shells can be produced (e.g., Aberhan 1998).
This problem can be tested directly from our data. As summarized in Table 6, the average per-sample proportions of identified species and genera are very high for both the Jurassic and late Cenozoic: in all comparisons >85.5% of species and >99.0% of genera were identified (Table 6). Species-level percentages are slightly lower for Jurassic than for late Cenozoic species (depending on the comparison, that offset varies from 2.1% to 7.8%; see Table 6). These species-level differences in taxonomic resolution are, in most cases, statistically indistinguishable (see Table 6 for details). More importantly, these differences are insufficient to account for the notable increase in species diversity observed in our data. For example, even if we assume that all species are equally abundant in the samples (the suppressing effect of unidentified taxa on standardized sample diversity would be potentially strongest in the case of perfectly equitable samples), the standardized Jurassic sample of 30 specimens would yield an average of 25.8 species. That is, for a sample of 30 specimens, the 14.5% unidentified Jurassic species (based on the lowest estimate of 85.5% identifiable species reported in Table 6) will translate into an artificial loss of 4.35 species (30 × 0.145). The minimum possible species loss for the Cenozoic (6.7%) would translate into 2.0 species, thus yielding a standardized diversity of 28.0 species. This deliberately maximized bias (the highest possible loss in the Jurassic versus the lowest possible loss in the Cenozoic) still generates a loss of only 2.2 species at n = 30, and thus cannot account for a difference of 4.55 species observed between the Jurassic and the late Cenozoic samples (Table 6).
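The worst-case calculation can be retraced step by step. The sketch below uses only the percentages and the observed difference quoted in the text; the variable names are illustrative. Note that the unrounded products give a net offset of about 2.3 species, marginally above the 2.2 quoted, which leaves the conclusion unchanged because either value is well below the observed difference of 4.55 species.

```python
N = 30                          # standardized sample size (specimens)
UNIDENTIFIED_JURASSIC = 0.145   # 100% minus the lowest identified share (85.5%)
UNIDENTIFIED_CENOZOIC = 0.067   # minimum loss quoted for the Cenozoic
OBSERVED_DIFFERENCE = 4.55      # species, Jurassic vs. late Cenozoic (Table 6)

# In a perfectly even sample every specimen is a distinct species, so the
# share of unidentified species converts directly into species lost.
loss_jurassic = N * UNIDENTIFIED_JURASSIC   # 4.35 species
loss_cenozoic = N * UNIDENTIFIED_CENOZOIC   # about 2.0 species
net_bias = loss_jurassic - loss_cenozoic    # about 2.3 species

print(f"net worst-case bias: {net_bias:.2f} species")
print(f"share of observed difference: {net_bias / OBSERVED_DIFFERENCE:.0%}")
```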
Moreover, the estimate of 2.2 species is an unrealistic, worst-case scenario because (1) real samples usually depart far from perfect evenness, (2) most unidentified species belong to rare but identified genera, and (3) most genera are monospecific at the sample level. Thus, we may expect that this effect is even more negligible when applied to real data. Finally, the genus-level analysis (Table 6) does not suggest any notable loss of taxonomic resolution due to cast/mold preservation: the percentages of identified genera are virtually identical for Jurassic and late Cenozoic genera (within 0.6%). In sum, whereas there may be a minor loss of taxonomic resolution due to cast and mold preservation in the Jurassic, this problem affects only the species-level analysis and even in that case cannot be blamed for the magnitude of the diversity increase observed through time. Admittedly, it is theoretically possible that genera and species defined on the basis of cast/mold specimens are not equivalent to those defined on the basis of fully preserved shells. However, Jurassic species and genera are generally based on fully preserved type specimens. Molds and casts are used only to identify those taxa, and not to define them.
In addition to taphonomic biases induced by aragonite loss, the observed increase in sample-level diversity may also have been induced by secular changes in the temporal resolution of fossil samples through geological time. Increased time-averaging (temporal mixing) of fossil assemblages may have caused temporal resolution of paleontological samples to decrease throughout the Phanerozoic (Kidwell and Brenchley 1994; Kowalewski and Bambach 2003; and references therein). An increased time-averaging of samples may have resulted in elevated evenness and a spurious temporal increase in evenness and richness of common taxa (see also Powell and Kowalewski 2002; Scarponi and Kowalewski in press). Because we can neither quantify the potential effect of increased time-averaging as a biasing factor nor assess by how much (if at all) the levels of time-averaging changed between the Jurassic and the late Cenozoic, this issue cannot be addressed satisfactorily at this time.
Diagenetic Bias
The final artifact that may undermine any evolutionary interpretations of the observed patterns relates to differential diagenesis: most Jurassic fossils are derived from lithified deposits whereas most late Cenozoic fossils are collected from still unconsolidated sediments. As discussed above, in addition to the problem of aragonite loss (addressed above), these differences may have induced preferential loss of thinner and smaller specimens in the older fossil record, either due to their real taphonomic loss or due to differences in the sample acquisition methodology.
Specifically, in the context of this study, the difference in the diagenetic grade and degree of lithification may have resulted in the following difference between the two compared time intervals: our Jurassic samples do not preserve, or are unlikely to have effectively captured, small specimens (according to our field and laboratory experience, specimens <5 mm are rarely noted during field surveys or recovered in the laboratory during processing of lithified rocks), whereas specimens around, and even below, 1 mm are present in the processed sieved material used to generate our late Cenozoic data. Regardless of whether small thin-shelled specimens of Jurassic age had been completely lost due to diagenesis or had been preserved but missed because of sampling methodology, the resulting bias could have caused spurious changes in diversity, higher taxonomic structure, and ecological composition of the samples.
We say “changes” rather than “a decrease” because the loss of small specimens does not automatically imply a decrease in standardized diversity and evenness. It is theoretically possible that the removal of small abundant species could increase evenness and result in elevated estimates of sample-standardized diversity (see also Kidwell 2001). For this and other reasons it is difficult to assess these predictions numerically in any theoretical manner and to evaluate conclusively whether the diagenetic bias can indeed account entirely for the observed changes in diversity, higher taxonomic structure, and ecological composition of the samples. However, empirical tests can be designed to evaluate directly the effect of diagenesis, of lithification-induced differences in methodology, or of both. Given the data available to us, three independent empirical tests can be carried out at this time.
Test 1: Cassian Lagerstätten
In the Ladinian-Carnian (Triassic) Cassian Formation exceptionally preserved fossils occur in marls that underwent only limited diagenesis (Fürsich and Wendt 1977), including a diverse gastropod fauna (e.g., Kittl 1891). At certain levels within this formation, a rich marine benthic fauna is preserved with its original aragonitic shell material (Scherer 1977). Thus, in terms of their diagenetic grade, these samples are more comparable to unlithified, aragonite-preserving sediments of late Cenozoic age than to diagenetically mature units that dominate our Jurassic data. We have analyzed 29 bulk samples, which were derived by exhaustive collection of fossils, including small specimens comparable to those that typify our late Cenozoic samples obtained by sieving.
The Triassic samples reveal diversity levels and a higher taxonomic structure that are much more comparable to our late Cenozoic data than to our Jurassic data. The standardized diversity of the Triassic samples is statistically indistinguishable from the late Cenozoic data, but significantly higher than the Jurassic estimates (Table 7). Triassic samples plot over nearly the entire range of diversity values of late Cenozoic samples, which is quite remarkable given the small number of Triassic samples available to us (Fig. 9). Additionally, owing to a notable presence of gastropods, the higher taxonomic structure of the Triassic data (Fig. 9) is more similar to that of the late Cenozoic samples than to that of the Jurassic ones, although the Triassic samples are statistically distinct from both the Jurassic and the late Cenozoic samples (Table 7).
Thus, this test appears to support the diagenetic bias as at least a partial explanation for the observed secular increase in diversity and also suggests that the scarcity of gastropods in the Jurassic samples may be, at least partly, due to diagenetic alteration. However, two important caveats need to be stressed. First, the analyzed fauna represents a tropical setting (paleolatitude of 21°), and some of those samples are associated with reef structures, arguably a habitat inherently conducive to supporting high-diversity benthic biota (Connell 1978; Kiessling 2005). Second, all 29 samples represent one specific region that may not be representative of typical Triassic benthic associations. Thus, the sobering implications of this test need to be treated with some caution.
Test 2: Lithified Pleistocene Deposits of the Gulf of California
This test evaluates primarily the potential effect of changes in methodology induced by lithification. A data set consisting of 16 macroinvertebrate samples was derived (Aberhan and Fürsich 1991 and unpublished data) from lithified, fine- to coarse-grained sandstones of Pleistocene age that crop out in the vicinity of Bahía la Choya along the coast of the northern Gulf of California and are interpreted to have formed primarily in very shallow sandy subtidal areas. The sampling methodology was analogous to that used for deriving Jurassic samples included in our data set, making these 16 late Cenozoic samples methodologically comparable to our typical Mesozoic samples.
As summarized in Table 7, this Pleistocene data set is much more similar to the late Cenozoic than to the Jurassic samples. The diversity, higher taxonomic structure, and ecological composition of the data set are all indistinguishable statistically from the late Cenozoic samples (especially when more appropriate nonparametric tests are used; see Table 7), but differ dramatically from the Jurassic samples.
Thus, this test appears to reject the methodological bias as an important factor: the lithified Pleistocene data are comparable to the late Cenozoic data despite being affected by the same type of methodological attributes that typify Jurassic samples. However, as in the case of Test 1, caveats need to be considered here: First, through the Quaternary the northern Gulf of California was dominated by warm (effectively tropical) water masses and represents a unique macrotidal setting. Second, all 16 samples represent one specific depositional setting that may not be representative of typical late Cenozoic benthic associations. And finally, the lithified Pleistocene samples are not equivalent diagenetically to the lithified Jurassic samples, which, in many cases, underwent burial and passed through multiple diagenesis-related taphonomic filters. Thus, this test also needs to be treated with caution.
Test 3: Sieving Simulation
A final test can be conducted by simulating sieve effects using a data set of 17 Miocene samples from Europe for which all specimens (n = 3869) were measured in terms of shell size (see Hoffmeister and Kowalewski 2001 and Kowalewski and Hoffmeister 2003 for details). The effect of loss of small specimens, which could have been induced by diagenesis or sampling methodology in the Jurassic, can thus be simulated here for the late Cenozoic by removing specimens below a given threshold size and then computing standardized diversity and proportion of gastropods for each sample. The simulation was performed here at 1-mm increments from a 1-mm up to a 9-mm threshold.
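The logic of the simulation can be sketched in a few lines. This is a minimal illustration, not the procedure actually used: it assumes that standardized diversity is expected richness under hypergeometric rarefaction to n = 30, it represents a sample as a list of hypothetical `(species, group, size_mm)` tuples, and the toy sample is invented purely to exercise the code.

```python
from collections import Counter
from math import comb

def rarefied_richness(counts, n):
    """Expected number of species in a random subsample of n specimens."""
    total = sum(counts)
    denom = comb(total, n)
    return sum(1 - comb(total - c, n) / denom for c in counts)

def sieve(specimens, threshold_mm, n=30):
    """Remove specimens smaller than threshold_mm and summarize the remainder.

    Returns (standardized richness at n, gastropod fraction). Richness is None
    when fewer than n specimens remain; both values are None if nothing remains.
    """
    kept = [s for s in specimens if s[2] >= threshold_mm]
    if not kept:
        return None, None
    gastropod_fraction = sum(1 for s in kept if s[1] == "gastropod") / len(kept)
    if len(kept) < n:
        return None, gastropod_fraction
    counts = list(Counter(s[0] for s in kept).values())
    return rarefied_richness(counts, n), gastropod_fraction

# Hypothetical sample (species, higher group, shell size in mm).
toy = ([("Bivalve A", "bivalve", 12.0)] * 30
       + [("Bivalve B", "bivalve", 4.0)] * 20
       + [("Gastropod A", "gastropod", 8.0)] * 20
       + [("Gastropod B", "gastropod", 2.0)] * 10)

# 1-mm increments from a 1-mm up to a 9-mm threshold, as in the text.
results = {t: sieve(toy, t) for t in range(1, 10)}
for threshold, (richness, gastropods) in results.items():
    print(threshold, richness, gastropods)
```

Returning `None` for richness when fewer than 30 specimens survive mirrors the requirement that a sample retain enough large specimens to be standardized at n = 30 after small fossils are removed.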
The effect of small-specimen loss on diversity estimates could be estimated for 12 samples (Fig. 10) that had a sufficient number of large specimens to compute standardized diversity at n = 30 after removing small fossils. The samples vary in their individual behavior. As increasingly larger specimens are removed from the data, the standardized diversity of individual samples increases, decreases, or remains relatively stable all the way up to the 9-mm threshold. When changes are averaged across samples (Fig. 10, inset), a clear decrease in average standardized diversity occurs around 6 mm. However, this change is relatively minor (a factor of 1.2), and it cannot be used to explain the increase by a factor of >1.6 observed between the Jurassic and Cenozoic data sets. In addition, the proportion of gastropods in the samples is not affected notably by removal of small specimens, either at the level of individual samples (Fig. 11) or for the averaged trend (Fig. 11, inset). Only after all specimens below 9 mm are removed from the samples can a drop be observed, but that decrease is relatively minor (<20%).
Note that the sieving test does not evaluate whether the diagenetic bias affected our data. Instead, it simulates what might have happened had the bias been so extreme that all specimens below 9 mm were either removed by diagenesis or missed during sampling. The results show that downgrading of our Cenozoic samples to large specimens cannot account for the observed increase in diversity or for the elevated presence of gastropods in the late Cenozoic samples. In other words, diverse mollusks including many gastropods are sufficiently represented among larger late Cenozoic fossils to retain diversity levels and gastropod-rich taxonomic structure after all small specimens are removed from the data.
As with the above two tests, caveats apply. First, the 17 samples have limited spatio-temporal coverage (Miocene of Europe) and may not be representative of the typical late Cenozoic samples (although the standardized diversity of those samples [Fig. 10] and per-sample proportions of gastropod specimens [Fig. 11] fall well within typical values of the late Cenozoic samples [Fig. 9]). Second, the Jurassic associations may have had a different body-size distribution across and within species than the late Cenozoic samples (e.g., if Jurassic gastropods had been smaller than the Cenozoic ones, the results of this simulation may not be realistic). Unfortunately, specimen size data are available only for the 17 samples used in the test. Nevertheless, the test is important in showing that we cannot transform the diversity structure and higher-level taxonomic composition of the Cenozoic samples into that of the Jurassic samples by removing small specimens.
The evaluation of the four non-biological mechanisms that can be postulated to explain the observed secular trend allows us to reject three of them (idiosyncratic heterogeneities in paleolatitude, paleoenvironment, and lithology; aragonite bias I; and aragonite bias II) with a reasonable degree of confidence. However, diagenetic bias, which potentially combines both methodological and taphonomic filters, remains unresolved, as the three tests available to us failed to provide a coherent verdict: two reject the importance of the bias whereas one supports it.
Nevertheless, the results and analyses presented in this section are useful in three ways: they (1) rule out three of the four potential biases and artifacts that may have affected the observed secular trend; (2) identify diagenetic bias as the key remaining challenge for interpreting the biological veracity of the post-Paleozoic increase in sample-level diversity of marine benthos (this result also constrains considerably the general conclusions presented below); and (3) illustrate a spectrum of tests that can be used to evaluate diagenetic bias (note that most of the undermining caveats can be dealt with in future studies by collecting targeted data tailored specifically for carrying out the above tests).
The standardized comparison of the Jurassic and late Cenozoic data sets revealed fundamental quantitative differences in sample-level diversity, higher taxonomic structure, and ecological composition of marine benthic invertebrates derived from fossil bulk samples. When summarizing these results it is important to explicitly separate “pattern” conclusions regarding what we observe in the fossil record from “process” interpretations regarding the underlying causative mechanisms.
Thus, the conclusions regarding the observed patterns (regardless of underlying processes and potential biases) can be summarized as follows:
The results agree with previous studies on sample-level diversity that evaluated long-term trends across the Phanerozoic and showed that evenness and the effective sample-level richness of common taxa may have increased through time (Bambach 1977; Powell and Kowalewski 2002; Bush and Bambach 2004). The results are also consistent with many previous studies that used fundamentally different data (qualitative observations or quantitative compilations of global occurrence data and/or stratigraphic ranges) to show that taxonomic diversity has increased throughout the post-Paleozoic fossil record (e.g., Valentine 1969; Sepkoski 1981; Bambach 1999; but see Alroy et al. 2001). However, the diversity increase reported here is much less prominent than the three- to fourfold increase estimated in many previous sample-level and global-level studies. Thus, the results agree qualitatively with previous studies by documenting a significant increase in diversity, but they do not agree quantitatively.
The post-Paleozoic diversity increase is a combined effect of the increase in bivalve diversity and the addition of gastropods, whereas the loss of brachiopods has not played a significant role in generating the observed differences. This outcome illustrates the importance of dissecting the taxonomic components of diversity trends and generally agrees with previous studies that argued, either qualitatively or quantitatively, that gastropods (especially carnivorous neogastropods) diversified through the late Mesozoic and Cenozoic (e.g., Newell 1959; Taylor et al. 1980; Sohl 1987; Kosnik 2005).
A comparison of the Jurassic and late Cenozoic data sets shows that late Cenozoic fossil samples are characterized by much higher proportions of infaunal, mobile, non-suspension-feeding organisms than the Jurassic samples. These results are again consistent with many previous qualitative and quantitative studies that documented or argued for an increase in abundance (and/or diversity) of infaunal, mobile, and non-suspension-feeding species through time (e.g., Stanley 1968, 1977; Vermeij 1977, 1987, 1995; Ausich and Bottjer 1982; Thayer 1983; Aberhan 1994; Bambach 1983, 1993, 1999; Aberhan et al. 2006).
In terms of processes and biases, the conclusions are as follows:
Internal differences in paleolatitude, paleoenvironment of deposition, and lithology between the Jurassic and late Cenozoic samples used in this study (i.e., idiosyncratic heterogeneities within our data) cannot explain the differences observed through time.
The post-Paleozoic increase in sample-level diversity cannot be blamed on the loss of species with original aragonite shell mineralogy (aragonite bias I). Although the massive loss of aragonite fossils is well documented in the pre-Cenozoic fossil record (Cherns and Wright 2000; Wright et al. 2003), our results suggest that the mold/cast preservation of the originally aragonitic species is sufficient to compensate for this bias. This result is also consistent with the recent analyses by Bush and Bambach (2004), who arrived at similar conclusions using a different approach and a different data set (but see Alroy and Hendy 2005).
The loss of taxonomic resolution that might be associated with cast/mold preservation of originally aragonitic fossils is at most minor and cannot account for the observed changes in sample-level diversity.
The potential diagenesis-related artifacts (i.e., the taphonomic bias and/or the methodological bias) cannot be ruled out as at least a partial contributor to the observed post-Paleozoic changes in diversity, taxonomic composition, and ecology. Whereas two out of the three available tests suggest that the loss of small specimens cannot replicate the observed patterns, one test points to potentially dramatic changes that might result from diagenesis-related artifacts. Although all tests are undermined by caveats, it is noteworthy that diagenetic filters have been independently suggested as a potentially important factor in recent preliminary reports by other researchers (Alroy and Hendy 2005; Hendy 2005). Diagenetic alteration has also been shown to strongly reduce sample-level diversity in microfossil assemblages (Kiessling 2002).
Given the ambiguous results of diagenetic tests, the possibility that the observed changes represent actual biological changes in response to various abiotic and/or biotic forcing mechanisms remains a valid potential explanation for the observed patterns. Such genuine biotic changes have been predicted and/or argued for previously in numerous macroevolutionary studies on long-term diversity dynamics, ecosystem utilization, ecological escalation, and environmental changes in the global ocean (e.g., Sepkoski 1981; Vermeij 1987, 1995; Bambach 1983, 1993; Jacobs and Lindberg 1998; Kowalewski et al. 2005; and references therein).
This study narrows the list of process-oriented explanations for the observed sample-level patterns down to two alternative, testable hypotheses: (1) diagenetic bias and (2) genuine biological changes. Future studies should concentrate on the rigorous testing of those two rival models. Until then these two hypotheses remain among the most viable and likely mechanisms for explaining the long-term changes in sample-level diversity, ecology, and taxonomic structure observed in the marine Phanerozoic fossil record.
We thank the National Science Foundation for the financial support of the Paleobiology Database used in this study. M.K. thanks the Alexander von Humboldt Foundation for financial support during his three-month research visit at the Institut für Paläontologie, Museum für Naturkunde, Berlin, Germany. John Huntley (Virginia Tech), Dave Jacobs (UCLA), Rowan Lockwood (William and Mary), and Jim Schiffbauer (Virginia Tech) offered useful suggestions during the preparation of this manuscript. Tom Olszewski and one anonymous reviewer provided numerous constructive comments that greatly improved the quality and clarity of this report. This is Paleobiology Database publication number 43.
- Accepted 23 March 2006.