Anthropological Science
Online ISSN : 1348-8570
Print ISSN : 0918-7960
ISSN-L : 0918-7960
Review
Adult age at death estimation: methods tested on Thai postcranial skeletal remains
LUCILLE T. PEDERSEN, KATE DOMETT

2022, Volume 130, Issue 2, pp. 147-159

Abstract

Scientific literature frequently reports that age-at-death estimation standards developed on European and North American populations are less effective when used on genetically distant populations. This paper aims to inform forensic anthropologists and bioarchaeologists of the most appropriate methods to use on Southeast Asian skeletal remains by evaluating studies that have tested the replicability and accuracy of adult age estimation methods on Thai target samples. Results show that methods using the pelvis recorded the highest accuracy of up to 93%, but only when broad age ranges are used (±2 SD). Most methods produced the least bias and inaccuracy in young adults but considerably underaged older adults. Overall biases and inaccuracies tended to be lower for males than females. The sternal rib end method showed the weakest correlation with chronological age. Age prediction equations developed with regression analyses derived from the Thai samples produced standard errors ranging from 9.5 to 13.9 years (using vertebrae and femora). Most of these methods were deemed too imprecise to be useful in Thai forensic cases. The best way forward to understand the wide range of morphological variation is for future studies to evaluate the influence of body size, activity patterns, socioeconomic status, nutrition, and health on skeletal aging and how it differs between Thai and geographically distant populations.

Introduction

Age estimation is a crucial element in the analysis of human skeletal remains when building a biological profile, either to identify an individual in forensic cases or to establish mortality profiles of past populations. However, it is argued that the reliability of estimation is too dependent on the demographic profile of the Western reference samples from which methods were generally developed. The rate of bone remodeling and degeneration is known to differ between European, African, and Asian populations (Aiello and Molleson, 1993; Schmitt et al., 2002), yet Southeast Asian skeletal research has not yet received the same amount of consideration as Western populations (Cho, 2019; Go et al., 2019). There are several scenarios that have created increasing pressure to ensure that skeletal age estimation methods are sufficiently accurate and reliable for a Thai population. These include that in Thailand, each year, on average, at least 200 unidentified human remains are registered at government agencies (Central Institute of Forensic Science, n.d.) and, as of 2008, the Thai Tsunami Victim Identification and Repatriation Centre was still trying to identify almost 400 unidentified remains from the 2004 Boxing Day tsunami (United Press International (UPI), 2008). There has also been a recent upsurge in the number of archaeological excavations conducted in Southeast Asia and an interest in the mortality and health of these individuals.

Age estimation of unidentified adult human remains relies on standards that have used reference populations of known age, sex, and ancestry to correlate various signs of skeletal degeneration and remodeling to different life stages and their associated chronological age ranges. The most accurate age estimations will always be achieved using standards developed on a reference sample that is the same as the study (target) population, as skeletal growth and degeneration are non-uniform across time and regions due to complex relationships with genetics, environment, socioeconomics, and behavioral influences (Schmitt, 2004; Gocha et al., 2015). However, most adult age estimation methods universally in use today were originally developed on skeletal collections in Europe, North America, and South Africa. These methods still require further validation to test their reliability and accuracy on other populations, especially those geographically isolated from the reference sample, such as Southeast Asian populations.

Over the past six decades a number of studies have tested these adult age-at-death estimation methods on Thai skeletal remains. This collation of age estimation studies from domestic and international scientific journals and unpublished theses provides a quick reference guide for forensic and bio-archaeological experts to determine the effectiveness and reliability of each technique when used on Thai individuals, and in particular which methods are best for young adults or older adults, and each sex.

The Thai studies have drawn their samples from the modern population, within which there exists great genetic diversity due to high rates of migration and distinct ethnic indigenous groups (Benjavongkulchai and Pittayapat, 2018). This population’s biological and cultural diversity, and largely agricultural economy, means that it is likely that skeletal maturation and degeneration will vary in relation to other geographically and genetically distant populations, hence the need to verify the reliability of the age estimation methods. The Thai studies use samples consisting of either skeletal remains from curated collections or autopsied cadavers (Table 1). Thailand has two large modern skeletal research collections with documented age and sex. The first is the Forensic Osteology Research Centre (FORC) at the Faculty of Medicine, Chiang Mai University (CMU) in northern Thailand, and the second is the Khon Kaen University (KKU) Human Skeleton Research Centre (HSRC), which has body donors from the rural Isan region, northeast Thailand. Both skeletal collections represent individuals who had mostly lived in the 20th to early 21st centuries and were from low to middle socioeconomic groups (Traithepchanapai et al., 2016; Techataweewan et al., 2017, 2018). These curated skeletal collections are the first modern skeletal representations of this size, geographic location, and ancestry that are available for research. They, therefore, represent an important opportunity to thoroughly test, develop, and revise traditionally used age estimation methods that were developed on genetically distant populations.

Table 1 Sample used to test methods and investigate age-related skeletal characteristics (in order of publication date)
Reference | Sample population | Total sample size (male/female) | Age range (years) | Mean age (years) male/female
Schmitt (2004) | Thai, FORC skeletal collection | 66 (37/29) | 20–60+ | *
Namking et al. (2008) | Thai, HSRC skeletal collection | 200 (120/80) | 18–94 | *
Chanapa and Mahakkanukrauh (2011) | Thai, FORC skeletal collection | 200 (139/61) | 35–95 | 71
Singsuwana et al. (2012) | Thai, FORC skeletal collection | 210 | 21–96 | *
Tipmala (2012) | Thai, FORC skeletal collection | 236 | 20–96 | *
Gocha et al. (2015) | Thai, HSRC skeletal collection | 88 (44/44) | 20–97 | 48/53
Khomkham et al. (2017) | Thai, FORC skeletal collection | 48 (34/14) | 20–89 | *
Iamsaard et al. (2017) | Thai, HSRC skeletal collection | 454 (254/200) | * | 61/60
Suwanlikhid et al. (2018) | Thai, FORC skeletal collection | 250 (125/125) | 22–89 | 58/62
Chompoophuen et al. (2019) | Thai, CMU cadavers | 71 (49/22) | 25–92 | 52
Monum et al. (2019) | Thai, CMU cadavers | 40 (24/16) | 16–88 | 58/54
Praneatpolgrang et al. (2019) | Thai, FORC skeletal collection | 400 (262/138) | 22–97 | 66/66
Singsuwan et al. (2019) | Thai, FORC skeletal collection | 200 (98/102) | 22–90 | 63/63
*  Information not provided or not included in English abstract.

HSRC, Human Skeleton Research Centre (held at Khon Kaen University, north-east Thailand); FORC, Forensic Osteology Research Center (held at Chiang Mai University (CMU), northern Thailand).

Reports of accuracy and reliability for age estimation methods currently lack a clear set of standards, which limits the comparability of results between studies (Garvin et al., 2012) (Table 2). Some studies present results in terms of bias (the mean over- or underestimation of age) and inaccuracy (the mean absolute difference between estimated and known age); others report confidence intervals, standard deviations (SDs) from the mean or the percentage of individuals for whom the known age fell within a given number of SDs of the mean, standard errors (SEs), or correlation coefficients (the strength of association between estimated and known ages).
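The two error measures described above can be sketched in a few lines. This is a minimal illustration that assumes the convention common in the age-estimation literature (bias as the mean signed error, inaccuracy as the mean absolute error); the function names and the ages are hypothetical and are not data from any of the cited studies.

```python
from statistics import mean


def bias(estimated, known):
    """Mean signed error: positive means overestimation, negative underestimation."""
    return mean(e - k for e, k in zip(estimated, known))


def inaccuracy(estimated, known):
    """Mean absolute error, regardless of the direction of the error."""
    return mean(abs(e - k) for e, k in zip(estimated, known))


# Hypothetical ages for illustration only (not taken from any cited study).
known_ages = [25, 40, 60, 80]
estimated_ages = [30, 40, 50, 60]

print(bias(estimated_ages, known_ages))        # -6.25 (ages underestimated on average)
print(inaccuracy(estimated_ages, known_ages))  # 8.75
```

Reporting both values matters because over- and underestimates cancel in the bias but not in the inaccuracy, so a method can show little bias and still be imprecise.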

Table 2 Comparison of measures of accuracy in each study
Measure of accuracy | Reference | Bone region | Method | Accuracy
Regression correlation and standard error | Monum et al. (2019) | Femur (aspartic amino acid racemization) | Benešová et al. (2004) | Male age SEE = 8.07 yrs (r = 0.912, r² = 0.8322); combined sex age SEE = 11.01 yrs (r = 0.8316, r² = 0.6916); female age SEE = 15.77 yrs (r = 0.716, r² = 0.5136)
Regression correlation and standard error | Chompoophuen et al. (2019) | Femur (histology) | Adapted from Yoshino et al. (1994), Martrille et al. (2009), and Pfeiffer (1998) | r = 0.906, SEE = 8.26 (using combination of Pm.H.Ar, COL.B, and Lm.B.Ar); Pm.H.Ar stood out as being the individual variable most closely correlated with age (r² = 0.733) with the lowest SEE of 9.91 yrs
Regression correlation and standard error | Praneatpolgrang et al. (2019) | Cervical, thoracic, and lumbar vertebrae | Snodgrass (2004), Watanabe and Terazawa (2006) and a modified scoring system | r = 0.801, r² = 0.642, SEE = 9.506 (P < 0.01) (highest accuracy with scoring method of Snodgrass (2004) using female mean lumbar score)
Regression correlation and standard error | Suwanlikhid et al. (2018) | Lumbar vertebrae | Adapted from Kacar et al. (2017), Van Der Merwe et al. (2006), and Watanabe and Terazawa (2006) | r² = 0.408 with an SEE of 11.686 yrs, P = 0.000 (highest accuracy using degree of osteophyte formation on the inferior surface of L1)
Accuracy and standard deviation | Gocha et al. (2015) | Auricular surface | Osborne et al. (2004) | 93.0% male, 88.1% female known age within ±2 SD of assigned phase mean (males had highest correlation rs = 0.581, P = 0.000)
Accuracy and standard deviation | Gocha et al. (2015) | Pubic symphysis | Suchey and Brooks (1990) | 88.6% male, 78.0% female known age within ±2 SD of assigned phase mean (males had highest correlation rs = 0.907, P = 0.000)
Accuracy and standard deviation | Gocha et al. (2015) | Auricular surface | Buckberry and Chamberlain (2002) | 81.4% male, 76.2% female known age within ±2 SD of assigned score mean (females had highest correlation rs = 0.643, P = 0.000)
Accuracy and standard deviation | Gocha et al. (2015) | Sternal end 4th rib | İşcan et al. (1984, 1985) | 66.7% male, 48.0% female known age within ±2 SD of assigned phase mean (males had highest correlation rs = 0.565, P = 0.001)
Accuracy and standard deviation | Schmitt (2004) | Pubic symphysis | Suchey and Brooks (1990) | 36.1% male, 37.9% female accurately classified within ±1 SD of the reference phase mean
Accuracy | Tipmala (2012) | Pubic symphysis | Modified Suchey and Brooks (1990) | 86.4% left os pubis, 85.2% right os pubis
Accuracy | Singsuwan et al. (2019) | Acetabulum | Rissech et al. (2006) | 71% accuracy estimated age within 12 yrs of known age; 66% accuracy within 10 yrs
Accuracy | Singsuwana et al. (2012) | Auricular surface | Modified Lovejoy et al. (1985) and Buckberry and Chamberlain (2002) and developed regression equations | 56.4% accuracy with SE = 11 yrs (using left side); 67.8% accuracy with SE = 10.6 yrs (using right side)
Accuracy | Schmitt (2004) | Auricular surface | Lovejoy et al. (1985) | 7% of individuals accurately classified within Lovejoy’s five-year classes

yrs, years; SD, standard deviation; SE, standard error; SEE, standard error of estimate; r², coefficient of determination; r or rs, correlation coefficient.

Pelvis (pubic symphysis, auricular surface of the ilium and acetabulum)

Several different techniques for estimating age via the pubic symphysis and auricular surface have been developed over the decades, some of which have been tested on Thai samples. These include the Suchey–Brooks method (Brooks and Suchey, 1990), which was developed on a reference sample of predominantly North American ancestry with a minority of European, South American, or Asian ancestry. Also tested on Thai samples was the original Lovejoy et al. (1985) auricular surface method, which has gone through several revisions (Buckberry and Chamberlain, 2002; Osborne et al., 2004). The Lovejoy method was developed using the prehistoric (8th–11th century AD) North American Libben Cemetery skeletal sample, cadavers from several North American forensic cases, and the Hamann–Todd skeletal collection, which comprises African Americans and European Americans from historic to modern periods. The acetabulum has also recently shown promise for estimating age, and a method developed by Rissech et al. (2006) on a Portuguese skeletal collection was tested recently on Thais (Khomkham et al., 2017).

Pubic symphysis

Schmitt (2004) was the first to apply the Suchey–Brooks pubic symphysis method (Brooks and Suchey, 1990) to a Thai sample, alongside the Lovejoy et al. (1985) auricular surface method. Schmitt (2004) reported that the results for the 20- to 39-year-old age cohorts should be disregarded for both methods as the sample size, especially for females, was inadequate. Schmitt (2004) found that the Suchey–Brooks method tended to overestimate the age of Thai adults less than 40 years of age and to underestimate the age of older adults (Table 3). Brooks and Suchey (1990) noted that when they tested their original method on a modern North American sample it was more reliable for young adults (up to 40 years of age); after this age they observed a wide range of individual variability, which produced wide age distributions. Brooks and Suchey (1990) also noted that female standard deviations were greater than those of males by 0.5–1.3 years, and that standard deviations increased progressively with age (up to 12.4 years at ±1 SD). Similarly, in Schmitt’s study, bias and inaccuracy values tended to be higher for Thai females than for males. Inaccuracy for adults aged less than 60 years ranged from 2 years to 17 years; however, in adults over 60 years of age, inaccuracy was as high as 32.2 years for females and 27.2 years for males. Even though the Suchey–Brooks method has broad and overlapping age ranges for each phase, Schmitt noted that known age fell within 1 SD of the assigned phase mean age in only 37% of the Thai sample.
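The "known age within ±k SD of the assigned phase mean" figure quoted above can be computed as in the following sketch. The phase means, SDs, and cases are hypothetical placeholders chosen for illustration; they are not the published Suchey–Brooks statistics, and the function names are assumptions.

```python
def within_k_sd(known_age, phase_mean, phase_sd, k=1.0):
    """True if the known age lies within k standard deviations of the phase mean."""
    return abs(known_age - phase_mean) <= k * phase_sd


def percent_within(cases, phase_stats, k=1.0):
    """cases: list of (known_age, assigned_phase); phase_stats: phase -> (mean, sd)."""
    hits = sum(within_k_sd(age, *phase_stats[phase], k=k) for age, phase in cases)
    return 100.0 * hits / len(cases)


# Hypothetical phase statistics and cases (illustration only).
phase_stats = {1: (20.0, 3.0), 2: (40.0, 10.0)}
cases = [(22, 1), (25, 1), (55, 2), (65, 2)]

print(percent_within(cases, phase_stats, k=1))  # 25.0
print(percent_within(cases, phase_stats, k=2))  # 75.0
```

The example shows why the choice of k matters when comparing studies: the same set of estimates scores much higher at ±2 SD than at ±1 SD, at the cost of a wider reported age range.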

Table 3 Age estimation methods tested on pelvic region
Reference | Bone region | Method | Accuracy | Age overestimated (years) | Age underestimated (years) | Overall bias/inaccuracy (years), males | Overall bias/inaccuracy (years), females | Minimum inaccuracy (years) [age group], males | Minimum inaccuracy (years) [age group], females
Schmitt (2004) | Pubic symphysis | Suchey and Brooks (1990) | 36.1% males, 37.9% females known age within ±1 SD of the assigned phase mean | ≤ 39 | ≥ 40 | −14.5/17.2 | −16.1/18.8 | 2.4 [20–29] | 6.7 [30–39]
Schmitt (2004) | Auricular surface | Lovejoy et al. (1985) | 7% of individuals accurately classified within Lovejoy’s five-year age classes | ≤ 29 | ≥ 30 | −17.8/18.3 | −20.0/20.0 | 2.0 [20–29] | 6.3 [30–39]
Gocha et al. (2015) | Pubic symphysis | Suchey and Brooks (1990) | 88.6% males, 78.0% females known age within ±2 SD of assigned phase mean; more reliable in males and younger adults (<50 yrs) | n/a | ≥ 40 | −7.8/9.2 | −8.7/12.5 | 2.8 [30–39] | 6.2 [20–29 and 40–49]
Gocha et al. (2015) | Auricular surface | Osborne et al. (2004) | 93.0% males, 88.1% females known age within ±2 SD of assigned phase mean; more reliable in younger adults (<50 yrs) | ≤ 49 | ≥ 50 | −4.4/12.2 | −5.7/12.2 | 5.6 [40–49] | 2.9 [20–29]
Gocha et al. (2015) | Auricular surface | Buckberry and Chamberlain (2002) | 81.4% males, 76.2% females known age within ±2 SD of assigned score mean; more reliable for older adults (>50 yrs) | ≤ 49 | ≥ 50 | 11.2/14.5 | 5.1/15.4 | 5.8 [60–69] | 10.7 [60–69]
Singsuwan et al. (2019) | Acetabulum | Rissech et al. (2006) | 71% accuracy estimated age within 12 yrs of known age; 66% accuracy within 10 yrs | ≤ 65 | ≥ 66 | −0.17/8.55 (sexes combined) | | 1.25 [31–35] (sexes combined) |
Tipmala (2012) | Pubic symphysis | Modified Suchey and Brooks (1990) | 86.4% accuracy left os pubis, 85.2% accuracy right os pubis; new age ranges for Thais: phase 1 = age range ≤21 yrs, phase 2 = age range 22–28 yrs, phase 3 = age range 29–34 yrs, phase 4 = age range 35–43 yrs, phase 5 = age range 44–54 yrs, and phase 6 = age range ≥55 yrs
Singsuwana et al. (2012) | Auricular surface | Modified Lovejoy et al. (1985) and Buckberry and Chamberlain (2002) and developed regression equation | 56.4% accuracy with SE 11 yrs for left side: age = −0.465CSL² + 14.65CSL − 29.67. 67.8% accuracy with SE 10.6 yrs for right side: age = −0.59CSR² + 16.86CSR − 36.8

SE, standard error; SD, standard deviation; yrs, years.

The thesis by Tipmala (2012) (written in Thai with an English abstract) also tested the Suchey–Brooks method; however, Tipmala used multinomial logistic regression analysis to produce new age ranges for each phase (Table 3) with the intention of increasing age-estimate accuracy for Thai individuals. Side asymmetry was also tested, with accuracy determined to be 86.4% for the left os pubis and 85.2% for the right. Compared to the age ranges per phase in the original Suchey–Brooks method, these newly adapted Thai age intervals are narrower and do not overlap. The study indicates that Thai skeletal maturation is delayed in the later phases, with Thai individuals reaching phases III–V later than the North American Whites on whom the method was developed.
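Because the revised Thai intervals do not overlap, they can be written as a simple lookup. The sketch below uses the age ranges listed for Tipmala (2012) in Table 3; the function names and the assumption that ages are recorded in whole years are illustrative and are not taken from the thesis.

```python
# Age ranges (years) per phase as listed for Tipmala (2012) in Table 3.
# None marks an open-ended bound.
THAI_PHASE_RANGES = {
    1: (None, 21),
    2: (22, 28),
    3: (29, 34),
    4: (35, 43),
    5: (44, 54),
    6: (55, None),
}


def age_in_phase_range(known_age, phase):
    """True if a known age falls inside the age range assigned to a phase."""
    low, high = THAI_PHASE_RANGES[phase]
    return (low is None or known_age >= low) and (high is None or known_age <= high)


def phase_for_age(age_years):
    """Return the single phase whose range contains a whole-year age (assumption)."""
    for phase in THAI_PHASE_RANGES:
        if age_in_phase_range(age_years, phase):
            return phase
    raise ValueError(f"age {age_years!r} is not covered; whole years are assumed")


print(phase_for_age(30))          # 3
print(age_in_phase_range(45, 4))  # False
```

With non-overlapping ranges each age maps to exactly one phase, which is what makes a single accuracy percentage per side straightforward to compute.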

When testing the Suchey–Brooks method, Gocha et al. (2015) reported similar results to Schmitt (2004) wherein adults over 40 years tended to have age underestimated, and individuals under 40 years were usually observed with lower values of bias and inaccuracy, and had age overestimated (apart from females 20–29 years in the Gocha et al. (2015) sample; no females in this age group were present in Schmitt’s sample). In both studies, the overall results show females have greater bias and inaccuracy than males (Table 3). On average, age estimates differed from known age by approximately 10 years or more. Bias and inaccuracy increased to 25.3 years for females in the 70+ age group, but this was still less than the bias and inaccuracy of up to 32.2 years reported by Schmitt (2004) for adults over 60 years.

Auricular surface

Schmitt (2004) tested the Lovejoy auricular surface eight-phase method and observed that male age estimation showed less bias and inaccuracy compared with females, and age tended to be underestimated once individuals were over 30 years of age. With the auricular surface, the rate of bias and inaccuracy for adults over 40 years of age was higher than when the Suchey–Brooks method was tested (except for females over 60 years of age), with inaccuracy reaching a peak of 31.9 years for males (30.4 years for females) aged over 60 years. Schmitt (2004) established that only a very small number of Thai individuals (7%) were assigned to the correct age range, with the majority of the sample being incorrectly placed into younger age phases (20–49 years), even though almost 85% of the Thai sample had a chronological age of over 40 years. When Lovejoy et al. (1985) tested their own method on a combined-sexes subsample of the North American Hamann–Todd skeletal collection, they observed inaccuracy ranging from just 3.2 years for young adults, up to a maximum inaccuracy of 11.1 years (but with a bias of just 1.9 years) in the over 50 age group. Schmitt (2004) determined that the repeatability of the methods was hampered by the difficulty observers faced in interpreting the descriptions of some features as outlined in the original publications, particularly for the auricular surface. This problem has also been discussed elsewhere (Merritt, 2013).

In a conference paper, Singsuwana et al. (2012) presented a new age estimation scoring system and quadratic regression equations using the auricular surfaces from a Thai skeletal sample. The authors utilized a selection of the features used in the Lovejoy et al. (1985) method (transverse organization, surface texture, microporosity, apical change, and retroauricular area activity), in combination with a composite score similar to that proposed by Buckberry and Chamberlain (2002). Their first step was to individually assess each of the five features to obtain a composite score for each of the left and right auricular surfaces, from which they developed regression equations specifically for a Thai population. They found some of the feature descriptions developed by Lovejoy et al. (1985) (microporosity and density) were difficult to evaluate in the Thai sample, just as Schmitt (2004) had reported. Singsuwana et al. (2012) believed this was due to a difference in morphological characteristics between the Thai sample population and the Western reference population on which the method was developed. They also noted that the sample contained more older adults than young adults. Using composite scores from the left side and the right side, they tested the new equations (Table 3) on a sample of 60 individuals. No statistically significant differences were observed between the left and right os coxae. Accuracy of the new equations was greater when tested on the right side, at 67.8% with a standard error of 10.6 years. The left side produced an accuracy of 56.4% with a standard error of 11 years.
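As a worked illustration of how such a quadratic equation is applied, the sketch below evaluates the left-side equation listed in Table 3 (age = −0.465 × CSL² + 14.65 × CSL − 29.67, reading CSL as the left composite score). The function name and the example score are hypothetical, and the valid range of composite scores is not stated in this review, so the input is a placeholder.

```python
def estimate_age_left(composite_score_left):
    """Evaluate the left-side quadratic equation listed in Table 3.

    The coefficients come from the table; the range of valid composite
    scores is an unknown here, so callers must supply a sensible value.
    """
    cs = composite_score_left
    return -0.465 * cs**2 + 14.65 * cs - 29.67


# Hypothetical composite score, for illustration only.
print(round(estimate_age_left(10), 2))  # 70.33
```

Because the squared term is negative, the curve flattens and then turns downward at a composite score of about 15.8 (14.65 divided by 0.93), so estimated age does not keep rising with score beyond that point. The reported standard error of 11 years still applies to any single estimate.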

When the auricular surface of the ilium was examined by Gocha et al. (2015) using the method developed by Osborne et al. (2004), they found underestimation of age occurred in adults over 50 years of age. The least amount of bias was observed in the 40- to 49-year-old age group (overestimation by 0.2 years for females and 2.2 years for males) and the highest amount of bias was seen in the 70+ age group (underestimation by 27.8 years for females and 23.4 years for males). The overall results show that the level of inaccuracy was the same for both sexes (12.2 years) with age tending, on average, to be underestimated by 4.4 years for males and by 5.7 years for females. In comparison, when the Buckberry and Chamberlain method was also tested on the Thai sample by Gocha et al. (2015), age was underestimated from 50 years of age for females, but not until 70 years of age for males. The overall results of the Buckberry and Chamberlain method showed a similar level of inaccuracy between males and females, but overall bias was noticeably lower in females than in males (Table 3). Gocha et al. (2015) observed a decrease in inaccuracy and bias of adults older than 50 years, particularly in comparison to their results from testing both the Suchey–Brooks and the Osborne methods, and also the Lovejoy method tested by Schmitt (2004) on a Thai sample. Gocha et al. (2015) reported that the Suchey–Brooks and Osborne methods were more reliable for estimating age in younger Thai adults (<50 years), and the Buckberry and Chamberlain method was more reliable for older adults.

Gocha et al. (2015) went on to test several multifactorial combinations, including averaging point estimates from all three pelvic methods (combination A), and a combination of average point estimates from the Suchey–Brooks method and one of the auricular surface methods, which required using the Osborne method on younger adults (if the Suchey–Brooks method indicated the pubis was in phases I–IV), or the Buckberry and Chamberlain method for older adults (if the pubis was in phases V–VI) (combination B). Gocha et al. (2015) determined that combination B produced less bias and inaccuracy than any of the individual methods tested alone. With both combinations, A and B, there was a tendency for overestimation of age for adults under 50 years and underestimation of age for adults above this age. Overall, both combinations also achieved marginally improved results for males compared with females. Gocha et al. (2015) found that combination A provided a reasonable level of accuracy between the ages of 40 and 59 years in both sexes, whereas combination B worked to a reasonable level of accuracy for adults up to 69 years of age, except for females aged 30–39 years, for whom the bias and inaccuracy were almost double those for males. They also tried combining the average point estimates from all six methods, but the sample size was drastically reduced due to insufficient skeletal elements in some individuals. This combination was found to perform adequately only on adults in the 40- to 49-year-old age group. Testing on a larger sample may see an improvement in results. Gocha et al. (2015) faced several hurdles in this study. They were constrained to a data collection period of just one week, which restricted the sample size to 88 individuals and left no time to test for intra- and interobserver errors.
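The selection rule behind the two combinations can be sketched as follows. This is an illustrative reading of the description above, not the authors' implementation: phases I–VI are represented as integers 1–6, and the function names and point estimates are hypothetical.

```python
from statistics import mean


def combination_a(sb_estimate, osborne_estimate, bc_estimate):
    """Combination A: average the point estimates from all three pelvic methods."""
    return mean([sb_estimate, osborne_estimate, bc_estimate])


def combination_b(sb_phase, sb_estimate, osborne_estimate, bc_estimate):
    """Combination B: average Suchey-Brooks with one auricular surface method.

    Osborne is used when the pubis is in phases I-IV (1-4), and
    Buckberry-Chamberlain when it is in phases V-VI (5-6).
    """
    if not 1 <= sb_phase <= 6:
        raise ValueError("Suchey-Brooks phase must be between 1 and 6")
    auricular = osborne_estimate if sb_phase <= 4 else bc_estimate
    return mean([sb_estimate, auricular])


# Hypothetical point estimates in years (illustration only).
print(combination_a(30.0, 33.0, 39.0))      # 34.0
print(combination_b(3, 30.0, 34.0, 50.0))   # 32.0 (Osborne used)
print(combination_b(5, 46.0, 40.0, 60.0))   # 53.0 (Buckberry-Chamberlain used)
```

The rule simply routes each individual to the auricular method reported as more reliable for the age band suggested by the pubic phase.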

Acetabulum

Khomkham et al. (2017) examined morphological features of the acetabulum as these changed with age using the steps outlined in the age estimation method of Rissech et al. (2006), to observe and score seven features on both the left and right sides of each individual. The authors did not observe any statistically significant differences in scores between the sexes or sides. For three of the features, they did find a significant correlation with age. These were the acetabular groove (the most significant correlation was in the left female acetabulum (r = 0.61)), acetabular rim porosity, and apex activity (most significant correlation in the left male acetabulum (r = 0.59 and r = 0.62, respectively)). To Khomkham et al. (2017) these results suggested there are at least some similarities in timing and changes to morphological features between this Thai sample and the reference sample of Portuguese males on which Rissech et al. (2006) developed their method. However, the other four features are weakly correlated with age and highlight that there are some population differences in growth and degeneration in this part of the pelvis.

The Rissech et al. (2006) acetabular method was also applied to a Thai sample by Singsuwan et al. (2019). A preliminary test with 88 individuals determined that there were no significant side differences, so the method was comprehensively tested using a sample of 200 individuals. Singsuwan et al. (2019) recorded no significant sex differences. In contrast to Khomkham et al. (2017), they observed significant correlation with known age for all seven morphological variables. Results showed that overestimation of age occurred in adults younger than 66 years, and underestimation occurred in adults over this age. Low levels of bias (over- or underestimation to a maximum of 4.4 years) were seen in young and mid-aged adults (21–46 years) and older adults (61–75 years). However, age was underestimated by 11.38 years in adults aged 86–90 years. Inaccuracy reached a maximum of almost 12 years for the age groups 56–60 and 86–90 years. Singsuwan et al. (2019) recorded an accuracy of 66% when estimating age to within 10 years of known age, up to a maximum accuracy of 71% within 12 years of known age. In comparison, accuracy was higher for the Portuguese sample on which Rissech et al. (2006) developed their method; in their sample accuracy reached 89% when estimating age to within 10 years of known age. Singsuwan et al. (2019) suggested refining the scoring system, finding in the Thai sample some differences in the degree of change in certain features compared to that reported by Rissech et al. (2006), such as inconsistencies in density of acetabular fossa activity and deeper grooves surrounding the rim. They could see that apex activity showed a clear progression of change with age, whereas the acetabular groove showed high overlap between ages.
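The "accuracy within N years of known age" measure used here differs from the phase-based percentages reported for the pubic symphysis, and a small sketch makes the calculation explicit. The ages below are hypothetical and the function name is an assumption.

```python
def percent_within_years(estimated, known, tolerance_years):
    """Percentage of individuals whose estimate is within a fixed tolerance of known age."""
    hits = sum(abs(e - k) <= tolerance_years for e, k in zip(estimated, known))
    return 100.0 * hits / len(known)


# Hypothetical ages (illustration only, not data from the cited studies).
known_ages = [30, 45, 60, 75, 88]
estimated_ages = [33, 50, 49, 70, 76]

print(percent_within_years(estimated_ages, known_ages, 10))  # 60.0
print(percent_within_years(estimated_ages, known_ages, 12))  # 100.0
```

Accuracy figures quoted at different tolerances are therefore not directly comparable, because widening the tolerance can only hold or raise the percentage.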

Thorax (sternal rib ends, clavicle, and vertebrae)

Observing age-related changes in the sternal ends of the ribs at the costochondral joint was conceived as an alternative to using methods developed on the pelvis or cranial sutures. Cranial sutures are not included in this review because many studies have shown this method to be highly inaccurate (Singer, 1953; Brooks, 1955; Hershkovitz et al., 1997; Ruengdit et al., 2020). The method that has been tested on a Thai sample is the İşcan nine-phase system (İşcan et al., 1984, 1985), which was developed using the fourth rib of White male and female cadavers from an American medical examiner’s office. Several pilot studies used the Thai samples to observe age-related changes to the medial articular surface of the clavicle or vertebrae to evaluate their potential to estimate age, and several other studies calibrated age estimation regression equations for the Thai population using a combination of different scoring systems.

Sternal rib ends

Gocha et al. (2015) were the only researchers to have directly applied the İşcan method (İşcan et al., 1984, 1985) to a Thai sample to estimate age via changes to the sternal ends of ribs. Low levels of bias and inaccuracy in the earlier phases showed that the method was more accurate overall for Thai males and young adults. However, adults of both sexes above the age of 40 years (phase V onwards) consistently had their age underestimated. İşcan et al. (1984, 1985) noted that their method was most reliable for young to mid-aged White North Americans up to 40 years of age. After this age the SD from the mean reached up to 11 years for males and 15 years for females, with very wide age ranges per phase. Gocha et al. (2015) did not recommend this method for use in a forensic context for Thais or other Southeast Asian populations due to a poor correlation between observed and documented chronological ages, particularly apparent for individuals >40 years. Above this age both bias (underaging) and inaccuracy were as high as 37.5 years for females in the 70+ age group, whilst for males they reached 30.6 years in the 60- to 69-year-old age group.

Differences between reference and target sample size and distributions would have an impact on mean ages and standard deviation rates (Loth, 1995; Yavuz et al., 1998). The small Thai sample size (55 individuals) in the study of Gocha et al. (2015) hampered a thorough examination of the performance of this technique as the number of individuals examined per sex/age group ranged from just one (females aged 20–29 years) up to a maximum of eight (females aged 40–49 years). The İşcan method would benefit from further testing and modification on a Thai sample of larger size, with an even representation of males and females in all age groups.

Clavicle

Iamsaard et al. (2017) seriated 454 clavicles from north-eastern Thais of the HSRC skeletal collection to closely observe and record surface topography of the medial articular surface of the clavicle as a way of providing a population-specific learning aid for medical and paramedical students. The age range of the sample was not provided but the average age of the sample was 60.69 years (± 14.36 years). Surface topography, as well as porosity and osteophyte formation, is an important feature to record in a population sample, as once epiphyseal fusion is completed in young adults, changes of the medial surface of the clavicle provide another way to estimate age for older adults (Falys and Prangle, 2015). The study by Iamsaard et al. (2017) did not record porosity or osteophyte formation, and instead chose to focus only on assessing surface topography, which Falys and Prangle (2015) determined was the trait most closely correlated with age. Therefore, the Falys and Prangle composite score method still needs to be tested on a Thai sample to ascertain accuracy of age ranges, means, and standard deviations for use on a Thai or other Southeast Asian population. The pilot study of Iamsaard et al. (2017) could only confirm that the types of medial articular surface (smooth, slight granulation, coarse granulation, nodule formation, undulating, and degenerative) observed in the European reference sample of Falys and Prangle were also observed in the Thai sample. Another study by Traithepchanapai (2014) confirmed that commencement of osteophyte growth occurred on the margin of the medial articular surface of the clavicle of Thai individuals of at least 39 years of age in both males and females (cited in Traithepchanapai et al. (2016)).

Vertebrae

Four studies focused on observing the prevalence and severity of vertebral osteophytes in Thai samples to aid in identifying potential symptoms in clinical cases, and to investigate their potential to estimate age. In the first of these studies, Namking et al. (2008) examined cervical, thoracic, and lumbar vertebrae to determine that osteophyte prevalence significantly correlates with increasing age and does so more significantly in males than females. Most frequently, osteophytes were observed in the lumbar vertebrae (73% of L4, 70% of L5, and 69% of L3), followed by the thoracic (50.5% of T11 and 49.5% of T10), and cervical (46% of C5, 44% of C6, and 38% of C4). The most prominent osteophytes were located on the anterosuperior aspect of the rim of lumbar vertebrae L3, L4, and L5.

Chanapa and Mahakkanukrauh (2011) studied only the cervical vertebrae in their northern Thai sample, recording the highest prevalence of osteophyte formation in vertebral bodies (49%), followed by facet joints (35%), and foramen (16%) of cervical vertebrae (C3–C7). Chanapa and Mahakkanukrauh (2011) concluded that osteophyte length significantly correlated with age, but not significantly with sex. The average length of C3 osteophytes was greater than on any other cervical vertebra, but the maximum length of an osteophyte was recorded on the superior facet of a C4 vertebra (13 mm). Greatest osteophyte prevalence was observed in cervical (C5) vertebrae (83%), followed by C6 (77%), C4 (74%), C7 (65%), and C3 (64%). These prevalence values are much greater than those observed by Namking et al. (2008) in the northeastern Thai sample (C5, 46%; C6, 44%; and C4, 38%).

Suwanlikhid et al. (2018) used linear regression to estimate age from degenerative changes to lumbar vertebrae by observing and scoring three morphological features: changes to the cortical surface of the lumbar body, and the degree of osteophyte formation and macroporosity on the superior and inferior borders and endplates of the lumbar vertebrae. Suwanlikhid et al. (2018) produced an adaptation of several previously developed vertebral osteophyte scoring systems (Van Der Merwe et al., 2006; Watanabe and Terazawa, 2006; Kacar et al., 2017), as well as developing a new scoring system on the Thai sample for macroporosity and resorption of the cortical surfaces. Eight grades were used to score the degree of osteophyte formation with or without bridging and projections. Four grades were used to determine the degree of macroporosity on the superior and inferior surface of the vertebral body, and four grades were used to determine the degree of roughness with porosity on the cortical surface. All three features had moderate correlation with age, with the prevalence of osteophytes having the highest correlation of the three features, particularly on the inferior surface. Osteophyte formation was observed to commence around 26 years of age. The scores for each feature were used to develop new age estimation equations for each of the five lumbar vertebrae, producing 25 in total, with standard errors ranging from 11.7 to 14.5 years. The highest level of accuracy was gained from observing osteophyte formation on the inferior surface of the first lumbar vertebra (r2 = 0.408 with a standard error of 11.7 years), but even this was a weak correlation between actual age and estimated age.

Praneatpolgrang et al. (2019) calibrated age estimation based on examining vertebral osteophyte formation in a Thai sample using the five-grade scoring system developed by Snodgrass (2004) and the four-grade system designed by Watanabe and Terazawa (2006), as well as developing their own new six-grade scoring system focusing on changes to the rugosity of the surface of the inferior and superior margins of the vertebral body, osteophyte length, and fusion of adjacent vertebrae. Praneatpolgrang et al. (2019) separately scored cervical, thoracic, and lumbar vertebrae for three groups (males, females, and combined sexes). Significant correlation was found between known age and the scores for all parts of the vertebral column (cervical, thoracic, and lumbar). The mean lumbar score had the best correlation with age for all three groups in each of the three scoring systems. The correlation coefficient (r) tended to be valued above 0.75 (strong positive correlation), the r2 values were consistently between 0.53 and 0.64 (moderate positive correlation), and the standard errors of the estimate for mean lumbar scores were consistently between 9 and 11 years (P < 0.01). This is less error than recorded by Suwanlikhid et al. (2018) (standard error of estimate (SEE) = 11.7–14.5 years) discussed above. Praneatpolgrang et al. (2019) found that the results for all three scoring systems were similar and were suitable for use in Thai forensic cases. They reported that their new six-grade scoring system was more objective and faster to use than the Snodgrass and the Watanabe and Terazawa methods. However, the scoring system of Snodgrass (2004) overall produced the best results on this sample, with the most accurate of these regression equations recorded for the female mean lumbar score (Table 4), with a standard error of 9.506 (P < 0.01), r = 0.801, r2 = 0.642. The weakest correlation between age and vertebral osteophyte formation was generally found when using individual cervical or thoracic vertebrae in all three of the scoring systems.

Table 4 Age estimation methods tested on the thorax
| Reference | Bone region | Method | Accuracy | Age overestimated | Age underestimated | Overall bias/inaccuracy (years), males | Overall bias/inaccuracy (years), females | Minimum inaccuracy (years) [age group], males | Minimum inaccuracy (years) [age group], females |
|---|---|---|---|---|---|---|---|---|---|
| Gocha et al. (2015) | Sternal end 4th rib | İşcan et al. (1984, 1985) | 66.7% males, 48.0% females known age within ±2 SD of assigned phase mean; more reliable for males, especially <40 yrs | ≤39 | ≥40 | −10.0/12.9 | −13.01/18.5 | 3.8 [20–29] | 6.4 [30–39] |
| Namking et al. (2008) | Cervical, thoracic, and lumbar vertebrae | N/A | Osteophyte prevalence significantly correlates with increasing age, and more significantly in males than females | | | | | | |
| Chanapa and Mahakkanukrauh (2011) | Cervical vertebrae | N/A | Osteophyte length significantly correlated with age, but not significantly with sex | | | | | | |
| Suwanlikhid et al. (2018) | Lumbar vertebrae | Adapted from Kacar et al. (2017), Van Der Merwe et al. (2006), and Watanabe and Terazawa (2006) | Highest accuracy using degree of osteophyte formation on the inferior surface of L1 (r2 = 0.408 with an SE of 11.686 yrs) | | | | | | |
| Praneatpolgrang et al. (2019) | Cervical, thoracic, and lumbar vertebrae | Snodgrass (2004), Watanabe and Terazawa (2006), and developed a modified scoring system | Mean lumbar score most accurate of all three scoring methods; highest accuracy with scoring method of Snodgrass (2004) using female mean lumbar score: age y = 32.308 + 15.994x, with an SE of 9.506 (P < 0.01), r = 0.801, r2 = 0.642 | | | | | | |

SD, standard deviation; SE, standard error; yrs, years; r2, coefficient of determination; r, correlation coefficient.
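
As an illustration of how the best-performing equation in Table 4 would be applied, the following minimal sketch evaluates the female mean lumbar score regression of Praneatpolgrang et al. (2019). It assumes that x is the mean lumbar osteophyte score under the Snodgrass (2004) system and that a band of ±1 SE gives a rough (approximately 68%) interval; the function name and the example score are hypothetical and are not taken from the original study.

```python
def lumbar_age_estimate(mean_lumbar_score: float,
                        intercept: float = 32.308,
                        slope: float = 15.994,
                        se: float = 9.506) -> tuple[float, float, float]:
    """Return (point estimate, lower bound, upper bound) in years.

    Coefficients are transcribed from Table 4 (female mean lumbar score,
    Snodgrass scoring). The +/- 1 SE band is an assumption made here for
    illustration, not a range recommended by the source.
    """
    age = intercept + slope * mean_lumbar_score
    return age, age - se, age + se


# Hypothetical example: a female with a mean lumbar score of 2.0
estimate, lower, upper = lumbar_age_estimate(2.0)
print(f"Estimated age: {estimate:.1f} years ({lower:.1f} to {upper:.1f})")
```

Even under this optimistic band, the interval spans almost two decades, which is consistent with the concern that these standard errors limit forensic usefulness.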

Femur

All the previously discussed adult age estimation methods have been made via non-destructive qualitative or quantitative macroscopic observations. Histological methods and aspartic amino acid racemization are generally avoided as they require destructive sampling of bone, albeit a very small piece of the femur, humerus, or rib, usually approximately 2 cm², to observe microstructural changes in the cortical bone.

A study by Chompoophuen et al. (2019) is the only one to examine histomorphometric age estimation using cortical bone sections of femora in a Thai sample. They used decalcified and stained bone sections, image analysis (ImageJ), and the computer program MATLAB to determine the correlation between age and pixel density of histological variables. Chompoophuen et al. (2019) used linear regression analysis to calibrate established predictive formulae to estimate age from five variables, and to develop one new variable for quantifying collagen measurements in bone (COL.B). They found that collagen in males had higher correlation with age (r = 0.800) compared with females (r = 0.467). In the combined-sex sample, the perimeter of the Haversian canals (Pm.H.Ar) produced the highest correlation coefficient with age (r = 0.856), and regression analysis showed that Pm.H.Ar was the individual variable most closely correlated with age (r2 = 0.733), with the lowest standard error (9.91 years) (Table 5). Stepwise multiple regression showed that, overall, a combination of three variables, Pm.H.Ar, COL.B, and Lm.B.Ar (percentage of lamellar bone area), provided the most accurate predictor of age (correlation coefficient of 0.906).

Table 5 Age estimation methods tested on the femur
| Reference | Bone region | Method | Accuracy |
|---|---|---|---|
| Chompoophuen et al. (2019) | Femur (histology) | Adapted from Yoshino et al. (1994), Martrille et al. (2009), and Pfeiffer (1998) | Age = −28.199 + 0.0138(Pm.H.Ar) + 0.00005(COL.B) + 9.312(Lm.B.Ar), r = 0.906, SEE = 8.26. Pm.H.Ar stood out as being the individual variable most closely correlated with age (r2 = 0.733) with the lowest SEE (9.91 yrs) |
| Monum et al. (2019) | Femur (aspartic amino acid racemization) | Ohtani et al. (1998) and Ohtani and Yamamoto (2005) | Combined sex age = (ln[(1 + D/L)/(1 − D/L)] − 0.0192)/0.0005, with SEE = 11.01 yrs, r = 0.8316, r2 = 0.6916; male age = (ln[(1 + D/L)/(1 − D/L)] − 0.0155)/0.0005, with SEE = 8.07 yrs, r = 0.912, r2 = 0.8322; female age = (ln[(1 + D/L)/(1 − D/L)] − 0.0236)/0.0004, with SEE = 15.77 yrs, r = 0.716, r2 = 0.5136 |

D/L, dextro/levo; SEE, standard error of estimate; r2, coefficient of determination; r, correlation coefficient.

Monum et al. (2019) evaluated the procedure suggested by Ohtani et al. (1998) and Ohtani and Yamamoto (2005) for predicting age from femoral bone samples using aspartic amino acid racemization. Monum et al. (2019) found that the dextro/levo (D/L) ratio was highly correlated with known age in the Thai sample, with males showing better correlation (r = 0.912) than females (r = 0.716), but there was no statistically significant difference in the rate of racemization between the sexes. The combined sex sample produced a standard error of 11.01 years. Monum et al. (2019) used the racemization results and linear regression to calculate age estimation equations for each sex and for sexes combined (Table 5).
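
The racemization equations in Table 5 can be sketched in code as follows. This is a minimal illustration that assumes the conventional form ln[(1 + D/L)/(1 − D/L)] for the transformed ratio, with the intercepts and slopes transcribed from Table 5; the function name, the dictionary of coefficients, and the example D/L value are hypothetical.

```python
import math

# (intercept, slope) pairs transcribed from Table 5 (Monum et al., 2019)
COEFFICIENTS = {
    "combined": (0.0192, 0.0005),
    "male": (0.0155, 0.0005),
    "female": (0.0236, 0.0004),
}


def racemization_age(d_l_ratio: float, group: str = "combined") -> float:
    """Estimate age (years) from an aspartic acid D/L ratio.

    Assumes age = (ln[(1 + D/L) / (1 - D/L)] - intercept) / slope.
    """
    if not 0.0 <= d_l_ratio < 1.0:
        raise ValueError("D/L ratio must be in the interval [0, 1)")
    intercept, slope = COEFFICIENTS[group]
    transformed = math.log((1.0 + d_l_ratio) / (1.0 - d_l_ratio))
    return (transformed - intercept) / slope


# Hypothetical D/L ratio, for illustration only
for group in ("combined", "male", "female"):
    print(group, round(racemization_age(0.02, group), 1))
```

Because the slopes are very small, a minor measurement error in the D/L ratio translates into a shift of several years in the estimate, which may help to explain the size of the reported standard errors.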

Computed tomography scans

It should also be noted that data collected from routine computed tomography (CT) scans have shown great potential in evaluating age-related morphological changes in several Thai studies. Pattamapaspong et al. (2015) assessed the timing of fusion of the medial clavicle in CT scans of Thai patients to calibrate and slightly modify the classification methods of Schmeling et al. (2004) and Kellinghaus et al. (2010). They found that stage 4 of fusion best represents Thai individuals over 18 years of age. Pattamapaspong et al. (2019) then used a new cinematic volume render to produce three-dimensional CT images of the os pubis and auricular surface of individuals from the CMU skeletal collection to test the Suchey–Brooks and Buckberry–Chamberlain methods of age estimation. The authors found that the new technique has a high success rate when assessing features of the os pubis; however, most auricular surface features cannot be clearly seen in the CT scans and are best assessed in dry bone. A major limitation of CT scanning is that such technology is often only accessible in large laboratories or hospitals and is expensive. Technical training is required to use specialized and expensive imaging equipment and to assess images with specialized computer software. However, as CT scans can be collected from live patients of all ages, this approach overcomes the sample bias seen in all skeletal collections in which the very young and the very old are underrepresented.

Discussion

Several of the Thai studies (Schmitt, 2004; Gocha et al., 2015) tested for sexual dimorphism by measuring bias and inaccuracy. These studies showed that techniques using the pelvis and ribs tended to be more reliable for males and younger adults. For instance, overall bias was almost always observed to be less in males than in females (1–6 years difference); the exception was the Buckberry and Chamberlain (2002) auricular surface method, where overall female bias was 6 years less than for males. The overall degree of inaccuracy was also observed to be lower in males than females (1–6 years difference). Males and females will often experience bone remodeling at different rates (Lewis and Roberts, 1997; Cho et al., 2006) and females could have been exhibiting a greater degree of morphological variation of age indicators (Djurić et al., 2007). The use of a larger female sample might have assisted in better phase placement.
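
For readers less familiar with these two measures, bias is conventionally the mean signed difference between estimated and documented age, and inaccuracy is the mean absolute difference. The following minimal sketch assumes these conventional definitions; the function name and the example ages are hypothetical and do not come from any of the Thai samples.

```python
def bias_and_inaccuracy(estimated: list[float],
                        documented: list[float]) -> tuple[float, float]:
    """Return (bias, inaccuracy) in years.

    bias       = mean(estimated - documented); negative values mean underaging
    inaccuracy = mean(|estimated - documented|)
    """
    if not estimated or len(estimated) != len(documented):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [e - d for e, d in zip(estimated, documented)]
    n = len(errors)
    bias = sum(errors) / n
    inaccuracy = sum(abs(err) for err in errors) / n
    return bias, inaccuracy


# Hypothetical ages: one young adult overaged, two older adults underaged
bias, inaccuracy = bias_and_inaccuracy([30, 45, 50], [25, 55, 70])
print(f"bias = {bias:.1f} years, inaccuracy = {inaccuracy:.1f} years")
```

A negative bias alongside a larger inaccuracy, as in this toy example, is the pattern described for older adults in the Thai studies: errors do not cancel out, and their net direction is towards underestimation.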

Bias and inaccuracy increase considerably with age for all methods tested on Thai samples. Methods that were more reliable for younger Thai adults (<40 years) included all the methods using the pubic symphysis, acetabulum, rib ends, and auricular surface. The exception to this was the Buckberry and Chamberlain (2002) auricular surface method which was more reliable for Thai adults over 50 years of age. This method was established using a reference group with a higher proportion of adults aged over 60 years so it is not surprising that it performed better on older Thai individuals, whereas the original Suchey–Brooks pubic symphysis method, the Lovejoy auricular surface method, and the İşcan rib method were developed on younger reference groups and have proven to be more accurate on adults aged less than 40 years (Berg, 2008; Merritt, 2013).

There was a prevailing tendency for bias, where age was overestimated in young adults and underestimated in older adults in the Thai studies. Research verifies this trend is a persistent limitation for methods using regression-based models to correlate morphological data with age (Aykroyd et al., 1999; Schmitt et al., 2002; Berg, 2008; Getz, 2020). Disagreement in size, age structure, and mean age between the reference group and target sample can amplify this bias (Bocquet-Appel and Masset, 1982). The youngest adults and those over 60 years of age at death are particularly underrepresented in the original methods (Lucy et al., 2002; Martrille et al., 2007; Merritt, 2013; Miranker, 2016). Most of the Thai samples had a mean age of between 50 and 60 years and underrepresentation of some age groups, usually 20–29 years and 70+ years. There is also often unequal representation of males and females in the Thai target samples (Table 1), with a bias towards more males. The Thai sample demographics have been impacted by the number of body donations and suitable autopsied cadavers accessible from university forensic departments and hospitals from which the study samples were obtained.

Gocha et al. (2015) determined a method’s accuracy by assessing the percentage of the sample whose documented age was within ±2 SD of the mean age reported for the phase or stage. The İşcan rib method showed the weakest correlation with known age, with high bias and a maximum accuracy of 67%, with Gocha et al. listing this as the least preferred skeletal indicator to use on Thai individuals. It must, however, be noted that the study was hindered by a small sample size in which young females in particular were underrepresented (Gocha et al., 2015). Greater accuracy (76–93%) was achieved in methods utilizing the pubic symphysis and auricular surface. Whilst 93% accuracy of the Osborne method sounds impressive, it should be noted that the age ranges assigned to an individual with ±2 SD are very broad with overlap between successive stages, making the estimated age ranges too impracticable to be of real use (Rogers, 2016). Wide age categories for each phase are usually required to capture the range of individual morphological variability experienced within a population (Djurić et al., 2007; Berg, 2008). The advantage of a wider age range is that fewer individuals are placed in the incorrect phase for age (Bocquet-Appel and Masset, 1982; Brooks and Suchey, 1990).
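
The accuracy criterion described above can be expressed in a few lines of code. The sketch below counts the percentage of individuals whose documented age falls within k standard deviations of the mean age of their assigned phase; the function name and the phase means and standard deviations are hypothetical values chosen for illustration, not published phase statistics.

```python
def percent_within_phase_range(records: list[tuple[float, float, float]],
                               k: float = 2.0) -> float:
    """Percentage of individuals whose documented age lies within
    k standard deviations of their assigned phase mean.

    Each record is (documented_age, phase_mean, phase_sd).
    """
    if not records:
        raise ValueError("records must not be empty")
    hits = sum(1 for age, mean, sd in records if abs(age - mean) <= k * sd)
    return 100.0 * hits / len(records)


# Hypothetical individuals: (documented age, phase mean, phase SD)
sample = [(25, 24, 4), (45, 30, 6), (70, 60, 12), (38, 60, 12)]
print(percent_within_phase_range(sample, k=2.0))  # wide ranges
print(percent_within_phase_range(sample, k=1.0))  # narrow ranges
```

Running the same hypothetical sample with k = 1 rather than k = 2 lowers the percentage, which illustrates why reported accuracy depends so heavily on the width of the age ranges used.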

Schmitt (2004) preferred to use estimates within ±1 SD of the mean, obtaining a maximum accuracy of nearly 38% for the Suchey–Brooks method. However, this confidence level also suffers from the limitation that it may produce too narrow and precise an age range that cannot account for all individual morphological variation in the Thai sample (Rogers, 2016). This increases the probability that an individual will be placed in an age phase that is above or below the one in which their documented age actually falls (Garvin et al., 2012). Just 7% of individuals were correctly placed in the assigned age phase when Schmitt (2004) tested the Lovejoy auricular surface method. The low level of accuracy achieved with this method is a product of the 5-year age ranges published for the Lovejoy method, which are far too narrow to capture the full range of morphological variation of the auricular surface (Osborne et al., 2004). Even when Singsuwana et al. (2012) developed population-specific regression equations derived from Thai auricular surfaces, the maximum accuracy observed was nearly 68%, but with a 10-year standard error.

Correlation coefficient values were highest when regression equations were derived from the Thai samples using either histological and biochemical (amino acid racemization) techniques or vertebral osteophyte prevalence in the lumbar vertebrae (r > 0.8), with standard errors ranging from 8 to 16 years. Monum et al. (2019) state that a standard error of 11 years is acceptable, whereas Rösing and Kvaal (1998) argue that standard errors that exceed 5–7 years should not be applied to forensic cases or archaeological contexts. Correlation over 0.8 is usually accepted as a strong level of correlation between morphological indicators and age, although Bocquet-Appel and Masset (1982) argued that values under 0.9 are likely to introduce considerable risk of error to age estimations.
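
To put these thresholds in perspective, the short sketch below converts a correlation coefficient into the proportion of variance in age it explains (r squared) and converts a standard error of estimate into the width of an approximate 95% interval. It assumes normally distributed residuals and ignores uncertainty in the regression coefficients, so the widths are rough lower bounds; the function names are hypothetical.

```python
def variance_explained(r: float) -> float:
    """Proportion of variance in age explained by the indicator (r squared)."""
    return r * r


def approx_95_interval_width(see: float) -> float:
    """Approximate full width (years) of a 95% interval, i.e. 2 x 1.96 x SEE.

    Assumes normally distributed residuals (an assumption made here).
    """
    return 2 * 1.96 * see


for r in (0.8, 0.9):
    print(f"r = {r}: {variance_explained(r):.0%} of variance explained")

for see in (5, 7, 11):
    print(f"SEE = {see} yrs: interval about {approx_95_interval_width(see):.1f} yrs wide")
```

Under these assumptions, a correlation of 0.8 leaves roughly a third of the variance in age unexplained, and a standard error of 11 years implies an interval more than four decades wide, which gives some context to the stricter positions of Rösing and Kvaal (1998) and Bocquet-Appel and Masset (1982).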

Almost all the authors of the Thai studies in this review argued that interpopulation variation greatly reduced the reliability of the methods. Most either did not recommend the methods for use on a Thai population, particularly not for a forensic setting, or only if applied with a high degree of caution. This is not a situation unique to just this population. Current research predominantly indicates that bone degeneration and remodeling occur at non-uniform rates between individuals of Asian, African, and European origin (Buckberry and Chamberlain, 2002; Schmitt et al., 2002; Mays, 2012; Shilpa et al., 2013) due to varying genetic and environmental factors experienced between regionally diverse populations (Berg, 2008; Garvin et al., 2012). These studies argue that age estimation standards developed on one sample population therefore cannot adequately reflect the individual and population-level rate of bone remodeling and degeneration found in another population, necessitating population-specific standards to ascertain biological age.

In contrast, other researchers suggest using one large reference group made up of individuals from a number of genetically distant populations to improve the reliability of methods by capturing a greater level of skeletal variation (Brooks and Suchey, 1990; Zhang et al., 2009). Furthermore, some studies stress that ancestry is not the main cause of skeletal age variation as there should be similar skeletal growth and degeneration rates between populations when health and environmental conditions are similar (Schmeling et al., 2000). It is suggested that a combination of factors, including health and hormones, and socioeconomic status linked to body mass index (Ferguson et al., 1982; Schmeling et al., 2000; Mays, 2015) have more of an effect on the timing and rate of bone turnover than population affiliation. This is usually not clearly discussed as a consideration when a method of age estimation is chosen by anthropologists. There is a paucity of studies that compare how socioeconomic status, health, and physical activity impact the reliability of current age estimation methods between different populations. An individual’s level of physical activity and mechanical loading on the skeleton will affect the rate of bone turnover and bone mass (Adami et al., 2008); however, recent research on individuals of European and African American descent suggests that high levels of repetitive physical activity have little significant impact on rates of degenerative change to age-related features in load-bearing joints of the pelvis (Campanacho et al., 2012; Winburn, 2019; Bertsatos et al., 2021). Merritt (2015) was the first to show a clear relation between underestimation of skeletal age and short stature combined with low body mass, whereas overestimation of age occurred in tall people with high body mass, with evidence of increased bone surface degeneration in relation to weight increase (see also Wescott and Drew, 2015). Additionally, Merritt (2017) determined which age estimation methods were most reliable for smaller-bodied individuals and for larger-bodied individuals.

Health statistics in Thailand have identified a secular trend for height/weight increase correlating with improved living conditions since records began in 1975 (Jaruratanasirikul and Sriplung, 2015). Lower socioeconomic status in rural areas has caused delayed bone age in young Thai rural children compared with their peers from more affluent regions such as the United States and urban middle-class Thailand (Bailey et al., 1984). High-income (modern) Western populations are largely sedentary and their levels of physical inactivity are twice as high as those in low-income countries (Guthold et al., 2018). Clinical research has established there are significant differences in the relationship between skeletal muscle mass and age among Hispanic, African American, White American, and Asian (Chinese, Indian, Korean, and Japanese) individuals (Silva et al., 2010). Iamsaard et al. (2017) argued that the pattern of degenerative changes seen in the older age Thai adults of the KKU skeletal sample from rural northeast Thailand may be influenced by activities in labour-intensive manual rice agriculture, an occupation in which a majority of the individuals from this low to mid-socioeconomic region are likely to have participated (Techataweewan et al., 2017). Similarly, Tayles and Halcrow (2015) have considered the biomechanical forces involved in rice planting, which requires repetitive flexing at the hip joint, and its effects on age-related change in the auricular surface, and there may be influences from the cultural practice of a full squatting position commonly adopted by Thai people and other Asian populations. The impact that these factors have on skeletal age estimation in the Thai population is still poorly understood and needs additional research.

Conclusion

The Thai studies follow similar trends often reported for other populations using these same methods in that bias and inaccuracy always increase with age (often dramatically), accuracy is dependent on wide age ranges, and some methods are more reliable than others for young adults. This really reflects the statistical analyses utilized (e.g. regression based) and age structure of the reference group on which the methods were originally developed. The Suchey–Brooks, Lovejoy, Osborne, and Rissech methods produce less bias and more accuracy on Thai adults younger than 40–50 years of age, whereas the Buckberry–Chamberlain method was more reliable for adults aged over 50 years. As such, the skeletal indicators with the highest accuracy were the pubic symphysis and the auricular surface, whereas the İşcan rib method was regarded as the least reliable.

However, when ±2 SD are used it should hardly come as a surprise that a method is deemed accurate, as the provided age ranges are so wide that 95% of estimated ages fall within the recommended phase or stage. Even with these weaknesses, well-established methods such as Suchey–Brooks, Lovejoy, and İşcan continue to be favoured due to their ease of use and popularity even though most were developed decades ago. Some of the newer or modified methods using the pubic symphysis, acetabulum, and rib ends still have not been tested on Thai samples. Lesser-known techniques such as vertebral osteophyte prevalence, histologic and biochemical methods have shown a high correlation with age in the Thai samples. But these methods are underutilized in current research and now is the time to further refine them on larger samples to discover their full potential. To fully understand morphological variation on an individual and population level, and to produce meaningful age estimates, emerging research needs to compare diet, activity levels, and health of Thais with other populations such as North American, European, and South African.

Disclaimers

Nil

Conflicts of interest

The authors declare no competing interests.

Author contributions

L.T.P. wrote the review; K.D. was responsible for review concept and editing.

Acknowledgements

We are grateful to Dr. Nigel Chang and Dr. Anna Willis for providing critical review on early drafts of the manuscript. This study is supported by an Australian Government Research Training Program (RTP) Scholarship.

References
 
© 2022 The Anthropological Society of Nippon