In October 2016, research published in Nature claimed that the age of the oldest person has not increased since the mid-1990s. But others have been critical of the high-profile paper, leading to a series of exchanges of opinion between the authors and their critics in Nature this week.
In their original paper, geneticists Xiao Dong, Brandon Milholland, and Jan Vijg – all from the Albert Einstein College of Medicine in New York City, NY – investigated trends in maximum lifespan, which is the greatest age that individuals can live to.
They looked at the maximum age at death in four countries between 1968 and 2006. These countries were France, Japan, the United Kingdom, and the United States.
According to their analysis, maximum lifespan increased until 1994, after which point it plateaued.
The authors concluded that the average maximum lifespan is around 115 years. No increase in this number has been seen since the mid-1990s.
They also used mathematical modeling to predict that the maximum possible age is 125, and they said that the probability of anyone exceeding this age is “less than 1 in 10,000” per year.
A news article published in Nature, posted the day before the paper was released, pointed to some of the criticism immediately voiced by others in the field of aging.
Points of criticism include disagreements over the interpretation of the data and a lack of any mention of the possibility of future medical advances influencing maximum lifespan.
An article published in the Dutch magazine nrc, entitled “Peer review post-mortem: how a flawed aging study was published in Nature,” examined the controversy surrounding the paper.
Senior author Vijg explained that the paper was originally rejected by Nature after the first round of peer review. However, out of the blue, he was asked by the editorial board at Nature to submit a revision.
Importantly, two of the peer reviewers – Stuart Jay Olshansky, a professor in the School of Public Health at the University of Illinois at Chicago, and Jean-Marie Robine, research director at INSERM, the French National Institute of Health and Medical Research – revealed that they did not examine the statistics in the paper in detail.
Yet one of the criticisms is that, as geneticists, Vijg and his team are used to analyzing large-scale genetic data, as opposed to demographic data.
The scientific community of demographers working on aging was shaken up. How could three geneticists find a conclusion in publicly available data that had eluded the field until now? Eminent demographers started sending their comments to Nature.
This week, five “Brief communications arising” articles were published in Nature. Each article is a critique by one group of scientists, and each is accompanied by a response from Vijg and his team.
Three main themes keep cropping up in the articles: the choice of dataset, the choice of statistical analysis, and the way in which the dataset was split.
1. Choice of dataset
The main findings in the paper are based on combined data from the International Database on Longevity (IDL). But data weren’t available for each of the four countries for the entire time period.
Adam Lenart and James W. Vaupel – both of the Max Planck Odense Center on the Biodemography of Ageing at the University of Southern Denmark in Odense – urged caution when combining data in this way.
Maarten P. Rozing, Thomas B. L. Kirkwood, and Rudi G. J. Westendorp – all from the University of Copenhagen in Denmark – added that it is not appropriate to use a dataset to generate a hypothesis and then use the same dataset to test that hypothesis, as this has the potential to lead to “false assessment of statistical significance.”
This opinion was echoed by Nicholas J. L. Brown and Casper J. Albers, both from the University of Groningen in the Netherlands, and Stuart J. Ritchie, from the University of Edinburgh in the U.K.
Vijg and colleagues’ retort was that they used data from both the IDL and the Gerontology Research Group database, which holds worldwide data. Their conclusions stand when using data from both.
The problem is that both sources include the same individuals, meaning that the datasets are not independent.
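The concern about generating and testing a hypothesis on the same data can be illustrated with a small simulation. The sketch below is not the analysis performed by any of the groups involved. It uses invented random numbers with no real change in them, and the function names, the series length of 40, and the known noise level are all assumptions made for illustration. It compares how often a "significant" change is found when the split point is fixed in advance with how often one is found when the most striking split is picked after looking at the data.

```python
import random
from math import sqrt

def z_for_split(series, k):
    """z statistic for 'the mean after position k differs from the mean before it'.

    Assumes independent noise with a known standard deviation of 1.
    """
    before, after = series[:k], series[k:]
    diff = sum(after) / len(after) - sum(before) / len(before)
    return diff / sqrt(1 / len(before) + 1 / len(after))

def false_positive_rates(trials=2000, n=40, seed=1):
    """Share of pure-noise series declared 'significant' at the 5% level."""
    rng = random.Random(seed)
    fixed_hits = picked_hits = 0
    for _ in range(trials):
        # Synthetic data: pure noise, so there is no real change anywhere.
        series = [rng.gauss(0, 1) for _ in range(n)]
        # (a) Split point chosen in advance, before seeing the data.
        if abs(z_for_split(series, n // 2)) > 1.96:
            fixed_hits += 1
        # (b) Split point chosen after looking: keep the most striking one.
        best = max(abs(z_for_split(series, k)) for k in range(5, n - 4))
        if best > 1.96:
            picked_hits += 1
    return fixed_hits / trials, picked_hits / trials

fixed_rate, picked_rate = false_positive_rates()
print(f"pre-specified split: {fixed_rate:.1%} false positives")
print(f"data-chosen split:   {picked_rate:.1%} false positives")
```

With a pre-specified split, the false positive rate should sit close to the nominal 5 percent. Picking the split after inspecting the data gives the test many chances to succeed, so the rate climbs well above that, even though nothing in the data has changed.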
Joop de Beer, from the Netherlands Interdisciplinary Demographic Institute, Anastasios Bardoutsos, from the University of Groningen, and Fanny Janssen, who is from both organizations, used a different dataset to argue that maximum lifespan will increase beyond the age of 115.
They predict that by the year 2070, around 1 in 840,000 Japanese women will survive to the age of 125.
But Vijg and colleagues argued that this does not contradict their conclusions, as they calculated the maximum possible age to be 125, and they acknowledge that outliers above the average maximum age of 115 are possible.
They further said that the mathematical model used by de Beer and colleagues is not appropriate.
For people who don’t use a lot of complicated mathematics in their everyday lives, the world of statistics can be a minefield. There are many different mathematical models, and arguments over which one is most appropriate are very common in research, as in this case.
2. Choice of statistical analysis
All of the critics took issue with aspects of the data analysis. Rozing and colleagues argued that the time period studied by Vijg and colleagues is not long enough to draw conclusions.
Yet Vijg countered this by saying that the fact that there has been no increase in maximum age for 20 years despite the increase in the number of centenarians “speaks for itself.”
Bryan G. Hughes and Siegfried Hekimi – both from McGill University in Montreal, Canada – argued that the normal variability in these data can generate apparent plateaus, as well as apparent increases and decreases, that even out eventually.
They said that a variety of conclusions could be reached, depending on the mathematical model used.
It is therefore not possible to “predict the trajectory that maximum lifespans will follow in the future,” they concluded. Vijg’s response was that their model is a better fit for the data.
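The idea that ordinary variability can produce apparent plateaus can be sketched with invented numbers. The example below is not Hughes and Hekimi's model. It simulates a yearly maximum that rises steadily on average, adds random noise, and measures the longest stretch of years in which no new record is set. The starting value of 110, the trend of 0.1 per year, and the noise level of 1.5 are assumptions chosen purely for illustration.

```python
import random

def longest_stretch_without_record(series):
    """Longest run of consecutive entries that fail to beat the running maximum."""
    best = series[0]
    longest = current = 0
    for value in series[1:]:
        if value > best:
            best = value
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest

def simulate_stretches(years=40, trend=0.1, noise_sd=1.5, trials=1000, seed=2):
    """Simulate yearly maxima that rise steadily on average, plus random noise.

    All parameter values are invented for illustration, not estimated from data.
    """
    rng = random.Random(seed)
    stretches = []
    for _ in range(trials):
        series = [110 + trend * t + rng.gauss(0, noise_sd) for t in range(years)]
        stretches.append(longest_stretch_without_record(series))
    return stretches

stretches = simulate_stretches()
share = sum(s >= 10 for s in stretches) / len(stretches)
print(f"series with a record-free stretch of 10+ years: {share:.0%}")
```

Under these made-up settings, record-free stretches of a decade or more can turn up even though the underlying trend never stops rising. Whether real lifespan data behave this way is exactly what the two sides disagree on.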
3. Choice of splitting the dataset
Hughes and Hekimi questioned the way that the data were partitioned. Vijg and colleagues split their data into two sets, a choice based on visual inspection of the data. But is this scientifically robust?
Vijg cited a paper by F. J. Anscombe – from the Department of Statistics at Yale University in New Haven, CT – from 1973. Anscombe said that “a computer should make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding.”
To Vijg, this validates that “graphing data in order to evaluate the choice of model has long been acknowledged as a useful and important technique by statisticians.”
Is a paper from the 1970s enough to support this claim? Rozing and colleagues didn’t think so. They used a dataset from the world of sports to demonstrate how splitting data can affect results.
Using data from the Olympics allowed them to test whether or not there has been an overall increase in long-jump distances.
When they partitioned their data into two groups based on the last world record, set in 1991, they saw “an improvement in performance up to 1991 and deterioration thereafter.”
But when the data were not split, they saw a “significant increase in the winning long-jump distances over time” and no decrease afterwards.
Vijg countered that the long-jump data show that there is a mechanical limit to how far a human can jump, drawing parallels to their interpretation of a biological limit to maximum human life.
No comment on the statistical analysis of the long-jump data was made.
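The effect of splitting a series at its record value can be illustrated with a short simulation. This is a sketch on invented data, not the long-jump analysis by Rozing and colleagues. It generates series with a steady upward trend plus noise, splits each one at its highest value, and compares the fitted slope of the whole series with the slopes before and after the split. The baseline of 8.0, the trend, the noise level, and the function names are assumptions for illustration only.

```python
import random

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def average_slopes(trials=1000, n=30, trend=0.02, noise_sd=0.25, seed=3):
    """Average fitted slopes: whole series, up to the record, from the record on.

    The data are synthetic: a constant upward trend plus random noise.
    """
    rng = random.Random(seed)
    xs = list(range(n))
    whole, before, after = [], [], []
    for _ in range(trials):
        ys = [8.0 + trend * x + rng.gauss(0, noise_sd) for x in xs]
        peak = ys.index(max(ys))  # split at the record value
        if peak < 2 or peak > n - 3:
            continue  # keep at least three points in each segment
        whole.append(slope(xs, ys))
        before.append(slope(xs[:peak + 1], ys[:peak + 1]))
        after.append(slope(xs[peak:], ys[peak:]))

    def mean(values):
        return sum(values) / len(values)

    return mean(whole), mean(before), mean(after)

whole_slope, before_slope, after_slope = average_slopes()
print(f"whole series:       {whole_slope:+.3f} per step")
print(f"up to the record:   {before_slope:+.3f} per step")
print(f"from the record on: {after_slope:+.3f} per step")
```

Because the split point is the highest value in the series, the first segment ends on a high and the second starts on one. That pushes the first slope up and pulls the second down, even though the simulated trend is the same throughout.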
It is worth noting that scientists are under immense pressure to publish results in widely read journals. This is not only a metric of success for the department in which they are working, but it also helps them to apply for research funding.
Nature is one of the most widely read scientific journals, making publication in Nature extremely desirable. The criteria for publication in this journal are strict: original scientific research that has not been published elsewhere, outstanding scientific importance, and content of interest to an interdisciplinary readership.
In 2013, only 856 of nearly 11,000 submitted manuscripts were published.
Peer reviewers assess manuscripts according to these criteria. They are chosen in part for their “ability to evaluate the technical aspects of the paper fully and fairly.”
While the statistical analysis and data of the original manuscript were perhaps not scrutinized in detail, the critiques published this week have gone into considerable depth.
The authors have provided responses to each point. This is very similar to the normal peer review process, during which reviewers provide comments on a manuscript and the authors are given the opportunity to defend their findings.
Interestingly, Nature’s guidelines say that “although Nature’s editors regard it as essential that any technical failings noted by referees are addressed, they are not so strictly bound by referees’ editorial opinions as to whether the work belongs in Nature.”
Did the editors at Nature choose this manuscript for its potential to generate interest in the scientific and public communities? It would not be the first time.
As the multiple arguments made in the follow-up articles highlight, there is a gray area between right and wrong when it comes to data analysis.
All parties feel that their approach is the most valuable, and all have provided references to back this up. This is, in fact, a good snapshot of how scientific research works.
There are always arguments and counter-arguments. Just because a paper is published, it doesn’t mean that it is true; subsequent data analysis may reveal a different interpretation.
A critical assessment of all studies published is therefore essential. This is easy for scientists with experience in the same area of research, but it is significantly harder for scientists from other fields and the general public.
Whether you believe Vijg and his colleagues’ interpretation of the data or find the arguments brought forward by their critics more compelling, it is worth bearing in mind that any dataset can be interpreted in different ways.