Continuing this essay series on modern barriers to scientific progress, drawing on Michael Bhaskar’s book Human Frontiers: The Future of Big Ideas in an Age of Small Thinking, this essay explores the problems created by the way scientific publications are produced today.
As Bhaskar writes:
Science is not only caught up in the dynamic of consuming ever more resources to maintain a steady and at times frustrating rate of progress, but could arguably be its source. The case goes something like this: the outputs of science, usually research papers, have been growing at an exponential rate for decades. Depending on the discipline, their numbers double every ten to twenty years. Growth in journals output is around 8–9 per cent a year, implying an ever faster rate of doubling – every nine years. In biomedical research the PubMed database sees over a million new papers a year; 12–13,000 papers are uploaded to arXiv, the main repository of physics research, every month, up from around 2000 twenty years ago. According to one estimate, Google Scholar incorporates nearly 400 million documents. Indeed, scientometric research indicates that such growth is part of a centuries-long pattern of exponentially increasing knowledge production. More science PhDs are awarded and more funding granted than at any previous point, and by a decisive margin. Graphs of all three – papers published, PhDs awarded, funding dollars – rise vertiginously from the postwar period to the present. In one large research university today, there may be as many researchers working in a field as in the entirety of Europe or America a hundred years ago. We should expect extraordinary results given this output, the scale of which dwarfs anything in history. But while the growth in numbers of papers may appear to be exponential, analysis of the literature indicates that new ideas only grow in linear fashion. For every new idea, then, the number of papers produced is now much, much higher. This is the same pattern we saw with new technologies …
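As a rough check on the arithmetic in that passage: a constant annual growth rate r implies a doubling time of ln(2)/ln(1+r) years, so 8–9 per cent growth does indeed double output roughly every eight to nine years. The short Python sketch below is purely illustrative (the growth figures are Bhaskar’s):

```python
# Illustrative check of the doubling-time arithmetic quoted above.
import math

def doubling_time(annual_growth_rate: float) -> float:
    """Years for output to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)

for rate in (0.08, 0.085, 0.09):
    print(f"{rate:.1%} annual growth -> doubles every {doubling_time(rate):.1f} years")
# Prints roughly 9.0, 8.5 and 8.0 years respectively.
```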
As James Gleick writes in his biography of physicist Richard Feynman, “The currency of scientific information had not yet been devalued by excess” when Feynman did his greatest work.
Today, the flood of scientific papers is accompanied by an even greater flood of authors. As Bhaskar writes, “In 2013 just two people, Peter Higgs and François Englert, won the Nobel Prize for the discovery of the Higgs boson. Yet 3000 people were named as authors on the key papers that went into finding it.”
This exponential growth in authorship has not corresponded with an exponential growth in scientific progress. As Bhaskar writes:
Expenditure, numbers of PhDs and publications have all grown between ten- and a hundredfold; yet scientific progress and significant discoveries, in the eyes of its most important contemporary exponents, haven't. Scientific big ideas are also in a long-term pattern of diminishing returns … No one doubts that science will continue growing, particularly in applied fields, even if the impact might wane. What is at stake is the ability of science to revolutionise itself with profound shifts of view. Will we have another discovery with the significance of heliocentrism, Darwinian natural selection, Mendel's genetics, quantum mechanics, Bohr and Pauling's unification of physics and chemistry, the structure of DNA, the Big Bang? The making of such revolutions will however get more difficult and require more input. They are by no means guaranteed. This is borne out by the conduct of science today. Before the Second World War the phrase ‘Big Science’ would have been meaningless. Today it is routine. A study in Science found that in 1955 around 50 per cent of engineering papers were team authored. By the 2010s, this was 90 per cent. Today a team-authored paper is six times more likely to receive over one thousand citations. Team size growth is constant, averaging 17 per cent per decade. Similar outcomes are to be found in most scientific disciplines. In the last decade the number of papers in the Nature Index with more than one thousand named authors increased from none to well over a hundred, while Web of Science records more than a thousand papers with over a thousand authors published in just the years 2014–2018 … Again and again, the pattern of the Higgs boson is repeated: what took an individual or small team to begin requires thousands to finish. The structure of DNA was discovered by essentially three people: Watson, Crick and Franklin. When it came to one of the undoubted breakthroughs of recent times, the sequencing of the human genome, it took thirteen years, multiple global teams working with and against each other and, conservatively, thousands if not tens of thousands of highly trained scientists. The pattern holds in nuclear energy: from Rutherford's discovery of atomic structure almost alone on a basic Cambridge workbench to the awesome scale of the Manhattan Project. Or think of Einstein predicting gravitational waves and then the decades-long multi-billion-dollar endeavour of confirming their reality … The direction of travel for large portions of research is towards ever more sizeable teams and experiments. This suggests that delivering big ideas in science is harder than ever. (And now there is a growing body of evidence to suggest that small teams are more innovative and capable of disrupting science.) Furthermore, that direction of travel is down a more specialised, narrower road … There are almost certainly more academics alive in the last twenty or thirty years than at any other time in history, working across the full span of intellectual endeavour on political theory, philosophy, anthropology, sociology. But that great mass of scholars burrow down into niches and publish in little-read journals. In the twenty-first century came a proliferation of niche masters’ degrees, but few major new branches of knowledge … Fitting the wider pattern of diminishing returns and rational pessimism, the number of elements discovered per chemistry paper published then declined right up to the present. 
Discoveries continue, but you need enormous equipment and extremely rare source materials to find these “superheavy” elements. They are elusive and unstable, decaying in moments. Indeed, one Russian lab has built a $60 million experiment just to try and find the elements numbered 119 and 120. Compare the journey of discovery for two elements a few centuries apart: element 117 was discovered by an international collaboration who got an unstable isotope of berkelium from the single accelerator in Tennessee capable of synthesizing it, shipped it to a nuclear reactor in Russia where it was attached to a titanium film, brought it to a particle accelerator in a different Russian city where it was bombarded with a custom-made exotic isotope of calcium, sent the resulting data to a global team of theorists, and eventually found a signature indicating that element 117 had existed for a few milliseconds. Meanwhile, the first modern element discovery, that of phosphorus in the 1670s, came from a guy looking at his own piss … The “river of discovery” flows on, but its character has changed: the river itself has never been larger or absorbed more, but the discoveries within it grow, in comparison, smaller.
Along with the explosion of scientific articles has come an explosion in the size of scientific conferences. As Eric Gilliam writes:
In practice, they [scientists] more so blamed the human organization problems — essentially administrative issues — that they saw all around them. The growing conference sizes made it much more difficult to keep up with adjacent fields and scientific meetings. Seminars began to cater to narrower and narrower sub-branches of work rather than broad ones. These were the places that many researchers leveraged to actually keep up to date on new work and problems in their fields as well as others. But, as money began to funnel into their field in the post-War era, there were more and more researchers and logistical decisions had to be made on how to do things like run conferences and decide who sits in what seminars. The following Richard Feynman excerpt — taken from a 1973 oral history interview, which was one of a series of interviews between Charles Weiner and Feynman — goes into why, in the early 1970s, Feynman felt physics conferences had begun to grow far less useful than they were during the initial interviews for the series — where Feynman had told positive stories about the state of conferences as recently as 1956:
Weiner: How are they [conferences] going from what you’ve seen over the years? Is this the same kind of continuing tradition of the same kinds of people coming together? Because in the early period — we talked about the meeting where you got up and said: “Mr. Block has an idea that I would like to tell you about,” and you talked at the following meeting on that, and these were very exciting things. Has it continued in that same tradition?
Feynman: No, they’ve gotten too big. For example, they have parallel sessions which they never had before. Parallel sessions means that more than one thing is going on at a time, in fact, usually three, sometimes four. And when I went to the meeting in Chicago, I was only there two days before I broke my kneecap, but I had a great deal of trouble making up my mind which of the parallel sessions I was going to miss. Sometimes I’d miss them both, but sometimes there were two things I would be interested in at the same time. These guys who organize this imagine that each guy is a specialist and only interested in one lousy corner of the field. It’s impossible really to go — so it’s just as if you went to half the meeting. Therefore half is not much better than nothing. You might as well stay home and read the reports.
This is no small thing. Feynman, who never religiously kept up with the literature, was one of many who used these conferences as a primary way of keeping up with the changing state of things.
Nor are all these scientific articles and conferences particularly reliable. As Bhaskar writes:
Separately, in discussions of contemporary science, one phrase recurs with wearying frequency: reproducibility crisis. If the results of experiments (and social scientific papers) cannot be replicated, and this is happening at scale, then crisis is not an ill-chosen word. A Nature poll of 1500 scientists found that 70 per cent had failed to reproduce results. In some fields up to 50 per cent of studies fail to replicate. The figures compound corrosive evidence of methodological error like small sample sizes, inaccurate statistical analyses, bias and even fraud in much scientific work, a system in which the gaming of results for maximum impact trumps rigour. What's more, as in cancer research, the accumulation of science generally creates its own problem: there is simply too much relevant material for scientists to consume. Important, often older work is thus regularly missed.
In addition, scientific journals have a bias toward publishing only studies that report some sort of positive finding, and against studies that find no connection between the phenomena examined. But, as physicist Richard Feynman recounts in Surely You’re Joking, Mr. Feynman!, publishing those null results is just as important:
I was a little surprised when I was talking to a friend who was going to go on the radio. He does work on cosmology and astronomy, and he wondered how he would explain what the applications of this work were. "Well," I said, "there aren't any." He said, "Yes, but then we won't get support for more research of this kind." I think that's kind of dishonest. If you're representing yourself as a scientist, then you should explain to the layman what you're doing and if they don't want to support you under those circumstances, then that's their decision. One example of the principle is this: If you've made up your mind to test a theory, or you want to explain some idea, you should always decide to publish it whichever way it comes out. If we only publish results of a certain kind, we can make the argument look good. We must publish both kinds of results. I say that's also important in giving certain types of government advice. Supposing a senator asked you for advice about whether drilling a hole should be done in his state; and you decide it would be better in some other state. If you don't publish such a result, it seems to me you're not giving scientific advice. You're being used. If your answer happens to come out in the direction the government or the politicians like, they can use it as an argument in their favor; if it comes out the other way, they don't publish it at all. That's not giving scientific advice.
Researchers have also found that, in economics journals, even the published articles reporting “positive” results often rest on the selective choice of data and specifications, nudging findings toward statistical significance. As the researchers write:
We assess statistical power and excess statistical significance among 31 leading economics general interest and field journals using 22,281 parameter estimates from 368 distinct areas of economics research. Median statistical power in leading economics journals is very low (only 7%), and excess statistical significance is quite high (19%). Power this low and excess significance this high raise serious doubts about the credibility of economics research. We find that 26% of all reported results have undergone some process of selection for statistical significance and 56% of statistically significant results were selected to be statistically significant. Selection bias is greater at the top five journals, where 66% of statistically significant results were selected to be statistically significant. A large majority of empirical evidence reported in leading economics journals is potentially misleading. Results reported to be statistically significant are about as likely to be misleading as not (falsely positive) and statistically nonsignificant results are much more likely to be misleading (falsely negative).
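To make concrete what a figure like “7% median power” means, here is a minimal sketch of how statistical power is computed for a simple two-sample z-test; it is my own illustration, not the researchers’ method, and the effect sizes and sample sizes are hypothetical. Power is the probability that a study detects a true effect at the 5 per cent significance level, and it collapses when effects are small and samples modest:

```python
# Minimal illustration (not the paper's method) of statistical power for a
# two-sided, two-sample z-test with a standardized effect size.
from scipy.stats import norm

def power_two_sided_z(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power: probability of rejecting H0 when the true effect equals effect_size."""
    se = (2 / n_per_group) ** 0.5      # standard error of the standardized mean difference
    z_crit = norm.ppf(1 - alpha / 2)   # two-sided critical value
    delta = effect_size / se           # true effect in standard-error units
    return norm.cdf(delta - z_crit) + norm.cdf(-delta - z_crit)

# A small true effect with a modest sample is badly underpowered:
print(round(power_two_sided_z(effect_size=0.2, n_per_group=30), 2))   # ~0.12
# The same effect needs roughly 400 subjects per group to reach the conventional 80%:
print(round(power_two_sided_z(effect_size=0.2, n_per_group=400), 2))  # ~0.81
```

When power is that low, most true effects of that size go undetected, and a published record built largely from the “significant” results that do surface is the kind of selection problem the authors describe.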
And scientific journals, of course, often miss the significance of new developments. James Gleick’s book Chaos: Making a New Science explores the development of “chaos theory,” a scientific approach to discerning patterns in what might otherwise appear to be chaotic systems. At first this new science was met with rejection by science journals. Discussing the research of Mitchell Feigenbaum, Gleick writes:
It still rankled that editors at the top-ranked academic journals had [originally] deemed his work unfit for publication for two years after he began submitting it. The notion of a scientific breakthrough so original and unexpected that it cannot be published seemed a slightly tarnished myth. Modern science, with its vast flow of information and its impartial system of peer review, is not supposed to be a matter of taste. One editor who sent back a Feigenbaum manuscript recognized years later that he had rejected a paper that was a turning point for the field. Yet he still argued that the paper had been unsuited to his journal’s audience of applied mathematicians … Modern economics relies heavily on the efficient market theory. Knowledge is assumed to flow freely from place to place. The people making important decisions are supposed to have access to more or less the same body of information. Of course, pockets of ignorance or inside information remain here and there, but on the whole, once knowledge is public, economists assume that it is known everywhere. Historians of science often take for granted an efficient market theory of their own.
Nor are the huge numbers of scientific articles today particularly succinct. As researchers have found, they tend to be riddled with excess verbiage that doesn’t add to the substance of the concepts discussed:
Writing in a clear and simple language is critical for scientific communications. Previous studies argued that the use of adjectives and adverbs cluttered writing and made scientific text less readable. The present study aims to investigate if the articles in life sciences have become more cluttered and less readable across the past 50 years in terms of the use of adjectives and adverbs. The data that were used in the study were a large dataset of 775,456 scientific texts published between 1969 and 2019 in 123 scientific journals. Results showed that an increasing number of adjectives and adverbs were used and the readability of scientific texts have decreased in the examined years. More importantly, the use of emotion adjectives and adverbs also demonstrated an upward trend while that of nonemotion adjectives and adverbs did not increase.
A more fundamental problem may lie in the process by which scientific articles are vetted and published: the so-called “peer review” system. Adam Mastroianni has written an extensive critique of peer review, the mechanism that supposedly maintains the integrity of published scientific articles. As Mastroianni writes:
For the last 60 years or so, science has been running an experiment on itself. The experimental design wasn’t great; there was no randomization and no control group. Nobody was in charge, exactly, and nobody was really taking consistent measurements. And yet it was the most massive experiment ever run, and it included every scientist on Earth. Most of those folks didn’t even realize they were in an experiment. Many of them, including me, weren’t born when the experiment started. If we had noticed what was going on, maybe we would have demanded a basic level of scientific rigor. Maybe nobody objected because the hypothesis seemed so obviously true: science will be better off if we have someone check every paper and reject the ones that don’t pass muster. They called it “peer review.” This was a massive change. From antiquity to modernity, scientists wrote letters and circulated monographs, and the main barriers stopping them from communicating their findings were the cost of paper, postage, or a printing press, or on rare occasions, the cost of a visit from the Catholic Church. Scientific journals appeared in the 1600s, but they operated more like magazines or newsletters … (Only one of Einstein’s papers was ever peer-reviewed, by the way, and he was so surprised and upset that he published his paper in a different journal instead.) That all changed after World War II. Governments poured funding into research, and they convened “peer reviewers” to ensure they weren’t wasting their money on foolish proposals. That funding turned into a deluge of papers, and journals that previously struggled to fill their pages now struggled to pick which articles to print. Reviewing papers before publication, which was “quite rare” until the 1960s, became much more common. Then it became universal. Now pretty much every journal uses outside experts to vet papers, and papers that don’t please reviewers get rejected. You can still write to your friends about your findings, but hiring committees and grant agencies act as if the only science that exists is the stuff published in peer-reviewed journals. This is the grand experiment we’ve been running for six decades. Peer review was a huge, expensive intervention. By one estimate, scientists collectively spend 15,000 years reviewing papers every year. It can take months or years for a paper to wind its way through the review system … And universities fork over millions for access to peer-reviewed journals … In all sorts of different fields, research productivity has been flat or declining for decades, and peer review doesn’t seem to have changed that trend. New ideas are failing to displace older ones. Many peer-reviewed findings don’t replicate, and most of them may be straight-up false … All we can say from these big trends is that we have no idea whether peer review helped, it might have hurt, it cost a ton, and the current state of the scientific literature is pretty abysmal … Here’s a simple question: does peer review actually do the thing it’s supposed to do? Does it catch bad research and prevent it from being published? It doesn’t. Scientists have run studies where they deliberately add errors to papers, send them out to reviewers, and simply count how many errors the reviewers catch. Reviewers are pretty awful at this. In this study reviewers caught 30% of the major flaws, in this study they caught 25%, and in this study they caught 29%. 
These were critical issues, like “the paper claims to be a randomized controlled trial but it isn’t” and “when you look at the graphs, it’s pretty clear there’s no effect” and “the authors draw conclusions that are totally unsupported by the data.” Reviewers mostly didn’t notice. In fact, we’ve got knock-down, real-world data that peer review doesn’t work: fraudulent papers get published all the time … [P]retty much every story about fraud begins with the paper passing review and being published … Why don’t reviewers catch basic errors and blatant fraud? One reason is that they almost never look at the data behind the papers they review, which is exactly where the errors and fraud are most likely to be. In fact, most journals don’t require you to make your data public at all. You’re supposed to provide them “on request,” but most people don’t … (When one editor started asking authors to add their raw data after they submitted a paper to his journal, half of them declined and retracted their submissions. This suggests, in the editor’s words, “a possibility that the raw data did not exist from the beginning.”) … [I]f scientists cared a lot about peer review, when their papers got reviewed and rejected, they would listen to the feedback, do more experiments, rewrite the paper, etc. Instead, they usually just submit the same paper to another journal … A few journals publish reviews; most don't.
Economist Tyler Cowen has these ideas for improving the peer-review system:
[H]ow to ensure good refereeing, which if done correctly raises the quality of academic papers. Under the current system, editors decide which papers get refereed, and they choose the identities of the referees. Those same referees are underpaid and underincentivized, and often do a poor or indifferent job. Many of the original papers on mRNA vaccines, for example, were rejected numerous times by academic journals, hardly a ringing endorsement of the status quo. More generally, since publication is currently a yes/no decision, the refereeing system creates incentives to avoid criticism and play it safe, rather than to strike out with bold new ideas and risk rejection. Under my alternative vision, research scientists would be told to publish one-third less and devote the extra time to volunteer refereeing of what they consider to be the most important online postings. That refereeing, which would not be anonymous, would be considered as a significant part of their research contribution for tenure and promotion. Professional associations, foundations and universities could set up prizes for the top referees, who might be able to get tenure just by being great at adding value to other people’s work.
When scientists find their work rejected by journals that fail to appreciate its significance, they sometimes publish their findings on other “open” platforms. But those platforms themselves may end up censoring scientific views that are unpopular at the time. As Allysia Finley writes in the Wall Street Journal:
As scientists struggle to publish against-the-grain research, many are turning to preprint servers—online academic repositories—to debunk studies in mainstream journals. Yet even some of those sites, such as the Social Science Research Network, are blocking studies that don’t fit preapproved narratives. In January 2022, Johns Hopkins University economist Steve H. Hanke reported that Covid lockdowns had little effect on deaths. When he attempted to publish the findings on SSRN, the site turned him down. “Given the need to be cautious about posting medical content, SSRN is selective on the papers we post,” a rejection notice informed Mr. Hanke. That’s the same response the site gave University of California, San Francisco epidemiologist Vinay Prasad when rejecting his studies debunking widely cited Covid studies, such as one claiming Boston schools’ mask mandate reduced cases. SSRN is run by the company Elsevier, which also publishes prominent medical journals that uniformly promote Covid orthodoxy. Scientific journals and preprint servers aren’t selective about research quality. They’re selective about the conclusions. If experts want to know why so many Americans don’t trust “science,” they have their answer. Too many scientists no longer care about science.
In the next essay in this series, we’ll explore how bureaucracy can stifle scientific progress.