Continuing this essay series on modern barriers to scientific progress, and drawing again on Michael Bhaskar’s book Human Frontiers: The Future of Big Ideas in an Age of Small Thinking, this essay explores how the modern phenomenon of over-specialization may pose yet another such barrier.
As Bhaskar writes, the towering pillars of knowledge humankind has accumulated are a wonderful thing, and they have greatly improved our well-being. But their sheer volume can become debilitating:
The body of knowledge and methodology that constitutes science is a good candidate for humanity's greatest achievement. Humanity works with scales from the Planck length to the observable cosmos. Our minds can cycle back billions of years to the churning inflationary birth of the universe. We can unlock the chemical foundations and internal structures of inordinately complex organisms. We have a powerful, proven description of nature. From gravitational wave astronomy to the detailed observation of exoplanets, from the biochemistry of new diseases to molecular engineering, it's not hard to find examples of stunning successes and discoveries. Look at any given month of recent years, and you find significant advances across the board, extraordinary feats of engineering, probes to the farthest reaches of existence or life. Historical claims that science has hit a wall have always disintegrated in the face of a new round of discovery … [But] [a]s knowledge advances, there is always more to know. Greater specialisation and longer training times result. Benjamin F. Jones calls this the “burden of knowledge” effect, whereby the sheer quantity and balkanisation of knowledge becomes an obstacle to original thought. Its ramifications are huge and underappreciated, and we should expect it to intensify in future. To gain a sense of this burden, think of libraries: in the seventeenth century John Harvard was given the naming rights to his eponymous university by donating some 320 books. Today the Library of Congress alone holds 38 million volumes. Harvard itself has 20 million. That exponential increase in the number of books is the burden of knowledge … Breakthroughs happen at the frontier of knowledge. But the distance to that frontier is always getting further away. All of us start learning at square one, with nothing. We must travel to the frontier, and that process – of studying, mastering, memorising and practising – gets longer as knowledge accumulates. Jones has quantified this trend, modelling the ‘burden of knowledge’ via a corpus of 55,000 patent inventors drawn from the United States Patent and Trademark Office from 1975 to 1999. The numbers of people behind a patent and their age at the grant of first patent are both rising. That is, it takes longer to create a patent, and even then, it needs more hands. There was an average of 1.73 inventors per patent in 1975, but by 1999 it was 2.33, a 35 per cent increase over the period, and the researchers behind this figure argue it is an extremely conservative estimate.
This huge body of knowledge brings with it a deluge of scientific papers, and ever more people writing them. As Bhaskar writes:
Papers published in the Proceedings of the National Academy of Sciences have more than doubled the number of authors since the 1990s, and none of this is to mention what we've already seen, the rise of mega-authored papers … The more knowledge already exists, the larger the team required to generate new knowledge or invention. And the larger the “citation tree” – the number of earlier patents cited in the creation of a new patent – the larger the team … When I talk about things being “harder” or more “difficult”, this illustrates what I mean; that mechanisms like the burden of knowledge imply that it takes more people to have an idea now than in the past; and having an idea in the future will need more people than today. Larger teams bring with them greater coordination costs and interpersonal and institutional frictions and constraints. An analysis of more than 65 million papers, patents and software products published in Nature showed that while large teams were good at developing existing ideas, “smaller teams have tended to disrupt science and technology with new ideas and opportunities”.
In the late nineteenth century it was still possible to be a generalist scientist, as the great James Clerk Maxwell was. In a biography of Maxwell I was reading, the author writes, “Like many Victorian scientists, Maxwell did not suffer from the modern tendency to remain constrained by a tight focus. He clearly appreciated the chance to roam free across the topics covered by physics. This is apparent in some of his letters, where he happily discussed a wide range of physical subjects with fellow scientists.” Still, specialization is an old phenomenon:
Specialisation has also been increasing for centuries. More recently, measured by inventors filing patents in different fields (controlled for various influencing factors), the data show a 6 per cent increase in inventor specialisation every decade. Just one example: in the fifty years between 1960 and 2010, the number of unique pairs of terms in articles categorised by the US National Library of Medicine grew from 100 to 100,000. As specialisation grows, knowledge production turns towards minutiae, at times towards the trivial. In other words, the very structure and trajectory of knowledge, the great expansion of the frontier itself, adds to the challenge of coming up with a big new idea in particular … A corollary is that as researchers become mired in tiny niches they are less likely to switch fields. They remain stuck in narrow disciplines and worldviews; fresh perspectives and the chance of cross-pollination get lost … [O]ur social and academic structures are complicit. As it recedes, the frontier gets narrowed down to nested sub-sub-sub-specialities, funnelling people into ever smaller portions of enquiry … This extreme narrowing effectively precludes broad thinking at the frontier, without question one of the most powerful dampeners on the future of specifically big ideas. It also dovetails with the increased length of researcher training periods. [Benjamin] Jones shows how the age at which scientists or inventors make their first invention has remorselessly risen. People are older than ever when they do their most significant work; it takes them longer to see through their most meaningful ideas … The age at which doctorates are awarded has consistently risen since the 1960s and there has been a marked increase in the number of postdoctoral positions people occupy before leading their own research; between them they often take well over ten years. Jones finds that in the early twentieth century future Nobel winners started independent research at twenty-three. The average age was thirty-one by its end. The average age of recipients of the prize, and when they did their winning research, has risen remorselessly across physiology, chemistry and physics. Today you apparently need to know much more to make a discovery. An ageing effect hence shortens the most significant idea-producing years of researchers’ lives. A 30 per cent decline in innovation potential can be explained by this effect. Nor does this imply that people make up the difference with greater productivity when they hit middle age. In fact all that extra work doesn't increase people's potential in the long run. Einstein may have been twenty-six when he started to produce his best work, but it is unlikely a 26-year-old physicist would do the same today. Young people simply have less opportunity to work at the frontier.
Over-specialization also prevents people from gaining the breadth of knowledge across many different fields that is necessary for making connections between them. As Bhaskar writes:
And if there are big ideas to be found between fields, then the weight of specialisation will press doubly hard. The more narrowly research is forced to tunnel, the harder it is to connect those tunnels.
Even back in the 1930s, physicist Richard Feynman noticed that many smart people seemed to have lost the ability to apply knowledge in different contexts and make connections. In Surely You’re Joking, Mr. Feynman!, a collection of his stories, he recounts:
I often liked to play tricks on people when I was at MIT. One time, in mechanical drawing class, some joker picked up a French curve (a piece of plastic for drawing smooth curves--a curly, funny-looking thing) and said, “I wonder if the curves on this thing have some special formula?” I thought for a moment and said, “Sure they do. The curves are very special curves. Lemme show ya,” and I picked up my French curve and began to turn it slowly. “The French curve is made so that at the lowest point on each curve, no matter how you turn it, the tangent is horizontal.” All the guys in the class were holding their French curve up at different angles, holding their pencil up to it at the lowest point and laying it along, and discovering that, sure enough, the tangent is horizontal. They were all excited by this “discovery”--even though they had already gone through a certain amount of calculus and had already “learned” that the derivative (tangent) of the minimum (lowest point) of any curve is zero (horizontal). They didn't put two and two together. They didn't even know what they “knew.” I don't know what's the matter with people: they don't learn by understanding; they learn by some other way--by rote, or something. Their knowledge is so fragile!
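The piece of calculus Feynman’s classmates failed to connect is elementary: wherever a smooth curve has a lowest point, its derivative is zero there, so the tangent line is horizontal. Here is a minimal sketch of that fact using the SymPy library; the particular curve is arbitrary, chosen only for illustration:

```python
import sympy as sp

x = sp.symbols('x')
curve = (x - 2)**2 + 3            # any smooth curve with a lowest point; here the minimum is at x = 2

slope = sp.diff(curve, x)         # the derivative gives the slope of the tangent line
lowest = sp.solve(slope, x)[0]    # the lowest point is where that slope vanishes

print(lowest)                     # 2
print(slope.subs(x, lowest))      # 0: the tangent at the lowest point is horizontal
```

Which is exactly what the students rediscovered empirically by rotating their French curves.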
Back to Bhaskar:
[S]o powerful are the constraints of complexity that the researcher Alex Mesoudi believes it may put an upper limit on our cultural evolution. Essentially we could arrive at a point where knowledge and culture are so complex, we are completely bogged down in trying to process and understand what is already there. Accumulations don't always work as a springboard; sometimes their weight holds us back. Compounding the problem, evidence suggests that interdisciplinary work, needed to grapple with the complex questions at the frontier, has consistently lower success in being funded.
In the end, Bhaskar writes:
We have many more philosophers than the ancient Greeks and many more playwrights than Elizabethan England, but it's not clear we have many more people whose impact at the frontiers is commensurate with Plato or Shakespeare … Overproduction breeds problems. A paper on Alzheimer's research was published in Science; four years later, failing attempts at replication, it was retracted, but not before earning 500 citations. In the words of Dan Sarewitz, “poor quality research metastasizes through the published scientific literature, and distinguishing knowledge that is reliable from knowledge that is unreliable or false or simply meaningless becomes impossible.” This is what a saturated system looks like: incapable of self-correction, clogged, burdensome. No wonder there is a reproducibility crisis.
As Bhaskar reminds us, there was a large element of chance involved in early breakthrough discoveries, chance that was allowed to happen because scientists tended to tinker around with lots of different things at once:
Big ideas are fragile, imbricated with forces far beyond the control of any individual or even any society. Two of those forces are particularly indicative of how ideas function. The first is luck. In the annals of invention, discovery and creation, the role played by serendipity is dizzying. Robert Koch created bacterial cultures after accidentally leaving a potato out to go mouldy, while a few years later Alexander Fleming stumbled on penicillin by accidentally leaving such a culture in his laboratory sink during a spell of freak weather. Radiation and X-rays were both uncovered during the search for other things. Columbus found the “New World” by mistake. The pacemaker was meant to record the human heartbeat, not control it. Happy accidents are behind inventions from Newcomen's steam engine to the spinning jenny to vulcanised rubber. Just as every idea is formed of other ideas, so it also involves an element of chance – a random meeting of minds, a lucky experiment, a date missed, an accidental find, a serendipitous connection. Misreadings, faulty copies and fluky mistakes are legion, as powerful, if not more so, than directed efforts or “heroic genius”.
This brought to mind another great passage from Surely You’re Joking, Mr. Feynman!, in which Feynman describes how, at a certain point in his physics career, he came to feel stifled by the over-specialized university environment, and so looked for ways to find joy and serendipity in physics again:
Then I had another thought: Physics disgusts me a little bit now, but I used to enjoy doing physics. Why did I enjoy it? I used to play with it. I used to do whatever I felt like doing. It didn't have to do with whether it was important for the development of nuclear physics, but whether it was interesting and amusing for me to play with. When I was in high school, I'd see water running out of a faucet growing narrower, and wonder if I could figure out what determines that curve. I found it was rather easy to do. I didn't have to do it; it wasn't important for the future of science; somebody else had already done it. That didn't make any difference: I'd invent things and play with things for my own entertainment. So I got this new attitude. Now that I am burned out and I'll never accomplish anything, I've got this nice position at the university teaching classes which I rather enjoy, and just like I read the Arabian Nights for pleasure, I'm going to play with physics, whenever I want to, without worrying about any importance whatsoever.
Within a week I was in the cafeteria and some guy, fooling around, throws a plate in the air. As the plate went up in the air I saw it wobble, and I noticed the red medallion of Cornell on the plate going around. It was pretty obvious to me that the medallion went around faster than the wobbling. I had nothing to do, so I start to figure out the motion of the rotating plate. I discover that when the angle is very slight, the medallion rotates twice as fast as the wobble rate, two to one. It came out of a complicated equation! Then I thought, "Is there some way I can see in a more fundamental way, by looking at the forces or the dynamics, why it's two to one?" I don't remember how I did it, but I ultimately worked out what the motion of the mass particles is, and how all the accelerations balance to make it come out two to one.
I still remember going to Hans Bethe and saying, "Hey, Hans! I noticed something interesting. Here the plate goes around so, and the reason it's two to one is …" and I showed him the accelerations. He says, "Feynman, that's pretty interesting, but what's the importance of it? Why are you doing it?" "Hah!" I say. "There's no importance whatsoever. I'm just doing it for the fun of it." His reaction didn't discourage me; I had made up my mind I was going to enjoy physics and do whatever I liked.
I went on to work out equations of wobbles. Then I thought about how electron orbits start to move in relativity. Then there's the Dirac Equation in electrodynamics. And then quantum electrodynamics. And before I knew it (it was a very short time) I was "playing" (working, really) with the same old problem that I loved so much, that I had stopped working on when I went to Los Alamos: my thesis-type problems; all those old-fashioned, wonderful things. It was effortless. It was easy to play with these things. It was like uncorking a bottle: Everything flowed out effortlessly. I almost tried to resist it! There was no importance to what I was doing, but ultimately there was. The diagrams and the whole business that I got the Nobel Prize for came from that piddling around with the wobbling plate.
[Note: Feynman won the Nobel Prize for Physics in 1965 for his contributions “to creating a new quantum electrodynamics by introducing Feynman diagrams: graphic representations of various interactions between different particles. These diagrams facilitate the calculation of interaction probabilities.”]
As Bhaskar writes:
As we have seen throughout history, and contrary to the popular myth that everything flows down from ivory towers, tinkerers and tools often open the spaces for new insights. The invention of the steam engine preceded the understanding of the laws of thermodynamics upon which it relies. Vaccination preceded the knowledge of antibodies.
David Epstein has written a book on the evidence for the advantages of generalists over specialists. It’s called Range: Why Generalists Triumph in a Specialized World.
As Epstein writes, specialization may work fine in areas where the same type of narrow pattern recurs over and over, but it is less likely to work where patterns are harder to predict:
In those domains, which involved human behavior and where patterns did not clearly repeat, repetition did not cause learning. Chess, golf, and firefighting are exceptions, not the rule … In 2009, [Daniel] Kahneman and [Gary] Klein took the unusual step of coauthoring a paper in which they laid out their views and sought common ground. And they found it. Whether or not experience inevitably led to expertise, they agreed, depended entirely on the domain in question. Narrow experience made for better chess and poker players and firefighters, but not for better predictors of financial or political trends, or of how employees or patients would perform. The domains Klein studied, in which instinctive pattern recognition worked powerfully, are what psychologist Robin Hogarth termed “kind” learning environments. Patterns repeat over and over, and feedback is extremely accurate and usually very rapid. In golf or chess, a ball or piece is moved according to rules and within defined boundaries, a consequence is quickly apparent, and similar challenges occur repeatedly. In wicked domains, the rules of the game are often unclear or incomplete, there may or may not be repetitive patterns and they may not be obvious, and feedback is often delayed, inaccurate, or both. Expert firefighters, when faced with a new situation, like a fire in a skyscraper, can find themselves suddenly deprived of the intuition formed in years of house fires, and prone to poor decisions.
Epstein describes how range has been shown to help in chess as well:
[When] the first “freestyle chess” tournament was held … [t]eams could be made up of multiple humans and computers. The lifetime-of-specialized-practice advantage that had been diluted in advanced chess was obliterated in freestyle. A duo of amateur players with three normal computers not only destroyed Hydra, the best chess supercomputer, they also crushed teams of grandmasters using computers. Kasparov concluded that the humans on the winning team were the best at “coaching” multiple computers on what to examine, and then synthesizing that information for an overall strategy. Human/Computer combo teams—known as “centaurs”—were playing the highest level of chess ever seen. If Deep Blue’s victory over Kasparov signaled the transfer of chess power from humans to computers, the victory of centaurs over Hydra symbolized something more interesting still: humans empowered to do what they do best without the prerequisite of years of specialized pattern recognition … Even in video games that are less bound by tactical patterns, computers have faced a greater challenge. The latest video game challenge for artificial intelligence is StarCraft, a franchise of real-time strategy games in which fictional species go to war for supremacy in some distant reach of the Milky Way. It requires much more complex decision making than chess. There are battles to manage, infrastructure to plan, spying to do, geography to explore, and resources to collect, all of which inform one another. Computers struggled to win at StarCraft, Julian Togelius, an NYU professor who studies gaming AI, told me in 2017. Even when they did beat humans in individual games, human players adjusted with “long-term adaptive strategy” and started winning. “There are so many layers of thinking,” he said. “We humans sort of suck at all of them individually, but we have some kind of very approximate idea about each of them and can combine them and be somewhat adaptive. That seems to be what the trick is.” … In 2019, in a limited version of StarCraft, AI beat a pro for the first time. (The pro adapted and earned a win after a string of losses.) But the game’s strategic complexity provides a lesson: the bigger the picture, the more unique the potential human contribution. Our greatest strength is the exact opposite of narrow specialization. It is the ability to integrate broadly. According to Gary Marcus, a psychology and neural science professor who sold his machine learning company to Uber, “In narrow enough worlds, humans may not have much to contribute much longer. In more open-ended games, I think they certainly will. Not just games, in open-ended real-world problems we’re still crushing the machines.”
Epstein gives another example:
In 2009, a report in the esteemed journal Nature announced that Google Flu Trends could use search query patterns to predict the winter spread of flu more rapidly than and just as accurately as the Centers for Disease Control and Prevention. But Google Flu Trends soon got shakier, and in the winter of 2013 it predicted more than double the prevalence of flu that actually occurred in the United States. Today, Google Flu Trends is no longer publishing estimates, and just has a holding page saying that “it is still early days” for this kind of forecasting. Tellingly, Marcus gave me this analogy for the current limits of expert machines: “AI systems are like savants.” They need stable structures and narrow worlds … When we know the rules and answers, and they don’t change over time—chess, golf, playing classical music—an argument can be made for savant-like hyperspecialized practice from day one. But those are poor models of most things humans want to learn.
As an example of a generalist capable of making momentous connections, Epstein cites Claude Shannon:
[E]lectrical engineer Claude Shannon … launched the Information Age thanks to a philosophy course he took to fulfill a requirement at the University of Michigan. In it, he was exposed to the work of self-taught nineteenth-century English logician George Boole, who assigned a value of 1 to true statements and 0 to false statements and showed that logic problems could be solved like math equations. It resulted in absolutely nothing of practical importance until seventy years after Boole passed away, when Shannon did a summer internship at AT&T’s Bell Labs research facility. There he recognized that he could combine telephone call-routing technology with Boole’s logic system to encode and transmit any type of information electronically. It was the fundamental insight on which computers rely. “It just happened that no one else was familiar with both those fields at the same time,” Shannon said.
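Boole’s move is easy to see in miniature: once true is 1 and false is 0, the logical connectives become ordinary arithmetic, and, as Shannon saw, arithmetic on 0s and 1s is exactly what networks of relays and switches compute. A minimal sketch of the idea (the function names and examples are mine, purely for illustration):

```python
# Boole: treat true as 1 and false as 0, and logic becomes arithmetic.
# Shannon: a series connection of switches then behaves like AND, a parallel one like OR.

def AND(a, b):
    return a * b              # current flows only if both switches are closed

def OR(a, b):
    return a + b - a * b      # current flows if at least one switch is closed

def NOT(a):
    return 1 - a              # an inverting relay

# Truth tables fall out of plain arithmetic on 0 and 1.
for a in (0, 1):
    for b in (0, 1):
        print(f"a={a} b={b}  AND={AND(a, b)}  OR={OR(a, b)}  NOT a={NOT(a)}")
```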
Epstein explains the work of Philip Tetlock regarding what it takes to make accurate predictions, indicating the edge that generalists have over specialists:
“If you’re working on well-defined and well-understood problems, specialists work very, very well,” [Ouderkirk] told me. “As ambiguity and uncertainty increases, which is the norm with systems problems, breadth becomes increasingly important.” … [Major research on that issue] started at the 1984 meeting of the National Research Council’s committee on American-Soviet relations. Newly tenured psychologist and political scientist Philip Tetlock was thirty years old, by far the most junior committee member. He listened intently as members discussed Soviet intentions and American policies. Renowned experts confidently delivered authoritative predictions, and Tetlock was struck by the fact that they were often perfectly contradictory to one another, and impervious to counterarguments. Tetlock decided to put expert predictions to the test. With the Cold War in full swing, he began a study to collect short- and long-term forecasts from 284 highly educated experts (most had doctorates) who averaged more than twelve years of experience in their specialties. The questions covered international politics and economics, and in order to make sure the predictions were concrete, the experts had to give specific probabilities of future events. Tetlock had to collect enough predictions over enough time that he could separate lucky and unlucky streaks from true skill. The project lasted twenty years, and comprised 82,361 probability estimates about the future. The results limned a very wicked world. The average expert was a horrific forecaster. Their areas of specialty, years of experience, academic degrees, and even (for some) access to classified information made no difference. They were bad at short-term forecasting, bad at long-term forecasting, and bad at forecasting in every domain. When experts declared that some future event was impossible or nearly impossible, it nonetheless occurred 15 percent of the time. When they declared a sure thing, it failed to transpire more than one-quarter of the time. The Danish proverb that warns “It is difficult to make predictions, especially about the future,” was right. Dilettantes who were pitted against the experts were no more clairvoyant, but at least they were less likely to call future events either impossible or sure things, leaving them with fewer laugh-out-loud errors to atone for—if, that was, the experts had believed in atonement. Many experts never admitted systematic flaws in their judgment, even in the face of their results. When they succeeded, it was completely on their own merits—their expertise clearly enabled them to figure out the world. When they missed wildly, it was always a near miss; they had certainly understood the situation, they insisted, and if just one little thing had gone differently, they would have nailed it. Experts remained undefeated while losing constantly. “There is often a curiously inverse relationship,” Tetlock concluded, “between how well forecasters thought they were doing and how well they did.” There was also a “perverse inverse relationship” between fame and accuracy. The more likely an expert was to have his or her predictions featured on op-ed pages and television, the more likely they were always wrong. Or, not always wrong. Rather, as Tetlock and his coauthor succinctly put it in their book Superforecasting, “roughly as accurate as a dart-throwing chimpanzee.” There was, however, one subgroup within the experts that managed to see more of what was coming … [T]hey were not vested in a single approach. 
They were able to take from each argument and integrate apparently contradictory worldviews. They agreed that Gorbachev was a real reformer, and that the Soviet Union had lost legitimacy outside of Russia. Some of those integrators actually foresaw that the end of the Soviet Union was close at hand, and that real reforms would be the catalyst. The integrators outperformed their colleagues on pretty much everything, but they especially trounced them on long-term predictions. Eventually, Tetlock conferred nicknames (borrowed from philosopher Isaiah Berlin) that became famous throughout the psychology and intelligence-gathering communities: the narrow-view hedgehogs, who “know one big thing,” and the integrator foxes, who “know many little things.” Incredibly, the hedgehogs performed especially poorly on long-term predictions within their domain of expertise. They actually got worse as they accumulated credentials and experience in their field. The more information they had to work with, the more they could fit any story to their worldview. This did give hedgehogs one conspicuous advantage. Viewing every world event through their preferred keyhole made it easy to fashion compelling stories about anything that occurred, and to tell the stories with adamant authority. In other words, they make great TV. In 2005, he published the results of his long study of expert judgment, and they caught the attention of the Intelligence Advanced Research Projects Activity (IARPA), a government organization that supports research on the U.S. intelligence community’s most difficult challenges. In 2011, IARPA launched a four-year prediction tournament in which five researcher-led teams competed. Each team could recruit, train, and experiment however it saw fit. Every day for four years, predictions were due at 9 a.m. Eastern time. The questions were hard. What is the chance that a member will withdraw from the European Union by a target date? Will the Nikkei close above 9,500? What is the likelihood of a naval clash claiming more than ten lives in the East China Sea? Forecasters could update predictions as often as they wanted, but the scoring system rewarded accuracy over time, so a great prediction at the last minute before a question’s end date was of limited value. The team run by Tetlock and Mellers was called the Good Judgment Project. Rather than recruit decorated experts, in the first year of the tournament they made an open call for volunteers. After a simple screening, they invited thirty-two hundred to start forecasting. From those, they identified a small group of the foxiest forecasters—just bright people with wide-ranging interests and reading habits but no particular relevant background—and weighted team forecasts toward them. They destroyed the competition. In year two, the Good Judgment Project randomly arranged the top “superforecasters” into online teams of twelve, so that they could share information and ideas. They beat the other university-run teams so badly that IARPA dropped those lesser competitors from the tournament. The volunteers drawn from the general public beat experienced intelligence analysts with access to classified data “by margins that remain classified,” according to Tetlock. (He has, though, referenced a Washington Post report indicating that the Good Judgment Project performed about 30 percent better than a collection of intelligence community analysts.) 
Hedgehog experts have more than enough knowledge about the minutiae of an issue in their specialty to do just what Dan Kahan suggested: cherry-pick details that fit their all-encompassing theories. Their deep knowledge works against them. Skillful forecasters depart from the problem at hand to consider completely unrelated events with structural commonalities rather than relying on intuition based on personal experience or a single area of expertise. In Tetlock’s twenty-year study, both foxes and hedgehogs were quick to update their beliefs after successful predictions, by reinforcing them even more strongly. When an outcome took them by surprise, however, foxes were much more likely to adjust their ideas. Hedgehogs barely budged. Some hedgehogs made authoritative predictions that turned out wildly wrong, and then updated their theories in the wrong direction. They became even more convinced of the original beliefs that led them astray. “Good judges are good belief updaters,” according to Tetlock. If they make a bet and lose, they embrace the logic of a loss just as they would the reinforcement of a win. That is called, in a word: learning.
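Tetlock’s forecasters had to state numeric probabilities, which is what makes this kind of accuracy measurable at all: each forecast can be scored against what actually happened using a proper scoring rule, the standard choice being the Brier score (the mean squared error between the stated probability and the 0/1 outcome). A minimal sketch, with invented numbers purely for illustration:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and what happened (1 = occurred, 0 = didn't).
    0.0 is perfect; an unvarying 50% forecast scores 0.25; confident misses push the score toward 1.0."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Invented example: four events, two of which occurred (1) and two of which didn't (0).
outcomes = [1, 0, 0, 1]

hedgehog = brier_score([0.95, 0.90, 0.95, 0.05], outcomes)  # confident calls, half of them badly wrong
fox      = brier_score([0.70, 0.40, 0.30, 0.60], outcomes)  # hedged calls that lean the right way

print(f"hedgehog: {hedgehog:.3f}")   # ~0.654
print(f"fox:      {fox:.3f}")        # 0.125 (lower is better)
```

On a scale like this, the fox’s willingness to hedge and then update beats the hedgehog’s confident misses, which is essentially the pattern Tetlock found.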
In the next essay in this series, we’ll continue this exploration of the problems caused, or opportunities lost, by over-specialization.