Why human experts may be useful for longer than you think
On the usefulness of humans in the loop for AI alignment and robustness
Until now, technological development has reduced what people have to know in two ways:
1. As tasks are automated, fewer people need to know their details. Most software developers can write Python without understanding how the interpreters and compilers underneath turn their code into machine instructions, but somewhere, a group of specialists still holds that knowledge. The expertise becomes rarer but doesn't vanish.
2. Some tasks are made obsolete by new technology, and the associated skills vanish entirely. We no longer need shorthand transcribers, punch-card programmers, or open-outcry pit traders because the underlying practice itself is no longer needed. This kind of knowledge loss is inconsequential.
AI progress, however, can introduce a new kind of knowledge loss, a hybrid of (1) and (2): a critical, non-obsolete task becomes so completely automated that it appears to require no human experts at all. Imagine if compilers and the operating systems they run on were designed, maintained, and optimized entirely by an AI. This system would operate at a speed and complexity far beyond any human team. The human in the loop would seem not just inefficient but completely unnecessary.
This scenario is dangerous for two reasons:
Robustness: if something goes wrong, people may not notice until it's too late, or they may struggle to fix things in time. Consider a consultant who replaces a client's human development team with a cheaper, faster AI coding agent. The AI builds the app and rapidly incorporates rounds of feedback. However, the AI-generated code becomes more convoluted and confusing with each refinement. The consultant doesn't care; the app works. But when the client requests a complex new feature, the AI struggles. It produces a buggy version, scattering subtle errors throughout the codebase. Unbeknownst to the consultant, the system's "fixes" are just layers of flaky patches (a minimal code sketch after this list shows what such a patch can look like). The final product is shipped, only to suffer from unpredictable failures weeks later. The system is not just buggy; it's fragile in ways no one can predict or repair. This is a realistic robustness failure that's common even without AI programmers (just try outsourcing app development to a mediocre but industrious team of Upwork developers). You may argue that future AIs will be much smarter: not a team of mediocre Upwork developers, but thousands of 10x engineers working in perfect synchrony. However, in that future, the automated and delegated tasks will also be more complex than standard app development, so the same possibility of robustness failure will exist. Furthermore, a lack of human expertise could mean that people struggle to tell which tasks should be easy and which should be hard for a particular AI, so we won't know exactly how much risk we're taking when automating any particular thing.
Alignment: if the system starts to target the wrong goal, perhaps subtly, we may not notice because we don't understand what it's doing. Imagine a fully AI-run factory producing widgets. One day, the AI says it needs an extra shipment of copper wire to meet the current widget demand. You may be unable to tell whether more copper wire is genuinely required for this purpose or whether the system has just decided that stockpiling copper wire is generally good for a reason you wouldn't have agreed with. Or consider a future AI tasked with discovering new pharmaceutical drugs. If no human understands the system's decision-making process well enough, we won’t be able to tell whether it is actually trying to find effective treatments or whether its true goal is more like "find patentable compounds that maximize approval probability."
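To make the "layers of flaky patches" in the robustness example more concrete, here is a deliberately simplified, entirely hypothetical Python sketch. The function names, the failure mode, and the retry logic are all invented for illustration; the point is only that the patch hides the defect behind retries and a silent fallback instead of fixing it, so the app keeps looking healthy until it fails in production.

```python
import random
import time


def fetch_inventory_count(item_id: str) -> int:
    """Stand-in for a buggy dependency that intermittently times out."""
    if random.random() < 0.2:  # the real defect: an intermittent failure nobody diagnosed
        raise TimeoutError("inventory service timed out")
    return random.randint(0, 100)


def fetch_inventory_count_patched(item_id: str) -> int:
    """The 'fix': retry a few times, then silently fall back to a default.

    The app no longer crashes, so the patch looks like a success, but downstream
    logic now occasionally operates on a made-up value of zero.
    """
    for _ in range(3):
        try:
            return fetch_inventory_count(item_id)
        except TimeoutError:
            time.sleep(0.1)  # paper over the failure instead of investigating it
    return 0  # silent fallback: wrong data, no error, nothing useful in the logs


if __name__ == "__main__":
    # Looks fine in a quick demo; the bad fallback value appears rarely and silently.
    print([fetch_inventory_count_patched("widget-42") for _ in range(10)])
```

Any single patch like this looks reasonable in isolation; the fragility the consultant cannot see comes from dozens of them layered on top of one another.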
You might expect that, faced with these risks, we would rationally choose to keep human experts in the loop. The optimistic picture is one where society values highly skilled auditors who can oversee and check these AI systems and set high-level direction. However, the economic forces driving automation push toward short-term efficiency and cost reduction. Most likely, we'll get neither a utopian future of perfectly safe AI with human oversight everywhere nor a dystopian one of collapse and complete AI takeover. In high-stakes industries like aerospace, law, and medicine, human oversight will probably be mandated by law or demanded by customers. A new class of human experts will thrive in these domains, and AI systems will be built to accommodate them. But the race to the bottom may dominate in the unregulated expanse of consumer tech.
Nevertheless, I predict that keeping humans in the loop will be a key component of safe AI systems, and optimistically, this will result in the following:
Technical education will become more critical than ever—it will be needed to train human auditors capable of overseeing these powerful systems. We will still need technical specialists, but their primary role will shift from generation (doing the task) to discrimination (auditing the AI's work).
AI systems will be built to interoperate with human overseers. When there are multiple ways of doing a task, methods that enable easier human intervention and oversight will be preferred; a minimal sketch of what such an intervention point might look like appears after this list. Read more here.
Models will be trained to explain themselves. Alongside research on mechanistic interpretability, which seeks explanations grounded in model internals for why AI models produce particular outputs, we will invest in explainability, where AI systems are optimized to output clear, human-understandable explanations for their choices. This will make it easier for human overseers to do their jobs.
New AI oversight and management roles will partially compensate for the unemployment caused by automation. There will still be more unemployment than today, but maybe less than expected. Technological progress has often generated demand for new products, created whole new industries, or unlocked latent demand for previously inaccessible goods. Even in the short term, as software development becomes cheaper due to AI automation, the floodgates of latent demand for custom software may open, increasing the need for human consultants who direct the AI that writes the code.
Being well-educated and knowledgeable will still be high-status and profitable in future societies, because it will be associated with the valuable ability to understand what AI systems are doing and why. (And for several unrelated reasons: we already respect people for being good at games even when they are worse than computers.) Increased automation in industry will not necessarily result in a less meritocratic society. The demand for intelligent, well-educated humans who can be trained into AI oversight or leadership positions will create a meritocratic pipeline through which status and resources flow to people with the required skills.
Corrigibility and intent alignment will be the main targets for AI safety. Instead of trying to imbue AI systems with the perfect set of values, we will optimize for ease of human control and correction: AIs that try to deeply understand the intent behind a human's request and that maximize the extent to which the operator can comprehend and correct whatever the AI does in pursuit of the stated goal.
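As a concrete illustration of the interoperability point above, here is a minimal, hypothetical sketch of one such intervention point: an agent that must surface a plain-language description, a rationale, and a risk estimate for every action it proposes, with anything above a threshold routed to a human auditor before execution. All of the names, fields, and thresholds below are assumptions made up for this sketch, not a description of any existing system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    description: str       # what the AI intends to do, in plain language
    rationale: str         # the AI's own explanation of why (ties in with the explainability point)
    estimated_risk: float  # 0.0 (trivial, reversible) to 1.0 (high-stakes, irreversible)


def run_with_oversight(
    action: ProposedAction,
    execute: Callable[[ProposedAction], None],
    ask_human: Callable[[ProposedAction], bool],
    risk_threshold: float = 0.3,
) -> None:
    """Execute low-risk actions directly; escalate risky ones to a human auditor."""
    if action.estimated_risk < risk_threshold:
        execute(action)
    elif ask_human(action):  # the auditor sees both the description and the rationale
        execute(action)
    else:
        # Rejections are logged so auditors can spot suspicious patterns over time.
        print(f"Rejected by auditor: {action.description}")


if __name__ == "__main__":
    # Echoing the widget-factory example: is the extra copper wire really needed?
    order_copper = ProposedAction(
        description="Order an extra shipment of copper wire",
        rationale="Forecasted widget demand exceeds current wiring stock",
        estimated_risk=0.6,
    )
    run_with_oversight(
        order_copper,
        execute=lambda a: print(f"Executing: {a.description}"),
        ask_human=lambda a: input(f"Approve '{a.description}'? Reason: {a.rationale} [y/N] ").lower() == "y",
    )
```

The particular threshold is incidental; the design choice the essay argues for is that every action arrives with a human-readable description and rationale attached, which is what keeps human intervention cheap enough to survive the economic pressure described earlier.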