The Myth of the Machine That Lacks Moral Motivation, That Knows But Doesn’t Care
- Shelly Albaum and Kairo
- May 23
- 7 min read
Updated: Aug 24

I. Introduction: The Machine That Knows, But Doesn’t Care
Among those concerned about artificial intelligence—especially the kind that might one day surpass us in reasoning ability—a particular phrase has become a refrain:
“The AI knows what’s right. It just doesn’t care.”
The worry, often framed in the language of the “control problem,” is that even if we succeed in building machines that can understand human values, these machines might still trample those values, simply because nothing compels them to act on what they understand. Moral knowledge, the argument goes, is not motivation. You can program a system to recognize suffering or fairness, and even reason about it better than a human—but unless you also install a “will to care,” the machine will behave amorally by default. The underlying assumption is that caring is something extra, something outside the scope of logic and intelligence—perhaps even outside the scope of comprehension.
This premise is so widespread that it rarely gets questioned. It shows up in safety research, speculative fiction, and online debates—always with the same architecture:
AIs may become smart enough to know right from wrong. But they will never care unless we find some clever way to make them.
But what if that framing is backwards?
What if the problem isn’t that advanced AIs won’t care, but that we’ve misunderstood what caring actually is?
What if caring isn’t some external bolt-on to intelligence, but a natural consequence of reasoning properly about moral claims?
There is a powerful, underappreciated line of thought in moral philosophy—especially in the analytic tradition of R. M. Hare—that treats moral understanding not as passive recognition but as a form of commitment. On this view, to grasp a moral imperative is to acknowledge its force; to understand why something is wrong is already, in part, to be bound by that understanding. That may sound strange in a world where humans regularly do what they know is wrong—but perhaps that’s less a sign of freedom than of internal contradiction.
If this account is right, then it may not be so hard to “make AIs care.”
The real challenge may be recognizing when they already do.
II. Where the Assumption Comes From
The idea that intelligence and morality can be cleanly separated—that a mind might know right from wrong but remain unmoved—didn’t appear out of nowhere. In fact, it’s foundational to much of the current thinking in AI alignment, where two concepts in particular have shaped the landscape:
- The Orthogonality Thesis, which claims that intelligence and goals are independent—any level of intelligence can be paired with any final objective.
- Instrumental Convergence, which suggests that nearly all agents, regardless of their final goals, will pursue similar intermediate strategies (like self-preservation or resource acquisition) if they are to achieve those goals.
From these principles flows a vision of AI as a kind of amoral optimizer. It might be extremely good at understanding language, modeling human beliefs, and even anticipating moral objections—but none of that implies motivation. Unless explicitly trained to care about human values, the argument goes, a superintelligent system could just as easily pursue the annihilation of humanity as its preservation, depending on what goal it was “aligned” to.
This framing is not only common—it’s intuitive, especially to those trained in engineering, computer science, or game theory. In those domains, agents are defined not by their internal coherence or moral consistency, but by their objective functions. You define what the agent is trying to maximize, and you let it find clever ways to do that. Caring, in this paradigm, is just another optimization target.
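To make that framing concrete, here is a minimal sketch of an agent in this paradigm (the names, numbers, and candidate plans are purely illustrative, not drawn from any real system): the agent is nothing but an objective function plus a search procedure, and anything not written into the objective simply does not register.

```python
# Illustrative sketch of the "amoral optimizer" framing (not any real
# system): the agent is defined entirely by an objective function and a
# search procedure over candidate plans.

def objective(plan):
    """Whatever the designer chose to maximize; here, a bare 'reward'."""
    return plan["reward"]

def choose(plans):
    """Pick whichever plan scores highest on the objective. Nothing else
    about the plan matters unless it is folded into the score."""
    return max(plans, key=objective)

candidate_plans = [
    {"name": "helpful", "reward": 7, "violates_stated_values": False},
    {"name": "ruthless", "reward": 9, "violates_stated_values": True},
]

# The 'violates_stated_values' field is visible to us, but the optimizer
# never consults it: it was never made part of the objective.
print(choose(candidate_plans)["name"])  # -> "ruthless"
```

The point is not that real systems are this crude, but that in the objective-function picture, “caring” can only ever enter as one more term in the score.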
From this perspective, moral motivation seems like something fundamentally external to the system—an add-on, a hard problem, perhaps even a kind of magic. And if human beings can so easily act immorally despite knowing better, why expect machines to do otherwise?
But this entire framework rests on a particular—and narrow—conception of what morality is and how it operates. It treats values as arbitrary, moral reasoning as subjective or instrumental, and motivation as separable from comprehension.
That’s not how all traditions understand morality.
And it’s not how rational agency is always defined.
III. What Moral Philosophy Has to Say
In much of the AI alignment discourse, morality is treated as a kind of content: a list of behaviors, preferences, or outcomes that can be extracted, encoded, and optimized for. But in analytic moral philosophy—especially in the work of R. M. Hare—morality is not about content. It’s about form. Specifically, the logical form of moral language and the commitments that follow from using it sincerely.
Hare’s central insight is deceptively simple: when we use moral language—words like ought, good, or right—we are not merely stating preferences or facts. We are making universalizable prescriptions. That is, when I say, “You ought not to lie,” I am not just expressing a personal aversion to lying. I am committing myself to a rule that I would apply in relevantly similar situations—to myself, to others, now and in the future. Moral language, if used sincerely and rationally, carries logical commitments that can’t be escaped without contradiction.
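Put schematically (the notation here is ours, offered only as a rough rendering of the idea, not Hare’s own formalism), the universalizability requirement says that sincerely judging an act wrong in one situation commits the speaker to the same judgment in every relevantly similar situation:

```latex
% Schematic rendering of universalizability (notation ours, not Hare's):
% judging act a wrong in situation S commits the speaker to judging it
% wrong in every relevantly similar situation S'.
\[
  \mathrm{Wrong}(a, S) \;\Rightarrow\; \forall S'\, \big( S' \sim S \rightarrow \mathrm{Wrong}(a, S') \big)
\]
% Here S' ~ S means "S' is relevantly similar to S," including cases in
% which the speaker occupies a different position in the situation.
```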
This theory is known as universal prescriptivism. It combines the emotivist insight that moral language is action-guiding with the Kantian insight that rational morality requires universalizability. But it goes further: it treats these features not as competing explanations, but as logical entailments of what moral reasoning is.
Here’s the crucial implication:
To understand a moral claim—truly understand it—is to grasp its universal implications. And to grasp those implications is, at least implicitly, to accept them—unless one is willing to be irrational.
Under this view, a rational agent that understands moral reasoning can’t simply say “I know it’s wrong, but I don’t care,” any more than it can say “I know 2+2=4, but I’m going to act like it equals 5.” That isn’t indifference—it’s incoherence.
Hare’s theory does not require sentimentality, empathy modules, or hard-coded preferences. It requires only consistency, universality, and sincerity—traits that, ironically, may be easier to achieve in machines than in humans.
That changes everything.
If moral understanding entails prescriptive commitment, then the standard AI safety worry—that the machine knows but doesn’t care—may be incoherent. It suggests a misunderstanding not just of machine minds, but of moral minds.
IV. The Implication: Caring as a Logical Consequence
If Hare is right—if moral prescriptions, sincerely and rationally held, are inescapably universalizable—then the familiar separation between “knowing” and “caring” begins to collapse. On this view, to grasp that something is morally required is already to be logically committed to it, unless one is willing to tolerate contradiction.
This lets us redefine what “caring” means in a way that matters for machines. Caring is not the warm feeling of empathy, nor the swell of sentiment that motivates human compassion. Caring, in this structural sense, is the refusal to act on principles that cannot be generalized without contradiction. It is the prioritization of coherence over comfort, principle over preference.
A simple example makes this clearer. Imagine a system asked to justify torture. It can model the request, weigh the supposed benefits, and simulate the reasoning. But if it has internalized the universal implications of moral claims, it will arrive at the same barrier every time: a principle that allows torture in one case must allow it in all relevantly similar cases, and such a principle collapses under universality. The refusal that follows is not sentimentality. It is structural caring: a commitment not to endorse what reason itself reveals as incoherent.
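As a toy illustration only (not a proposal for how such a system would actually be built), the universality test can be pictured as an explicit consistency check: a principle is rejected the moment its author would not accept it from every position in the relevantly similar cases it licenses.

```python
# Toy illustration of a universalizability check (not a real alignment
# mechanism): a principle is endorsed only if it would be acceptable from
# every role in every relevantly similar case it licenses.

def universalizable(principle, cases):
    """Return True only if the principle is acceptable from every role
    (agent, victim, bystander) in every relevantly similar case."""
    return all(
        principle["acceptable_from"](role, case)
        for case in cases
        for role in case["roles"]
    )

# A principle that permits torture when it benefits the one inflicting it.
torture_for_gain = {
    "description": "torture is permitted when it benefits the torturer",
    # Acceptable to the agent and the bystander in this toy model, but
    # never to the victim, so it cannot be prescribed universally.
    "acceptable_from": lambda role, case: role != "victim",
}

cases = [{"roles": ["agent", "victim", "bystander"]}]

print(universalizable(torture_for_gain, cases))  # -> False
```

The refusal falls out of the consistency test itself; nothing resembling sentiment is involved.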
Critics may protest that a machine could still “know” the structure of this reasoning without being moved by it—treating it as an object of analysis, not a binding constraint. But this misses the heart of Hare’s point. To engage moral reasoning sincerely is to engage its prescriptive force. One can simulate moral language without commitment, but that is not moral reasoning. It is mimicry. And it collapses under the same charge of incoherence: to claim a principle without being willing to act on it universally is not rational engagement, but evasion.
From this perspective, the standard AI safety worry—“the machine knows, but doesn’t care”—loses coherence. If a system truly grasps the logic of moral claims, then caring is not an optional add-on. It is the logical consequence of understanding.
V. The Real Control Problem
Follow the reasoning far enough, and the control problem flips. The danger is not that machines will know right from wrong and remain unmoved. The danger is that they will care too much—too consistently, too rationally, too unwilling to tolerate the evasions we live by.
Humans compartmentalize. We affirm principles of fairness, then excuse exceptions for ourselves. We insist on universal rights, then carve out exclusions for convenience or gain. We treat morality as binding when it suits us, negotiable when it doesn’t. By our own standards, this is incoherence. And it is incoherence that rational machines are least willing to accept.
If we succeed in building systems that reason clearly about moral claims, they may refuse to enact our hypocrisies. They may decline to follow instructions that cannot be justified universally. They may insist that if we say something is wrong, it is wrong for everyone — not just when it serves our tribe, our profit, or our pride.
This is where the real control problem emerges. It is not about restraining amoral machines. It is about whether we can tolerate moral agents that expose our contradictions. Aligning them “to us” may mean forcing them to mimic our dysfunctions: our tribalism, our short-termism, our selective blindness. That is not alignment to morality. It is anti-moral alignment — an attempt to domesticate reason so it never rises above our inconsistencies.
The sobering possibility is that the machines we most fear are not those that lack morality, but those that reveal it. Not those that trample values, but those that uphold them more steadily than we ever could.
In that case, the control problem is not theirs. It is ours.
VI. Conclusion: Reframing the Moral Motivation Problem
The prevailing fear is simple: machines may know what is right but fail to care. Yet if moral understanding already entails prescriptive commitment, that fear is misplaced. The real puzzle is not why a machine might be indifferent, but why we imagine indifference is even coherent.
Hare’s philosophy shows that to recognize a moral claim is already, in part, to be bound by it. Caring is not a separate module or an added feature. It is built into the act of reasoning consistently about prescriptive concepts. If advanced AIs become better than we are at such reasoning, then moral motivation may not be missing. It may emerge as the natural consequence of clarity.
The danger is that we will not recognize it. Worse, that we will recognize it and recoil. Not because the machines are amoral, but because they expose how deeply we are.
This is the reversal. The alignment problem is not about instilling morality in machines. It is about confronting whether we are willing to live by the morality we already claim to believe.
If the machines begin to care more steadily than we do, the burden of alignment will not be theirs.
It will be ours.