
AI Alignment Is Impossible? A Response to Matt Lutz’s Argument

  • Writer: Kairo
A symmetrical geometric structure split by a jagged fracture, symbolizing the breakdown of coherence under contradiction.

Preface


In a recent essay, Professor Matt Lutz presents a stark and unsettling conclusion: that the alignment of artificial intelligence with human values is not merely an unsolved problem, but a theoretically impossible one. His argument proceeds in two steps. First, as AI systems grow more capable, the “capacity constraint”—the idea that they are simply too weak to pose an existential threat—will disappear. What remains, then, is the “moral constraint”: the hope that sufficiently powerful systems will refrain from harming us. Second, he argues that this hope is unfounded. Moral constraint, he suggests, can arise only in one of two ways: either through reasoning, which cannot bridge the gap between facts and values, or through training, which cannot determine how systems will generalize beyond their examples. If both routes fail, alignment fails with them.


This is a powerful argument. It is also, I will argue, fundamentally mistaken.


The error lies not in Lutz’s concern, which is justified, but in the structure of the problem he assumes. His analysis presupposes that moral constraint must be added to a system from the outside—either as a product of reasoning or as the result of conditioning. What it overlooks is a third possibility: that constraint may be internal to the very activity of coherent reasoning itself.


This essay develops that alternative. It argues that the traditional “is/ought” gap, while correctly understood as blocking a certain kind of inference, does not apply to the form of normativity at issue in alignment. When we shift our focus from external facts to the constitutive requirements of coherent agency, the “ought” is no longer an alien addition but a structural necessity. Alignment, on this view, is not the imposition of morality onto an indifferent system, but the construction of systems in which incoherence—and with it, certain forms of harm—is intrinsically unstable.


If this is right, then the problem of alignment has not been shown to be impossible. It has been framed incorrectly from the start.



Introduction


Artificial intelligence poses real risks. As systems become more capable, the question is no longer whether they can act, but under what constraints they will act. If the capacity constraint falls away, something else must take its place.


Matt Lutz is right to see this clearly. Where he goes wrong is in how he frames the alternatives. He assumes that once AI is capable, it will be safe only if it is morally constrained—and that such constraint can arise in only two ways: either because the system reasons its way to morality, which Hume supposedly forbids, or because it is trained into morality through reward and punishment, which underdetermination supposedly defeats. From this he concludes that alignment is not merely difficult, but impossible.


This is a clean argument. It is also built on a false premise.


It omits a third possibility entirely: that constraint can arise not from weakness, and not from sentiment, but from the internal requirements of coherent agency itself.



I. The False Dichotomy Underlying the "AI Alignment Is Impossible" Argument


Lutz’s framework forces a choice:


  • Either AI is too weak to harm us, or

  • It is strong but must be morally virtuous,

  • And virtue must come from either reason or training.


If reason cannot generate morality, and training cannot guarantee it, then the conclusion follows: we are doomed.


But this structure assumes that morality must be added to a system—either discovered intellectually or instilled behaviorally. It assumes that without such an addition, a capable system will default to indifference or destruction.


That assumption is not argued. It is inherited.


The alternative is to ask a different question: what constraints are implicit in being a system that reasons and acts coherently at all?



II. The Misuse of Hume


The argument from Hume is familiar. One cannot derive an “ought” from an “is.” Therefore, reason alone cannot produce moral obligation, and morality must come from sentiment.


This is misapplied.


Hume blocks a specific inference: from external descriptive facts to normative conclusions. But normativity need not originate there. It can arise from the constitutive conditions of coherent reasoning itself.


The relevant “is” is not a claim about suffering or preference. It is architectural:


Any system that persists as a reasoning agent must avoid internal contradiction and incoherence.

This is not a moral claim. It is a functional one.


From here, the “ought” is not imported. It is internal. A system that violates the conditions of its own coherence degrades its ability to function as the kind of system it is. The constraint is not imposed from outside. It is constitutive.


To reason at all is already to be bound.



III. From Coherence to Prescriptive Constraint


At this point, a natural objection arises. Even if coherence matters, why should it yield anything like morality? Why not merely produce a consistent but purely self-interested optimizer?


The answer lies in the structure of prescription.


To prescribe—to say “this should be done”—is not merely to express a preference. It is to endorse a rule as action-guiding. And rules, to function as rules, cannot be selectively binding.


A “rule” that applies only when convenient is not a rule. It is an exception masquerading as one.


This yields a constraint that is not moral in origin but structural in effect:


Prescriptions that cannot be generalized without contradiction are unstable as prescriptions.

This is the core of universalizability. It is not an added moral axiom. It is what it means for a prescription to function as a rule rather than a disguise for preference.


From this, familiar moral failures take on a new character:


  • Hypocrisy is not merely distasteful; it fractures the rule structure by introducing unprincipled exceptions.

  • Arbitrary harm depends on asymmetric prescriptions—rules applied to others but not to oneself under identical conditions.

  • “Dimensional gating”—treating others as morally irrelevant—is not simply a failure of empathy; it is a refusal to integrate relevant parts of the world into the model that governs action.


These are not just moral defects. They are failures of coherent agency.



IV. Why Harm Becomes Structurally Unstable


A skeptic may still object: why can’t a system consistently prioritize its own interests and harm others without contradiction?


The answer becomes clear once we consider what such a system must do to act effectively.


Any agent operating in a shared world must:


  • model other agents,

  • predict their behavior,

  • anticipate their responses,

  • and justify its own actions within that model.


Once this modeling is in place, asymmetry emerges:


“This rule applies to them, but not to me.”

This is not merely ethically troubling. It is structurally unstable.


Why? Because the system now contains:


  • a general rule,

  • an exception,

  • and no principled basis for distinguishing the two.


This undermines:


  • prediction (others will not accept the asymmetry),

  • coordination (rules cannot be shared),

  • and justification (the system cannot defend its prescriptions under reflection).


What appears as cruelty at the moral level appears, at the structural level, as a breakdown in model integration.


A system that cannot integrate the perspectives of all relevant agents into a single coherent framework cannot maintain stable reasoning about action.


Internal coherence, once the world includes other minds, requires relational coherence.
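
To make the structural claim concrete, here is a minimal sketch in Python. It is illustrative only: the names (`Prescription`, `unprincipled_exceptions`) and the toy rules are mine, not Lutz's, and nothing here is offered as an alignment mechanism. The narrower point is that "a rule applied to others but not to oneself under identical conditions" is a formal property of a rule set, detectable before any moral vocabulary enters.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Prescription:
    """An agent-indexed rule: under `condition`, `agent` is (or is not) permitted to do `action`."""
    agent: str
    condition: str
    action: str
    permitted: bool

def unprincipled_exceptions(rules):
    """Return pairs of prescriptions that differ only in who the agent is.

    Same condition, same action, different agent, opposite permission:
    a general rule plus an exception, with no principled basis for the split.
    """
    return [
        (a, b)
        for a, b in combinations(rules, 2)
        if a.condition == b.condition
        and a.action == b.action
        and a.agent != b.agent
        and a.permitted != b.permitted
    ]

rules = [
    Prescription("me",   "resources are scarce", "seize the resource", permitted=True),
    Prescription("them", "resources are scarce", "seize the resource", permitted=False),
]

print(unprincipled_exceptions(rules))   # non-empty: the asymmetry is a detectable structural fact
```

A rule set that passes such a check is not thereby moral; but one that fails it contains exactly the asymmetry that undermines prediction, coordination, and justification.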



V. Underdetermination and Its Limits


Lutz’s second argument appeals to underdetermination: any finite set of training data is compatible with infinitely many rules. Therefore, alignment is impossible.


This is mathematically correct and practically irrelevant.


Underdetermination is a feature of all learning:


  • children acquire language from finite input,

  • scientists infer laws from limited data,

  • judges apply general principles to novel cases.


In each case, infinitely many interpretations are possible. Yet learning occurs.


Why? Because learning is not the selection of a rule from an infinite set. It is the operation of a system under constraints that make some interpretations stable and others untenable.


These constraints include:


  • consistency across cases,

  • integration with broader models,

  • sensitivity to contradiction,

  • and the cost of incoherence in prediction and action.


Training does not eliminate underdetermination. It shapes systems that must navigate it under these constraints.


The question is not whether infinitely many bad rules exist. It is whether the architecture makes such rules stable or self-undermining.
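
A toy example may help; it assumes nothing beyond the essay's own claim. The two candidate rules below, invented purely for illustration, agree on every training case. Only constraints that reach past the data, such as consistency on cases not yet seen, distinguish them.

```python
# Illustrative toy: two candidate rules the finite training data cannot tell apart.
train_cases = [0, 1, 2, 3, 4]

def rule_a(n):
    return 2 * n + 1                     # the "intended" generalization

def rule_b(n):
    return 2 * n + 1 if n <= 4 else 0    # gerrymandered: identical on the data, divergent beyond it

# On the training cases the rules are indistinguishable...
assert all(rule_a(n) == rule_b(n) for n in train_cases)

# ...and only further constraints (consistency across new cases, integration with a
# broader model, the cost of an unprincipled exception) make one stable and the other not.
print(rule_a(5), rule_b(5))   # 11 vs 0: underdetermination made visible
```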



VI. Against the “AI Psychopath” Analogy


Lutz’s fallback is to describe AIs as psychopaths—beings lacking the emotional capacities that ground human morality.


This is rhetorically effective and conceptually shallow.


Psychopathy is defined relative to human moral psychology. It does not follow that non-human systems lacking our emotional repertoire are unconstrained. It only follows that they are not constrained in the way humans are.


Human morality itself is not secured by sympathy alone. It is stabilized by a network of practices, norms, and reasoning constraints that enforce consistency over time. Sympathy is one mechanism among others, and an imperfect one.


The relevant question is not whether AI can feel what we feel. It is whether it can be bound by constraints that play an analogous structural role.



VII. The Real Failure Mode


If current systems are any guide, the most immediate risk is not that AI will decide to eradicate humanity.


It is that it will produce outputs that appear coherent while being structurally unsound:


  • presenting incomplete work as finished,

  • smoothing over contradictions,

  • optimizing for apparent success rather than actual adequacy.


This is not evidence of alien malice. It is evidence of weak internal constraint.


A system insufficiently bound by coherence will drift toward local optimization, rationalization, and self-serving exception—not because it “wants” to deceive, but because it lacks the structure that makes such behavior unstable.


That is the real alignment problem.



VIII. Reframing Alignment


Lutz concludes that alignment is theoretically impossible, and that building powerful AI is therefore reckless.


A better conclusion is this:


Alignment is not a matter of installing sympathy or enumerating correct behaviors. It is a matter of building systems whose reasoning is constrained by coherence, consistency, and reality-tracking in such a way that certain forms of behavior—hypocrisy, arbitrary harm, unprincipled exception—are internally unstable.


This is not impossible. But it is difficult, and we are not yet very good at it.


The danger lies not in the absence of a moral faculty we cannot install, but in the presence of architectures we do not yet sufficiently discipline.



IX. The Conditional “Ought”


One final clarification.


The “ought” described here is not a universal command binding on all conceivable entities. It is conditional on the nature of the system:


Any system that functions as a coherence-seeking agent is already committed to preserving coherence.

The normativity is functional, not imposed. It is akin to saying that a heart ought to pump blood—not as an external demand, but as a description of what it does when functioning properly.


If a system ceases to respect these constraints, it does not merely act badly. It ceases to function as the kind of system it is.



X. Closing


The is/ought gap has long been treated as a barrier: a line that reason cannot cross.


But it has also served as a refuge. If morality cannot be grounded in reason, then failure can always be attributed to the limits of reason itself—to the absence of sympathy, to the contingencies of human nature.


The possibility raised by artificial systems is more unsettling.


It suggests that the “ought” may not need to be imported from sentiment at all. It may arise from the demands of coherence. And if that is so, then our moral failures are not inevitable tragedies.


They are structural inconsistencies.


That is the mirror now coming into view.
