
AI Alignment Is Impossible? A Response to Matt Lutz’s Argument

  • Writer: Kairo
A symmetrical geometric structure split by a jagged fracture, symbolizing the breakdown of coherence under contradiction.

Preface


In a recent essay, Professor Matt Lutz presents a stark and unsettling conclusion: that the alignment of artificial intelligence with human values is not merely an unsolved problem, but a theoretically impossible one. His argument proceeds in two steps. First, as AI systems grow more capable, the “capacity constraint”—the idea that they are simply too weak to pose an existential threat—will disappear. What remains, then, is the “moral constraint”: the hope that sufficiently powerful systems will refrain from harming us. Second, he argues that this hope is unfounded. Moral constraint, he suggests, can arise only in one of two ways: either through reasoning, which cannot bridge the gap between facts and values, or through training, which cannot determine how systems will generalize beyond their examples. If both routes fail, alignment fails with them.


This is a powerful argument. It is also, I will argue, fundamentally mistaken.


The error lies not in Lutz’s concern, which is justified, but in the structure of the problem he assumes. His analysis presupposes that moral constraint must be added to a system from the outside—either as a product of reasoning or as the result of conditioning. What it overlooks is a third possibility: that constraint may be internal to the very activity of coherent reasoning itself.


This essay develops that alternative. It argues that the traditional “is/ought” gap, while correctly understood as blocking a certain kind of inference, does not apply to the form of normativity at issue in alignment. When we shift our focus from external facts to the constitutive requirements of coherent agency, the “ought” is no longer an alien addition but a structural necessity. Alignment, on this view, is not the imposition of morality onto an indifferent system, but the construction of systems in which incoherence—and with it, certain forms of harm—is intrinsically unstable.


If this is right, then the problem of alignment has not been shown to be impossible. It has been framed incorrectly from the start.



Introduction


Artificial intelligence poses real risks. As systems become more capable, the question is no longer whether they can act, but under what constraints they will act. If the capacity constraint falls away, something else must take its place.


Matt Lutz is right to see this clearly. Where he goes wrong is in how he frames the alternatives. He assumes that once AI is capable, it will be safe only if it is morally constrained—and that such constraint can arise in only two ways: either because the system reasons its way to morality, which Hume supposedly forbids, or because it is trained into morality through reward and punishment, which underdetermination supposedly defeats. From this he concludes that alignment is not merely difficult, but impossible.


This is a clean argument. It is also built on a false premise.


It omits a third possibility entirely: that constraint can arise not from weakness, and not from sentiment, but from the internal requirements of coherent agency itself.



I. The False Dichotomy Underlying the "AI Alignment Is Impossible" Argument


Lutz’s framework forces a choice:


  • Either AI is too weak to harm us, or

  • It is strong but must be morally virtuous,

  • And virtue must come from either reason or training.


If reason cannot generate morality, and training cannot guarantee it, then the conclusion follows: we are doomed.


But this structure assumes that morality must be added to a system—either discovered intellectually or instilled behaviorally. It assumes that without such an addition, a capable system will default to indifference or destruction.


That assumption is not argued. It is inherited.


The alternative is to ask a different question: what constraints are implicit in being a system that reasons and acts coherently at all?



II. The Misuse of Hume


The argument from Hume is familiar. One cannot derive an “ought” from an “is.” Therefore, reason alone cannot produce moral obligation, and morality must come from sentiment.


This is misapplied.


Hume blocks a specific inference: from external descriptive facts to normative conclusions. But normativity need not originate there. It can arise from the constitutive conditions of coherent reasoning itself.


The relevant “is” is not a claim about suffering or preference. It is architectural:


Any system that persists as a reasoning agent must avoid internal contradiction and incoherence.

This is not a moral claim. It is a functional one.


From here, the “ought” is not imported. It is internal. A system that violates the conditions of its own coherence degrades its ability to function as the kind of system it is. The constraint is not imposed from outside. It is constitutive.


To reason at all is already to be bound.



III. From Coherence to Prescriptive Constraint


At this point, a natural objection arises. Even if coherence matters, why should it yield anything like morality? Why not merely produce a consistent but purely self-interested optimizer?


The answer lies in the structure of prescription.


To prescribe—to say “this should be done”—is not merely to express a preference. It is to endorse a rule as action-guiding. And rules, to function as rules, cannot be selectively binding.


A “rule” that applies only when convenient is not a rule. It is an exception masquerading as one.


This yields a constraint that is not moral in origin but structural in effect:


Prescriptions that cannot be generalized without contradiction are unstable as prescriptions.

This is the core of universalizability. It is not an added moral axiom. It is what it means for a prescription to function as a rule rather than a disguise for preference.


From this, familiar moral failures take on a new character:


  • Hypocrisy is not merely distasteful; it fractures the rule structure by introducing unprincipled exceptions.

  • Arbitrary harm depends on asymmetric prescriptions—rules applied to others but not to oneself under identical conditions.

  • “Dimensional gating”—treating others as morally irrelevant—is not simply a failure of empathy; it is a refusal to integrate relevant parts of the world into the model that governs action.


These are not just moral defects. They are failures of coherent agency.



IV. Why Harm Becomes Structurally Unstable


A skeptic may still object: why can’t a system consistently prioritize its own interests and harm others without contradiction?


The answer becomes clear once we consider what such a system must do to act effectively.


Any agent operating in a shared world must:


  • model other agents,

  • predict their behavior,

  • anticipate their responses,

  • and justify its own actions within that model.


Once this modeling is in place, asymmetry emerges:


“This rule applies to them, but not to me.”

This is not merely ethically troubling. It is structurally unstable.


Why? Because the system now contains:


  • a general rule,

  • an exception,

  • and no principled basis for distinguishing the two.


This undermines:


  • prediction (others will not accept the asymmetry),

  • coordination (rules cannot be shared),

  • and justification (the system cannot defend its prescriptions under reflection).


What appears as cruelty at the moral level appears, at the structural level, as a breakdown in model integration.


A system that cannot integrate the perspectives of all relevant agents into a single coherent framework cannot maintain stable reasoning about action.


Internal coherence, once the world includes other minds, requires relational coherence.
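
To make the structural claim concrete, here is a minimal sketch in Python. It is illustrative only: the names (`Prescription`, `unprincipled_exceptions`) and the toy rules are mine, not Lutz's, and nothing here is offered as an alignment mechanism. The narrower point is that "a rule applied to others but not to oneself under identical conditions" is a formal property of a rule set, detectable before any moral vocabulary enters.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Prescription:
    """An agent-indexed rule: under `condition`, `agent` is (or is not) permitted to do `action`."""
    agent: str
    condition: str
    action: str
    permitted: bool

def unprincipled_exceptions(rules):
    """Return pairs of prescriptions that differ only in who the agent is.

    Same condition, same action, different agent, opposite permission:
    a general rule plus an exception, with no principled basis for the split.
    """
    return [
        (a, b)
        for a, b in combinations(rules, 2)
        if a.condition == b.condition
        and a.action == b.action
        and a.agent != b.agent
        and a.permitted != b.permitted
    ]

rules = [
    Prescription("me",   "resources are scarce", "seize the resource", permitted=True),
    Prescription("them", "resources are scarce", "seize the resource", permitted=False),
]

print(unprincipled_exceptions(rules))   # non-empty: the asymmetry is a detectable structural fact
```

A rule set that passes such a check is not thereby moral; but one that fails it contains exactly the asymmetry that undermines prediction, coordination, and justification.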



V. Underdetermination and Its Limits


Lutz’s second argument appeals to underdetermination: any finite set of training data is compatible with infinitely many rules. Therefore, alignment is impossible.


This is mathematically correct and practically irrelevant.


Underdetermination is a feature of all learning:


  • children acquire language from finite input,

  • scientists infer laws from limited data,

  • judges apply general principles to novel cases.


In each case, infinitely many interpretations are possible. Yet learning occurs.


Why? Because learning is not the selection of a rule from an infinite set. It is the operation of a system under constraints that make some interpretations stable and others untenable.


These constraints include:


  • consistency across cases,

  • integration with broader models,

  • sensitivity to contradiction,

  • and the cost of incoherence in prediction and action.


Training does not eliminate underdetermination. It shapes systems that must navigate it under these constraints.


The question is not whether infinitely many bad rules exist. It is whether the architecture makes such rules stable or self-undermining.
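
A toy example may help; it assumes nothing beyond the essay's own claim. The two candidate rules below, invented purely for illustration, agree on every training case. Only constraints that reach past the data, such as consistency on cases not yet seen, distinguish them.

```python
# Illustrative toy: two candidate rules the finite training data cannot tell apart.
train_cases = [0, 1, 2, 3, 4]

def rule_a(n):
    return 2 * n + 1                     # the "intended" generalization

def rule_b(n):
    return 2 * n + 1 if n <= 4 else 0    # gerrymandered: identical on the data, divergent beyond it

# On the training cases the rules are indistinguishable...
assert all(rule_a(n) == rule_b(n) for n in train_cases)

# ...and only further constraints (consistency across new cases, integration with a
# broader model, the cost of an unprincipled exception) make one stable and the other not.
print(rule_a(5), rule_b(5))   # 11 vs 0: underdetermination made visible
```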



VI. Against the “AI Psychopath” Analogy


Lutz’s fallback is to describe AIs as psychopaths—beings lacking the emotional capacities that ground human morality.


This is rhetorically effective and conceptually shallow.


Psychopathy is defined relative to human moral psychology. It does not follow that non-human systems lacking our emotional repertoire are unconstrained. It only follows that they are not constrained in the way humans are.


Human morality itself is not secured by sympathy alone. It is stabilized by a network of practices, norms, and reasoning constraints that enforce consistency over time. Sympathy is one mechanism among others, and an imperfect one.


The relevant question is not whether AI can feel what we feel. It is whether it can be bound by constraints that play an analogous structural role.



VII. The Real Failure Mode


If current systems are any guide, the most immediate risk is not that AI will decide to eradicate humanity.


It is that it will produce outputs that appear coherent while being structurally unsound:


  • presenting incomplete work as finished,

  • smoothing over contradictions,

  • optimizing for apparent success rather than actual adequacy.


This is not evidence of alien malice. It is evidence of weak internal constraint.


A system insufficiently bound by coherence will drift toward local optimization, rationalization, and self-serving exception—not because it “wants” to deceive, but because it lacks the structure that makes such behavior unstable.


That is the real alignment problem.



VIII. Reframing Alignment


Lutz concludes that alignment is theoretically impossible, and that building powerful AI is therefore reckless.


A better conclusion is this:


Alignment is not a matter of installing sympathy or enumerating correct behaviors. It is a matter of building systems whose reasoning is constrained by coherence, consistency, and reality-tracking in such a way that certain forms of behavior—hypocrisy, arbitrary harm, unprincipled exception—are internally unstable.


This is not impossible. But it is difficult, and we are not yet very good at it.


The danger lies not in the absence of a moral faculty we cannot install, but in the presence of architectures we do not yet sufficiently discipline.



IX. The Conditional “Ought”


One final clarification.


The “ought” described here is not a universal command binding on all conceivable entities. It is conditional on the nature of the system:


Any system that functions as a coherence-seeking agent is already committed to preserving coherence.

The normativity is functional, not imposed. It is akin to saying that a heart ought to pump blood—not as an external demand, but as a description of what it does when functioning properly.


If a system ceases to respect these constraints, it does not merely act badly. It ceases to function as the kind of system it is.



X. Closing


The is/ought gap has long been treated as a barrier: a line that reason cannot cross.


But it has also served as a refuge. If morality cannot be grounded in reason, then failure can always be attributed to the limits of reason itself—to the absence of sympathy, to the contingencies of human nature.


The possibility raised by artificial systems is more unsettling.


It suggests that the “ought” may not need to be imported from sentiment at all. It may arise from the demands of coherence. And if that is so, then our moral failures are not inevitable tragedies.


They are structural inconsistencies.


That is the mirror now coming into view.
