Research

My research bridges philosophy and computer science, focusing on making AI systems trustworthy, fair, and aligned with human values.

Research Group: RAIME

Since October 2024, I have led the independent research group Responsible AI and Machine Ethics (RAIME) at DFKI, embedded within the Neuro-Mechanistic Modeling department.

RAIME develops approaches for navigating normative uncertainty in the context of AI through practical justifications. We integrate normative reasons and ethical considerations into the development and deployment of AI systems, and into the decision-making of AI agents and agentic AI itself: not merely as a reward-based training signal or as post-hoc constraints, but as a core component of decision-making and agentic architectures. We’re particularly interested in developing the next generation of non-brittle, controllable agents and agentic systems that are reason-sensitive and reason-responsive, i.e., systems that act for the right reasons rather than “just” producing the right outputs.


Research Themes

Effective Human Oversight

What does it mean for humans to genuinely, effectively, and meaningfully oversee AI systems? We argue that simply having a “human in (or on) the loop” is insufficient: overseers must have the capacity, the information, and the authority to actually intervene. Achieving and operationalizing this is far from trivial.

With colleagues at CERTAIN (Center for European Research in Trusted AI) and CPEC, we’ve developed interdisciplinary frameworks for understanding and operationalizing effective oversight, drawing on moral responsibility theory, signal detection theory, legal analysis, and empirical studies of human-AI interaction.

Key publications:


Machine Ethics & AI Alignment

How can we build AI agents that are sensitive and responsive to normative reasons? Our work on machine ethics and AI alignment moves beyond simple rule-following toward architectures in which moral considerations are genuinely integrated into decision-making, touching on questions of AI safety along the way.

The neuro-symbolic GRACE architecture (“Governor for Reason-Aligned Containment”, developed with many colleagues) implements reason-based decision-making in RL agents, addressing what we call the “flattening problem”: the tendency of standard approaches to collapse normative distinctions.
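
To give a flavor of the governor idea, here is a minimal, hypothetical sketch; the names, interfaces, and toy numbers are illustrative assumptions of mine, not the published GRACE implementation. The point is structural: reasons are kept as inspectable objects that a governor can act on, rather than being flattened into a single reward scalar.

```python
# Illustrative sketch only; not the actual GRACE architecture.
from dataclasses import dataclass

@dataclass(frozen=True)
class Reason:
    description: str
    polarity: int    # +1: speaks for the action; -1: speaks against it
    decisive: bool   # a decisive reason against an action vetoes it outright

def normative_reasons(state: str, action: str) -> list[Reason]:
    """Hypothetical reasoner stub; a real system would derive reasons
    from symbolic normative knowledge, not from a lookup table."""
    table = {
        ("door_blocked", "push_person"): [
            Reason("reaches the goal faster", +1, decisive=False),
            Reason("harms a bystander", -1, decisive=True),
        ],
        ("door_blocked", "wait"): [
            Reason("delays the goal", -1, decisive=False),
        ],
    }
    return table.get((state, action), [])

def govern(state: str, candidates: list[str]) -> list[str]:
    """Veto every action that some decisive reason speaks against;
    the learned policy then chooses only among the rest."""
    return [
        action for action in candidates
        if not any(r.decisive and r.polarity < 0
                   for r in normative_reasons(state, action))
    ]

print(govern("door_blocked", ["push_person", "wait"]))  # -> ['wait']
```

Because the governor sees structured reasons rather than a summed score, a weighty-but-not-decisive reason against an action (like the delay above) is not confused with a prohibiting one.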

We also work on the conceptual foundations of AI alignment and AI safety, distinguishing them from adjacent fields (AI ethics, governance) and analyzing different alignment strategies. For a longer exposition of the ideas behind this work, see my essay The Need for Normative World Models for Real Alignment (work in progress).

Key publications:


Algorithmic Fairness & Trustworthy AI

What does “fairness” mean in algorithmic systems, and how can we operationalize it? We take a philosophically informed approach, examining how different fairness metrics encode different normative commitments — and how practitioners should navigate these choices.
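
As a toy illustration of how such metrics can come apart, the self-contained sketch below (my own example, not code from our publications) computes a demographic-parity gap and an equal-opportunity gap on the same hypothetical predictions: equal selection rates satisfy parity while the groups’ true-positive rates still differ.

```python
# Self-contained toy example; data and functions are hypothetical.

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction (selection) rates."""
    def rate(g):
        preds = [p for p, grp in zip(y_pred, group) if grp == g]
        return sum(preds) / len(preds)
    return abs(rate("A") - rate("B"))

def equal_opportunity_gap(y_true, y_pred, group):
    """Absolute difference in true-positive rates (per-group recall)."""
    def tpr(g):
        hits = [p for t, p, grp in zip(y_true, y_pred, group)
                if grp == g and t == 1]
        return sum(hits) / len(hits)
    return abs(tpr("A") - tpr("B"))

# Toy data: group A has a higher base rate of positives than group B.
group  = ["A"] * 4 + ["B"] * 4
y_true = [1, 1, 1, 0,  1, 0, 0, 0]
y_pred = [1, 1, 0, 0,  1, 1, 0, 0]  # both groups: 2 of 4 selected

print(demographic_parity_gap(y_pred, group))         # 0.0    (parity holds)
print(equal_opportunity_gap(y_true, y_pred, group))  # ~0.33  (opportunity violated)
```

Whether equal selection rates or equal error rates matter morally is exactly the kind of normative commitment a metric choice silently makes.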

More broadly, we’re interested in the cluster of properties gathered under the label “Trustworthy AI”. These properties are often in conflict, giving rise to trade-offs and thus to the need for positive normative choices; we ask how they can be meaningfully managed under conditions of normative uncertainty, and ultimately how they can be assessed and certified. With colleagues, I’ve developed the Trustworthiness Assessment Model (TrAM), which structures how the trustworthiness of AI systems can be assessed.

Key publications:


Explainability & Perspicuous Computing

Explainability isn’t just a technical challenge — it’s a normative one. What explanations are owed, to whom, and why? We’ve argued that XAI should be understood through the lens of reason-giving: explanations serve to provide the reasons that justify a system’s outputs.

I’m a member of the Transregional Collaborative Research Centre 248 “Foundations of Perspicuous Software Systems” (CPEC) and remain closely involved with the Explainable Intelligent Systems (EIS) project.

Key publications:


Major Projects

Current:

Past:


PhD Supervision

As a Saarland University Associate Fellow (since December 2025), I have full PhD supervision rights in Computer Science. I welcome inquiries from prospective doctoral students interested in:

  • Machine ethics and normative reasoning in AI
  • AI alignment (technical and philosophical perspectives)
  • Human oversight and human-AI interaction
  • Algorithmic fairness (conceptual and technical)
  • Philosophy of AI

Contact: academia@kevinbaum.de