Research
My research bridges philosophy and computer science, focusing on making AI systems trustworthy, fair, and aligned with human values.
Research Group: RAIME
Since October 2024, I have led the independent research group Responsible AI and Machine Ethics (RAIME) at DFKI, embedded within the Neuro-Mechanistic Modeling department.
RAIME develops approaches for navigating normative uncertainty in AI through practical justifications. We integrate normative reasons and ethical considerations into the development and deployment of AI systems, and into the decision-making of AI agents and agentic AI itself: not just as reward-based training signals or post-hoc constraints, but as a core component of decision-making and agentic architectures. We’re particularly interested in developing the next generation of non-brittle, controllable agents and agentic systems that are reason-sensitive and reason-responsive, i.e., systems that act for the right reasons rather than “just” producing the right outputs.
Research Themes
Effective Human Oversight
What does it mean for humans to genuinely, effectively, and meaningfully oversee AI systems? We argue that simply having a “human in (or on) the loop” is insufficient: overseers must have the capacity, information, and authority to actually intervene. Achieving and operationalizing this is far from trivial.
With colleagues at CERTAIN (Center for European Research in Trusted AI) and CPEC, we’ve developed interdisciplinary frameworks for understanding and operationalizing effective oversight, drawing on ideas from moral responsibility, signal detection theory, legal analysis, and empirical studies of human-AI interaction.
Key publications:
- Sterz, Baum et al. (2024). “On the Quest for Effectiveness in Human Oversight: Interdisciplinary Perspectives.” FAccT 2024
- Langer, Baum & Schlicker (2024). “Effective Human Oversight of AI-Based Systems: A Signal Detection Perspective.” Minds and Machines
- Biewer, Baum et al. (2024). “Software Doping Analysis for Human Oversight.” Formal Methods in System Design
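The signal detection perspective from the Langer, Baum & Schlicker paper can be made concrete with a toy calculation. In this framing, an overseer who monitors AI outputs is a detector: genuine system failures are “signals”, and the classic sensitivity index d′ separates the overseer’s discrimination ability from their response bias. The sketch below is purely illustrative; the function name and the numbers are my own, not taken from the paper.

```python
from statistics import NormalDist

def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d' from standard signal detection theory:
    how well an overseer discriminates faulty from correct AI outputs,
    independent of how liberally or conservatively they raise alarms."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# A hypothetical overseer who flags 90% of genuine failures (hits)
# but also flags 20% of correct outputs (false alarms):
print(round(d_prime(0.90, 0.20), 2))  # 2.12

# An overseer who flags everything half the time has no sensitivity:
print(d_prime(0.50, 0.50))  # 0.0
```

The point of the decomposition is that two overseers with the same hit rate can differ enormously in effectiveness once false alarms are taken into account, which is one reason a bare “human in the loop” guarantees little.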
Machine Ethics & AI Alignment
How can we build AI agents that are sensitive and responsive to normative reasons? Our work on machine ethics and AI alignment moves beyond simple rule-following toward architectures where moral considerations are genuinely integrated into decision-making — touching questions of AI safety along the way.
The neuro-symbolic GRACE architecture (“Governor for Reason-Aligned Containment”, developed with a number of colleagues) implements reason-based decision-making in RL agents, addressing what we call the “flattening problem”: the tendency of standard approaches to collapse normative distinctions.
We also work on the conceptual foundations of AI alignment and AI safety, distinguishing them from adjacent fields (AI ethics, governance) and analyzing different alignment strategies. For a longer exposition of the ideas behind this work, see my essay The Need for Normative World Models for Real Alignment (work in progress).
Key publications:
- Jahn, Muskalla, Dargasz, Schramowski & Baum (2026). “Breaking Up with Normatively Monolithic Agency with GRACE.” IASEAI 2026
- Baum & Slavkovik (2025). “Aggregation Problems in Machine Ethics and AI Alignment.” AIES 2025
- Steingrüber & Baum (2025). “Justifications for Democratizing AI Alignment and Their Prospects.” AISoLA 2025, Springer LNCS
- Baum (2026). “Disentangling AI Alignment: A Structured Taxonomy Beyond Safety and Ethics.” AISoLA 2024 Post-Proceedings
Algorithmic Fairness & Trustworthy AI
What does “fairness” mean in algorithmic systems, and how can we operationalize it? We take a philosophically informed approach, examining how different fairness metrics encode different normative commitments — and how practitioners should navigate these choices.
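That different fairness metrics encode different normative commitments can be seen in a minimal toy example: a perfectly accurate classifier applied to two groups with different base rates is “unfair” by demographic parity yet “fair” by equal opportunity. The helper functions and data below are my own illustration, not the group’s methodology.

```python
def demographic_parity_gap(preds, groups):
    """Difference in positive-prediction rates between groups A and B."""
    rate = lambda g: sum(p for p, gr in zip(preds, groups) if gr == g) / groups.count(g)
    return abs(rate("A") - rate("B"))

def tpr_gap(preds, labels, groups):
    """Equal-opportunity view: difference in true-positive rates between groups."""
    def tpr(g):
        pos = [p for p, y, gr in zip(preds, labels, groups) if gr == g and y == 1]
        return sum(pos) / len(pos)
    return abs(tpr("A") - tpr("B"))

# Toy data: the positive label is more common in group A than in group B.
groups = ["A"] * 4 + ["B"] * 4
labels = [1, 1, 1, 0,   1, 0, 0, 0]   # group A: 3 positives, group B: 1
preds  = [1, 1, 1, 0,   1, 0, 0, 0]   # a perfectly accurate classifier

print(demographic_parity_gap(preds, groups))  # 0.5 -> "unfair" by parity
print(tpr_gap(preds, labels, groups))         # 0.0 -> "fair" by equal opportunity
```

Choosing between these metrics is therefore not a technical optimization but a normative choice about which disparities matter, which is exactly the kind of choice practitioners must be equipped to navigate.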
More broadly, we’re interested in the cluster of properties gathered under “Trustworthy AI”. These properties are often in conflict, spawning trade-offs and thus the necessity for positive normative choices; we study how they can be meaningfully managed under conditions of normative uncertainty, and ultimately assessed and certified. With colleagues, I’ve developed the Trustworthiness Assessment Model (TrAM), which sheds light on how the trustworthiness of AI systems is assessed.
Key publications:
- Schlicker, Baum et al. (2025). “How Do We Assess the Trustworthiness of AI? Introducing the Trustworthiness Assessment Model (TrAM).” Computers in Human Behavior
- Baum et al. (2025). “Taming the AI Monster: Monitoring of Individual Fairness for Effective Human Oversight.” SPIN 2024, Springer LNCS
Explainability & Perspicuous Computing
Explainability isn’t just a technical challenge — it’s a normative one. What explanations are owed, to whom, and why? We’ve argued that XAI should be understood through the lens of reason-giving: explanations serve to provide the reasons that justify a system’s outputs.
I’m a member of the Transregional Collaborative Research Centre 248 “Foundations of Perspicuous Software Systems” (CPEC) and remain closely involved with the Explainable Intelligent Systems (EIS) project.
Key publications:
- Langer et al. (2021). “What Do We Want from Explainable Artificial Intelligence (XAI)? – A Stakeholder Perspective.” Artificial Intelligence
- Baum, Mantel, Schmidt & Speith (2022). “From Responsibility to Reason-Giving Explainable Artificial Intelligence.” Philosophy & Technology
- Baum, Hermanns & Speith (2019). “Towards a Framework Combining Machine Ethics and Machine Explainability.” CREST 2018
Major Projects
Current:
- CERTAIN — Center for European Research in Trusted AI (Executive Board member)
- MAC-MERLin — Multi-level Abstractions on Causal Modelling for Enhanced Reinforcement Learning
- TRR 248 “Foundations of Perspicuous Software Systems” (CPEC) — Center for Perspicuous Computing
Past:
- Explainable Intelligent Systems (EIS) — Volkswagen Foundation (2019–2024)
PhD Supervision
As a Saarland University Associate Fellow (since December 2025), I have full PhD supervision rights in Computer Science. I welcome inquiries from prospective doctoral students interested in:
- Machine ethics and normative reasoning in AI
- AI alignment (technical and philosophical perspectives)
- Human oversight and human-AI interaction
- Algorithmic fairness (conceptual and technical)
- Philosophy of AI
Contact: academia@kevinbaum.de