research
My research bridges philosophy and computer science, focusing on making AI systems trustworthy, fair, and aligned with human values.
Research Group: RAIME
Since October 2024, I have led the independent research group Responsible AI and Machine Ethics (RAIME) at DFKI, embedded within the Neuro-Mechanistic Modeling department.
RAIME develops approaches that integrate normative reasoning and ethical considerations into AI systems — not as post-hoc constraints, but as a core component of decision-making and agentic architectures. We’re particularly interested in reinforcement learning agents and LLM-based agents that can act for the right reasons, not just produce the right outputs.
Research Themes
Effective Human Oversight
What does it mean for humans to genuinely oversee AI systems? I argue that simply having a “human in (or on) the loop” is insufficient — oversight must be effective, meaning humans must have the capacity, information, and authority to actually intervene.
With colleagues at CERTAIN and CPEC, I’ve developed interdisciplinary frameworks for understanding and operationalizing effective oversight, drawing on ideas from moral responsibility, signal detection theory, legal analysis, and empirical studies of human-AI interaction.
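To make the signal detection framing concrete, here is a minimal Python sketch (my own illustration, not code from the paper; the function name and example numbers are hypothetical). It treats oversight as a detection task: the overseer must discriminate faulty from correct AI outputs, and the standard indices d′ (sensitivity) and c (response bias) summarize how well they discriminate and how readily they intervene.

```python
# Illustrative sketch: modelling a human overseer's ability to detect AI
# failures with signal detection theory (not code from the cited paper).
from scipy.stats import norm

def sensitivity_and_bias(hits, misses, false_alarms, correct_rejections):
    """Return d' (ability to discriminate faulty from correct AI outputs)
    and criterion c (the overseer's bias toward intervening)."""
    # Log-linear correction to avoid infinite z-scores at rates of 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = norm.ppf(hit_rate) - norm.ppf(fa_rate)
    criterion = -0.5 * (norm.ppf(hit_rate) + norm.ppf(fa_rate))
    return d_prime, criterion

# Example: an overseer reviews 200 AI decisions, 40 of which are faulty.
d, c = sensitivity_and_bias(hits=28, misses=12, false_alarms=16, correct_rejections=144)
print(f"d' = {d:.2f}, criterion = {c:.2f}")
```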
Key publications:
- Sterz, Baum et al. (2024). “On the Quest for Effectiveness in Human Oversight: Interdisciplinary Perspectives.” FAccT 2024
- Langer, Baum & Schlicker (2024). “Effective Human Oversight of AI-Based Systems: A Signal Detection Perspective.” Minds and Machines
- Biewer, Baum et al. (2024). “Software Doping Analysis for Human Oversight.” Formal Methods in System Design
Machine Ethics & AI Alignment
How can we build AI agents that are sensitive and responsive to normative reasons? My work on machine ethics moves beyond simple rule-following toward architectures where moral considerations are genuinely integrated into decision-making.
The neuro-symbolic GRACE architecture (“Governor for Reason-Aligned Containment”, developed with colleagues) implements reason-based decision-making in RL agents, addressing what we call the “flattening problem” — the tendency of standard approaches to collapse normative distinctions.
I also work on the conceptual foundations of AI alignment, distinguishing it from adjacent fields (AI ethics, AI safety) and analyzing different alignment strategies.
Key publications:
- Jahn, Muskalla, Dargasz, Schramowski & Baum (2026). “Breaking Up with Normatively Monolithic Agency with GRACE.” IASEAI 2026
- Baum & Slavkovik (2025). “Aggregation Problems in Machine Ethics and AI Alignment.” AIES 2025
- Baum (2026). “Disentangling AI Alignment: A Structured Taxonomy Beyond Safety and Ethics.” AISoLA 2024 Post-Proceedings
- Baum, Mantel, Schmidt & Speith (2022). “From Responsibility to Reason-Giving Explainable Artificial Intelligence.” Philosophy & Technology
Algorithmic Fairness & Trustworthy AI
What does “fairness” mean in algorithmic systems, and how can we operationalize it? I take a philosophically informed approach, examining how different fairness metrics encode different normative commitments — and how practitioners should navigate these choices.
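As a concrete illustration of how different metrics encode different normative commitments, here is a minimal, self-contained sketch (my own example, with hypothetical function names and toy data, not taken from any of the publications below): demographic parity compares positive-prediction rates across groups, equal opportunity compares true-positive rates, and a single classifier can satisfy one while violating the other.

```python
# Minimal illustration (hypothetical function names, toy data) of how two
# common fairness metrics can pull apart on the same classifier.
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Gap in positive-prediction rates across groups: 'equal outcomes'."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equal_opportunity_gap(y_true, y_pred, group):
    """Gap in true-positive rates across groups: 'equal chances' for those
    who actually qualify."""
    tpr = lambda g: y_pred[(group == g) & (y_true == 1)].mean()
    return abs(tpr(0) - tpr(1))

# Toy data: group 1 has a higher base rate of qualified individuals.
y_true = np.array([1, 1, 1, 0, 0, 0,  1, 1, 1, 1, 1, 1])
y_pred = np.array([1, 1, 0, 0, 0, 0,  1, 1, 0, 1, 1, 0])
group  = np.array([0, 0, 0, 0, 0, 0,  1, 1, 1, 1, 1, 1])

print(demographic_parity_gap(y_pred, group))         # 0.33: unequal rates
print(equal_opportunity_gap(y_true, y_pred, group))  # 0.00: equal TPRs
```

In this toy case the classifier treats qualified individuals alike across groups yet assigns positive predictions at different rates, so deciding which gap to minimize is a normative choice about what fairness requires, not a purely technical one.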
More broadly, I’m interested in the cluster of properties gathered under “Trustworthy AI”. These properties often conflict, creating trade-offs and thus the need for positive normative choices; I study how they can be meaningfully managed under conditions of normative uncertainty, and ultimately how they can be assessed and certified. With colleagues, I’ve developed the Trustworthiness Assessment Model (TrAM), which sheds light on how the trustworthiness of AI systems is assessed.
Key publications:
- Schlicker, Baum et al. (2025). “How Do We Assess the Trustworthiness of AI? Introducing the Trustworthiness Assessment Model (TrAM).” Computers in Human Behavior
- Baum et al. (2025). “Taming the AI Monster: Monitoring of Individual Fairness for Effective Human Oversight.” SPIN 2024
Explainability & Perspicuous Computing
Explainability isn’t just a technical challenge — it’s a normative one. What explanations are owed, to whom, and why? I’ve argued that XAI should be understood through the lens of reason-giving: explanations serve to provide the reasons that justify a system’s outputs.
I’m a member of the Transregional Collaborative Research Centre 248 “Foundations of Perspicuous Software Systems” (CPEC) and remain closely involved with the Explainable Intelligent Systems (EIS) project.
Key publications:
- Langer et al. (2021). “What Do We Want from Explainable Artificial Intelligence (XAI)? – A Stakeholder Perspective.” Artificial Intelligence
- Baum, Hermanns & Speith (2018). “Towards a Framework Combining Machine Ethics and Machine Explainability.” CREST 2018
Major Projects
Current:
- CERTAIN — Center for European Research in Trusted AI (Executive Board member)
- MAC-MERLin — Multi-level Abstractions on Causal Modelling for Enhanced Reinforcement Learning
- toCERTAIN — Transfer project for CERTAIN
- TRR 248 “Foundations of Perspicuous Software Systems” (CPEC) — Center for Perspicuous Computing
Past:
- Explainable Intelligent Systems (EIS) — Volkswagen Foundation (2019–2024)
PhD Supervision
As a Saarland University Associate Fellow (since December 2025), I have full PhD supervision rights in Computer Science. I welcome inquiries from prospective doctoral students interested in:
- Machine ethics and normative reasoning in AI
- Human oversight and human-AI interaction
- Algorithmic fairness (conceptual and technical)
- Philosophy of AI and alignment
Contact: academia@kevinbaum.de