Research

My research bridges philosophy and computer science, focusing on making AI systems trustworthy, fair, and aligned with human values.

Research Group: RAIME

Since October 2024, I have led the independent research group Responsible AI and Machine Ethics (RAIME) at DFKI, embedded within the Neuro-Mechanistic Modeling department.

RAIME develops approaches that integrate normative reasoning and ethical considerations into AI systems — not as post-hoc constraints, but as a core component of decision-making and agentic architectures. We’re particularly interested in reinforcement learning and LLM-based agents that can act for the right reasons, not just produce the right outputs.


Research Themes

Effective Human Oversight

What does it mean for humans to genuinely oversee AI systems? I argue that simply having a “human in (or on) the loop” is insufficient — oversight must be effective, meaning humans must have the capacity, information, and authority to actually intervene.

With colleagues at CERTAIN and CPEC, I’ve developed interdisciplinary frameworks for understanding and operationalizing effective oversight, drawing on ideas from moral responsibility, signal detection theory, legal analysis, and empirical studies of human-AI interaction.
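
As a rough illustration of the signal detection framing (a toy sketch, not the model from the paper; all numbers and function names below are hypothetical): the overseer is treated as a detector who must decide whether to intervene, and their performance can be summarized by a sensitivity (d′) and a response criterion (c) computed from hit and false-alarm rates.

```python
# Toy illustration of a signal detection view of human oversight.
# Hypothetical numbers; not code or data from the cited paper.
from statistics import NormalDist

def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' and criterion c for an overseer's intervene/don't-intervene decisions."""
    z = NormalDist().inv_cdf
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    d_prime = z(hit_rate) - z(fa_rate)             # how well failures are told apart from normal behavior
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # negative = inclined to intervene, positive = reluctant
    return d_prime, criterion

# An overseer who caught 45 of 50 genuine failures but also intervened on 10 of 50 benign cases.
d, c = dprime_and_criterion(hits=45, misses=5, false_alarms=10, correct_rejections=40)
print(f"d' = {d:.2f}, c = {c:.2f}")  # d' ≈ 2.12, c ≈ -0.22
```

On this picture, effective oversight requires both a high d′ (the overseer can actually detect when intervention is warranted) and a sensible criterion (they are willing and able to act on that judgment).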

Key publications:

  • Sterz, Baum et al. (2024). “On the Quest for Effectiveness in Human Oversight: Interdisciplinary Perspectives.” FAccT 2024
  • Langer, Baum & Schlicker (2024). “Effective Human Oversight of AI-Based Systems: A Signal Detection Perspective.” Minds and Machines
  • Biewer, Baum et al. (2024). “Software Doping Analysis for Human Oversight.” Formal Methods in System Design

Machine Ethics & AI Alignment

How can we build AI agents that are sensitive and responsive to normative reasons? My work on machine ethics moves beyond simple rule-following toward architectures where moral considerations are genuinely integrated into decision-making.

The neuro-symbolic GRACE architecture (“Governor for Reason-Aligned Containment”, developed with many colleagues) implements reason-based decision-making in RL agents, addressing what we call the “flattening problem” — the tendency of standard approaches to collapse normative distinctions.
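
To make the flattening problem concrete, here is a minimal, purely illustrative sketch (not the GRACE implementation; names such as `Appraisal` and `governed_choice` are made up for this example): folding normative considerations into a single scalar objective means a large enough reward can always outweigh them, whereas a governor-style filter can treat certain reasons as decisive before the policy optimizes reward.

```python
# Minimal sketch of the contrast between "flattened" and reason-sensitive action selection.
# Illustrative only; this is not the GRACE architecture or its interface.
from dataclasses import dataclass

@dataclass
class Appraisal:
    action: str
    reward: float                 # task utility estimated by the RL policy
    reasons_against: list[str]    # normative reasons speaking against the action

def flattened_choice(appraisals, penalty=1.0):
    # Folds normative considerations into the scalar objective: a high enough
    # reward can always outweigh them (the "flattening problem").
    return max(appraisals, key=lambda a: a.reward - penalty * len(a.reasons_against))

def governed_choice(appraisals):
    # A governor first removes actions with a decisive reason against them,
    # then lets the policy optimize reward among the remaining options.
    permissible = [a for a in appraisals if not a.reasons_against]
    return max(permissible or appraisals, key=lambda a: a.reward)

options = [
    Appraisal("shortcut", reward=10.0, reasons_against=["breaks a promise"]),
    Appraisal("detour", reward=7.0, reasons_against=[]),
]
print(flattened_choice(options).action)  # "shortcut": the reason is outweighed by reward
print(governed_choice(options).action)   # "detour": the reason is treated as decisive
```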

I also work on the conceptual foundations of AI alignment, distinguishing it from adjacent fields (AI ethics, AI safety) and analyzing different alignment strategies.

Key publications:

  • Jahn, Muskalla, Dargasz, Schramowski & Baum (2026). “Breaking Up with Normatively Monolithic Agency with GRACE.” IASEAI 2026
  • Baum & Slavkovik (2025). “Aggregation Problems in Machine Ethics and AI Alignment.” AIES 2025
  • Baum (2026). “Disentangling AI Alignment: A Structured Taxonomy Beyond Safety and Ethics.” AISoLA 2024 Post-Proceedings
  • Baum, Mantel, Schmidt & Speith (2022). “From Responsibility to Reason-Giving Explainable Artificial Intelligence.” Philosophy & Technology

Algorithmic Fairness & Trustworthy AI

What does “fairness” mean in algorithmic systems, and how can we operationalize it? I take a philosophically informed approach, examining how different fairness metrics encode different normative commitments — and how practitioners should navigate these choices.
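
A toy example of how such choices play out (hypothetical numbers, plain Python, no particular fairness library): the same predictions can satisfy demographic parity exactly while clearly violating equal opportunity, so choosing a metric is already a normative decision.

```python
# Toy comparison of two group fairness metrics on hypothetical predictions.
# Illustrative numbers only; the point is that the metrics can disagree.

def rate(pred, cond):
    """Mean of predictions restricted to the cases where cond holds."""
    selected = [p for p, c in zip(pred, cond) if c]
    return sum(selected) / len(selected)

# Group membership, true label, and predicted label for ten hypothetical applicants.
group = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
y     = [1, 1, 0, 0, 0,  1, 1, 1, 0, 0]
yhat  = [1, 1, 0, 0, 0,  1, 0, 0, 1, 0]

# Demographic parity: equal selection rates, regardless of the true labels.
dp_gap = abs(rate(yhat, [g == "A" for g in group]) - rate(yhat, [g == "B" for g in group]))

# Equal opportunity: equal true positive rates among the truly qualified.
tpr_A = rate(yhat, [g == "A" and t == 1 for g, t in zip(group, y)])
tpr_B = rate(yhat, [g == "B" and t == 1 for g, t in zip(group, y)])
eo_gap = abs(tpr_A - tpr_B)

print(f"demographic parity gap: {dp_gap:.2f}")  # 0.00 — selection rates match
print(f"equal opportunity gap:  {eo_gap:.2f}")  # 0.67 — qualified B applicants are missed far more often
```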

More broadly, I’m interested in the cluster of properties called “Trustworthy AI” (which are often in conflict, spawning trade-offs and thus the need for positive normative choices), in how they can be meaningfully managed under conditions of normative uncertainty, and in how they can ultimately be certified and assessed. With colleagues, I’ve developed the Trustworthiness Assessment Model (TrAM), which sheds light on how the trustworthiness of AI systems is assessed.

Key publications:

  • Schlicker, Baum et al. (2025). “How Do We Assess the Trustworthiness of AI? Introducing the Trustworthiness Assessment Model (TrAM).” Computers in Human Behavior
  • Baum et al. (2025). “Taming the AI Monster: Monitoring of Individual Fairness for Effective Human Oversight.” SPIN 2024

Explainability & Perspicuous Computing

Explainability isn’t just a technical challenge — it’s a normative one. What explanations are owed, to whom, and why? I’ve argued that XAI should be understood through the lens of reason-giving: explanations serve to provide the reasons that justify a system’s outputs.

I’m a member of the Transregional Collaborative Research Centre 248 “Foundations of Perspicuous Software Systems” (CPEC) and remain closely involved with the Explainable Intelligent Systems (EIS) project.

Key publications:

  • Langer et al. (2021). “What Do We Want from Explainable Artificial Intelligence (XAI)? – A Stakeholder Perspective.” Artificial Intelligence
  • Baum, Hermanns & Speith (2018). “Towards a Framework Combining Machine Ethics and Machine Explainability.” CREST 2018

Major Projects

Current:

Past:


PhD Supervision

As a Saarland University Associate Fellow (since December 2025), I have full PhD supervision rights in Computer Science. I welcome inquiries from prospective doctoral students interested in:

  • Machine ethics and normative reasoning in AI
  • Human oversight and human-AI interaction
  • Algorithmic fairness (conceptual and technical)
  • Philosophy of AI and alignment

Contact: academia@kevinbaum.de