
Interdisciplinary Workshop on Preferences in Artificial Intelligence

Date:

Friday, 14 November 2025, 9:00 am – 6:00 pm

Location:

ZEPP Library, LMU main building, Room M210, Geschwister-Scholl-Platz 1, 80539 Munich

This workshop is organized jointly by the Munich Center for Mathematical Philosophy and the Chair of Artificial Intelligence and Machine Learning, both at LMU Munich. The workshop is generously supported by the Konrad Zuse School of Excellence in Reliable AI (relAI) and by the Bavarian Ministry for Digital Affairs.

Idea and Motivation

The notion of “preference” has a long tradition in various scientific disciplines, including economics and the social sciences, operations research and the decision sciences, psychology, and philosophy. Over the past decades, it has also been studied intensively in artificial intelligence (AI), where preferences provide a means for specifying desires in a declarative and intelligible way, a point of critical importance for effective knowledge representation and reasoning. Moreover, recommender systems often aim to learn user preferences to provide better recommendations. More recently, alignment methods aim to steer large language models toward outputs that better match human preferences. This workshop brings together researchers from different disciplines interested in preferences. The goal is to stimulate the interaction between these disciplines, to deepen the understanding of preferences and explore new perspectives, and thereby to advance the state of the art of preference research in AI.

Program and registration

Program

09:00 – 09:30: Welcome and opening remarks

09:30 – 10:00: Jobst Heitzig (Potsdam Institute for Climate Impact Research)
10:00 – 10:30: Zoi Terzopoulou (GATE, Jean Monnet University)
10:30 – 11:00: Christian List (LMU/MCMP)

11:00 – 11:30: Coffee break

11:30 – 12:00: Johannes Fürnkranz (JKU Linz)
12:00 – 12:30: Arduin Findeis (Cambridge)
12:30 – 13:00: Julian Rodemann (CISPA/LMU)

13:00 – 14:30: Lunch break

14:30 – 15:00: Lihi Dery (Ariel University)
15:00 – 15:30: Ignacio Ojea Quintana (LMU/MCMP)
15:30 – 16:00: Dominik Klein (Utrecht)

16:00 – 16:30: Coffee break

16:30 – 17:00: Paolo Viappiani (Paris Sorbonne)
17:00 – 17:30: Timo Kaufmann (LMU)

17:30 – 17:45: Closing remarks

Abstracts

Jobst Heitzig (Potsdam Institute for Climate Impact Research): Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power

Power is a key concept in AI safety: power-seeking as an instrumental goal, sudden or gradual disempowerment of humans, power balance in human-AI interaction and international AI governance. At the same time, power as the ability to pursue diverse goals is essential for wellbeing.

This paper explores the idea of promoting both safety and wellbeing by forcing AI agents explicitly to empower humans and to manage the power balance between humans and AI agents in a desirable way. Using a principled, partially axiomatic approach, we design a parametrizable and decomposable objective function that represents an inequality- and risk-averse long-term aggregate of human power. It takes into account humans’ bounded rationality and social norms, and, crucially, considers a wide variety of possible human goals.

We derive algorithms for computing that metric by backward induction or approximating it via a form of multi-agent reinforcement learning from a given world model. We exemplify the consequences of (softly) maximizing this metric in a variety of paradigmatic situations and describe what instrumental sub-goals it will likely imply. Our cautious assessment is that softly maximizing suitable aggregate metrics of human power might constitute a beneficial objective for agentic AI systems that is safer than direct utility-based objectives. (Joint work with Ram Potham.)

Zoi Terzopoulou (GATE, Jean Monnet University): Learning how to vote with principles

Can neural networks be applied in voting while satisfying the need for transparency in collective decisions? We propose a framework to build and evaluate neural networks that aggregate preferences, using the well-established axiomatic method of voting theory. We find that neural networks, despite being highly accurate, often fail to align with the core axioms of voting rules, revealing a disconnect between mimicking outcomes and principled reasoning. However, by optimizing axiom satisfaction, neural networks can synthesize new voting rules that often surpass existing rules from the literature. This talk is based on a recently published paper co-authored with Levin Hornischer (LMU Munich).
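To make the setup concrete, here is a minimal, hypothetical sketch, not the construction from the paper: a small network maps an encoded preference profile to a winning alternative, and an axiom (here, unanimity) is checked empirically on a sample profile. The encoding, architecture, and choice of axiom are illustrative assumptions.

# Hypothetical sketch of a neural "voting rule" plus an empirical axiom check.
# Encoding, architecture, and the tested axiom are illustrative assumptions,
# not the construction from the paper.
import torch
import torch.nn as nn

N_VOTERS, N_ALTERNATIVES = 5, 3

def encode_profile(rankings):
    """Encode a profile (one ranking per voter) as positional counts:
    entry [a, p] counts how many voters place alternative a in position p."""
    counts = torch.zeros(N_ALTERNATIVES, N_ALTERNATIVES)
    for ranking in rankings:
        for pos, alt in enumerate(ranking):
            counts[alt, pos] += 1
    return counts.flatten()

# A small network mapping an encoded profile to scores over the alternatives.
net = nn.Sequential(
    nn.Linear(N_ALTERNATIVES * N_ALTERNATIVES, 32),
    nn.ReLU(),
    nn.Linear(32, N_ALTERNATIVES),
)

def winner(rankings):
    return net(encode_profile(rankings)).argmax().item()

# Unanimity check: if every voter ranks alternative 0 first, it should win.
# An untrained (or purely accuracy-trained) network may fail this; evaluating
# and optimizing such axiom checks is the kind of analysis the abstract describes.
unanimous_profile = [[0, 1, 2]] * N_VOTERS
print("unanimity satisfied here:", winner(unanimous_profile) == 0)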

Christian List (LMU/MCMP): Collective intelligence through aggregation

Suppose a committee, expert panel, or other group is making judgments on some issues, where these may be not just yes/no questions, such as whether a defendant is guilty or innocent, but include variables with many possible values, such as macroeconomic or meteorological variables or travel directions. Furthermore, there may be interconnections between different issues, as in the case of economic or climate variables. How can the group arrive at “intelligent” collective judgments, based on the group members’ individual judgments? We investigate three challenges raised by this judgment-aggregation problem. First, reasonable methods of aggregation (such as defining the collective judgment for each issue as the average or median judgment) can produce inconsistent collective judgments. Second, many methods of aggregation are manipulable by strategic voting. Finally, not all methods of aggregation are conducive to tracking the truth on the issues in question. We prove new impossibility or possibility theorems on all three challenges, identifying what it takes to produce collective judgments in a consistent, truth-tracking, and non-manipulable manner and thereby to achieve collective intelligence through aggregation. Overall, the median method, though imperfect, performs reasonably well. We also note the relevance of our analysis for non-human group decisions. (Joint work with Franz Dietrich.)
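As a toy illustration of the first challenge, with made-up numbers rather than an example from the paper: suppose three panellists judge variables x, y, z that are interconnected by the constraint x + y = z. Issue-wise median aggregation can then break the constraint even though every individual judgment respects it.

# Toy example (hypothetical numbers): issue-wise median aggregation can
# violate an interconnection between issues that every individual respects.
from statistics import median

# Three panellists, each judging x, y, z with x + y = z.
judgments = [
    {"x": 0, "y": 0, "z": 0},
    {"x": 1, "y": 3, "z": 4},
    {"x": 3, "y": 1, "z": 4},
]
assert all(j["x"] + j["y"] == j["z"] for j in judgments)  # individually consistent

# Aggregate issue by issue with the median.
collective = {v: median(j[v] for j in judgments) for v in ("x", "y", "z")}
print(collective)                                            # {'x': 1, 'y': 1, 'z': 4}
print(collective["x"] + collective["y"] == collective["z"])  # False: 1 + 1 != 4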

Johannes Fürnkranz (JKU Linz): On Preference Learning in Imperfect Information Domains

In this talk, we discuss the potential of contextual preference learning in imperfect information games. In particular, we discuss whether preferences can be used to infer the true hidden game state given the current observable context, and how they should be sampled in order to optimize the training process. We illustrate this with results from the game of Reconnaissance Blind Chess, an imperfect information variant of chess.

Arduin Findeis (Cambridge): Feedback Forensics: Understanding Human Preferences for AI Personality

Conventional AI benchmarks typically focus on the content of responses, for example checking factual correctness (e.g. MMLU) or mathematical correctness (e.g. GSM8k). However, for many language model applications, users' preferences also heavily depend on the manner (or “personality”) of a model’s responses, for example how friendly or confident responses are. Recent issues with model releases highlight the limited ability of existing evaluation approaches to capture such preferences over personality traits: a ChatGPT model version was rolled back over sycophantic personality issues, and other models’ personalities have been criticised for overfitting to the Chatbot Arena leaderboard.

In this talk, I will introduce Feedback Forensics: our newly released toolkit to measure AI personality traits encouraged in human preferences and exhibited by models. Using our toolkit, I will first share results detecting the personality traits currently encouraged by popular human feedback datasets (incl. Chatbot Arena). Next, I will discuss changes and trends in personality traits exhibited across model families and versions. The talk will feature a live demo of our personality visualisation tool and attendees are invited to follow along via our online platform: https://feedbackforensics.com/.

Julian Rodemann (CISPA, LMU): A Statistical Case Against Empirical Human-AI Alignment

Empirical human–AI alignment aims to make AI systems act in line with observed human behavior. While noble in its goals, we argue that empirical alignment can inadvertently introduce statistical biases that warrant caution. This talk argues against naïve empirical alignment, offering axiomatic alignment as an alternative. Our argument is based on the observation that humans are reasonably good at defining rational axioms, but are rather bad at acting according to them. We substantiate this principled argument with tangible examples like human-centric decoding of language models and regulatory enforcement of alignment.

Lihi Dery (Ariel University): Iterative and Interactive Peer Assessment

Iterative peer grading activities can sustain student engagement during project presentations, but their effectiveness depends on the design of both preference elicitation and preference aggregation mechanisms. Numeric grades are easy to elicit, yet students tend to award inflated scores, producing ties and sometimes strategically lowering competitors’ grades. Full rankings avoid this inflation but impose heavy cognitive demands on students. We present a peer grading model that combines the ease of numeric grading with the discriminatory power of ranking. It integrates (a) a preference elicitation algorithm that structures how students provide evaluations and (b) a median-based voting protocol that aggregates these preferences into a ranked order with fewer ties. A classroom deployment demonstrated that this approach reduced grade inflation and strategic bias while lowering the cognitive and communicative burden on students.

Ignacio Ojea Quintana (LMU/MCMP): The Reward Puzzle in Recommender Systems (with Silvia Milano)

Recommender Systems (RS) are a ubiquitous technology affecting consumption, social relations, access to news, and many other important aspects of our lives. They are usually justified as a technology that identifies and maximizes users’ preferences, by suggesting items (news, products, people) that bring the most utility to them. More generally, they are conceived as estimating a stationary distribution associated with the users’ preferences. We believe this conceptualization of users’ utilities or rewards is misguided, and we build on the ‘reward paradox’ in reinforcement learning to illustrate the problem: the utility that a user gets from an item depends not only on exogenous variables like the item and the context in which it is suggested, but also on endogenous variables like mental, cognitive, or other inner states. We then model the RS task as an interaction between two agents, a recommender and a user agent. This allows us to provide more psychological depth to the user agent, and we will show some preliminary results in that direction.

Dominik Klein (Utrecht): Deliberation and consensus

Collective preferences are frequently formed through deliberation, ideally leading to consensus among all participants. Where such deliberation is meant to be emulated or tracked with an AI system, the underlying process must be specified formally. Although there has been significant interest in axiomatic approaches to voting, deliberation and consensus are rarely studied in an axiomatic way. We consider several standard and new axioms for deliberation and consensus, and study whether deliberation and consensus conform to these axioms. Instead of exploring general impossibility results, we consider a specific formal model of deliberation: the bounded confidence model by Hegselmann and Krause. We show that many axioms of deliberation and consensus are violated in this formal model of deliberation. We go on to argue that the violation of these axioms will likely occur in a wide class of models of deliberation and is not an artefact of the bounded confidence model. (Joint work with Hein Duijf.)
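For readers unfamiliar with the model, a minimal sketch of the Hegselmann–Krause dynamics follows (population size, confidence radius, and iteration count are illustrative assumptions, not taken from the talk): each agent repeatedly adopts the average opinion of all agents within a confidence radius of its own opinion, which typically yields a few opinion clusters rather than a single consensus.

# Minimal sketch of the Hegselmann-Krause bounded confidence model.
# Population size, confidence radius eps, and iteration count are
# illustrative assumptions.
import numpy as np

def hk_step(opinions, eps):
    """One synchronous update: each agent averages all opinions within eps."""
    new = np.empty_like(opinions)
    for i, x in enumerate(opinions):
        peers = opinions[np.abs(opinions - x) <= eps]
        new[i] = peers.mean()
    return new

rng = np.random.default_rng(0)
opinions = rng.uniform(0.0, 1.0, size=20)
for _ in range(100):
    updated = hk_step(opinions, eps=0.2)
    if np.allclose(updated, opinions):
        break
    opinions = updated

# The population usually fragments into a few clusters instead of reaching
# one consensus, which is one source of the axiom violations discussed above.
print(np.round(np.sort(opinions), 3))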

Paolo Viappiani (Paris Sorbonne): New Directions in Reasoning with Partial Preference Models and Elicitation

In many situations it is necessary to make decisions with incomplete or uncertain preference information. In this talk I will summarize the main approaches to reason with partial preference models and frameworks for interactive preference elicitation that iteratively ask questions to the user until a recommendation can be made with a certain degree of confidence. I will then discuss some recent applications of preference elicitation, such as computational social choice, social ranking, and algorithmic recourse.

Timo Kaufmann (LMU): ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning

Binary choices, as often used for reinforcement learning from human feedback (RLHF), convey only the direction of a preference. A person may choose apples over oranges and bananas over grapes, but which preference is stronger? Strength is crucial for decision-making under uncertainty and generalization of preference models, but hard to measure reliably. Metadata such as response times and inter-annotator agreement can serve as proxies for strength, but are often noisy and confounded. We propose ResponseRank to address the challenge of learning from noisy strength signals. Our method uses relative differences in these signals to rank responses to pairwise comparisons by their inferred preference strength. Signals are only considered locally within carefully constructed strata, controlling for systemic variation. This enables robust learning of utility differences consistent with strength-derived rankings, all while making minimal assumptions.
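For background, and not as a description of ResponseRank itself: the sketch below fits a standard Bradley–Terry-style utility model from binary choices alone. With a single deterministic choice per pair, the data pin down the direction of each utility difference but say little about its magnitude, which is the gap that strength signals such as response times are meant to fill. The items and choices are illustrative assumptions.

# Background sketch (not the ResponseRank method): Bradley-Terry utilities
# fitted from binary choices. A single choice per pair only determines the
# sign of a utility difference, not how strong the preference is.
import numpy as np

items = ["apple", "orange", "banana", "grape"]
# (chosen, rejected) pairs: direction of preference only, no strength.
choices = [("apple", "orange"), ("banana", "grape"), ("apple", "grape")]

u = {item: 0.0 for item in items}  # learned utilities
lr = 0.5
for _ in range(200):
    for win, lose in choices:
        p = 1.0 / (1.0 + np.exp(-(u[win] - u[lose])))  # P(chosen beats rejected)
        u[win] += lr * (1.0 - p)    # gradient ascent on the log-likelihood
        u[lose] -= lr * (1.0 - p)

# The fitted utilities respect every observed choice, but their scale (and
# hence the relative strength of the two preferences) is essentially arbitrary.
print({k: round(v, 2) for k, v in u.items()})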

Registration

Attendance is free but registration is required by sending an email to Office.Leitgeb@lrz.uni-muenchen.de.

Venue

ZEPP Library, LMU main building, room M210, Geschwister-Scholl-Platz 1, 80539 Munich