Katarzyna (Kasia) Kobalczyk
alignment
Preference Learning for AI Alignment: a Causal Perspective
We propose adopting a causal framework for preference learning to define and address challenges such as causal misidentification, preference heterogeneity, and, crucially, confounding due to user-specific objectives.
Kasia Kobalczyk, Mihaela van der Schaar
May 1, 2025
OpenReview · arXiv · PDF · code