Welcome to the Agent Foundations Theory research guide!

This track seeks robust yet simple mathematical definitions for fundamental AI agent concepts such as 'wanting', 'agency', 'learning', and 'preference'. These definitions should satisfy our theoretical desiderata while remaining practical and biologically plausible, reflecting mechanisms that could realistically exist in the human brain.

You'll work on formalizing core agent concepts, developing mathematical frameworks that handle real-world constraints, and testing whether your theoretical proposals lead to more robust AI systems. The field is highly fragmented across different researchers, so part of your challenge is synthesizing insights from scattered work into coherent approaches.

This guide is organized around five fundamental problems in agent foundations. Start with the problem that resonates most with your background, then explore connections between different areas. Approach this iteratively: formalize, test mathematical properties, check biological plausibility, and always critically evaluate your assumptions.


Introduction to Agent Foundations

Agent Foundations research aims to build the mathematical infrastructure needed for AI alignment. We need precise formal definitions of concepts like agency, preference, and learning that work in realistic scenarios with computational constraints, embedded agents, and uncertain environments.
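One way to make such definitions concrete is to prototype them on small finite cases. As a minimal sketch (the outcome names and helper functions here are hypothetical, not a standard library), a candidate 'preference' can be modeled as a binary relation over outcomes and machine-checked against classical rationality axioms such as completeness and transitivity:

```python
from itertools import product

# Illustrative sketch: a "preference" over a finite outcome set is modeled
# as a set of ordered pairs (a, b), read as "a is weakly preferred to b".

def is_complete(outcomes, prefers):
    """Every pair of outcomes is comparable in at least one direction."""
    return all((a, b) in prefers or (b, a) in prefers
               for a, b in product(outcomes, repeat=2))

def is_transitive(outcomes, prefers):
    """If a is preferred to b and b to c, then a must be preferred to c."""
    return all((a, c) in prefers
               for a, b, c in product(outcomes, repeat=3)
               if (a, b) in prefers and (b, c) in prefers)

outcomes = {"x", "y", "z"}
prefers = {("x", "x"), ("y", "y"), ("z", "z"),
           ("x", "y"), ("y", "z"), ("x", "z")}

print(is_complete(outcomes, prefers))    # → True
print(is_transitive(outcomes, prefers))  # → True
```

Checks like these are trivial on three outcomes, but the same pattern scales to stress-testing richer candidate definitions against their desiderata before attempting proofs.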

The field draws from theoretical computer science, decision theory, philosophy of mind, and computational neuroscience. Unlike empirical AI safety work that studies existing systems, this track develops the theoretical foundations that future alignment work will build upon.

Key challenges include handling computational limitations (real agents can't run AIXI), providing frequentist rather than just Bayesian guarantees, dealing with agents embedded in their environments, and defining what qualifies as an agent in the first place.
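To see why computational limitations bite, note that AIXI's Bayesian mixture ranges over all computable environments, which no real agent can enumerate. A common workaround is to restrict attention to a small, finite hypothesis class and update within it. The following toy sketch illustrates that idea; the hypothesis names and class are hypothetical illustrations, not part of any standard framework:

```python
# Toy resource-bounded Bayesian predictor: instead of AIXI's mixture over
# all computable environments, mix over a small, finite hypothesis class.

hypotheses = {
    # Each hypothesis maps a history of bits to P(next bit = 1).
    "mostly_ones":  lambda history: 0.9,
    "mostly_zeros": lambda history: 0.1,
    "alternating":  lambda history: 0.1 if history and history[-1] == 1 else 0.9,
}

def update(posterior, history, obs):
    """One Bayes update over the finite class after observing obs (0 or 1)."""
    new = {}
    for name, weight in posterior.items():
        p1 = hypotheses[name](history)
        likelihood = p1 if obs == 1 else 1 - p1
        new[name] = weight * likelihood
    z = sum(new.values())
    return {name: w / z for name, w in new.items()}

prior = {name: 1 / len(hypotheses) for name in hypotheses}
history = []
for obs in [1, 1, 1, 1]:
    prior = update(prior, history, obs)
    history.append(obs)

best = max(prior, key=prior.get)
print(best)  # the data strongly favor "mostly_ones"
```

The interesting theoretical questions start exactly where this sketch stops: what guarantees survive when the true environment lies outside the hypothesis class, and what frequentist bounds (rather than purely Bayesian ones) can be given for such bounded agents.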


Track Description

Unlike approaches that study alignment in existing AI systems, this track builds the mathematical foundations from scratch. We start with idealized agent models like AIXI and identify where they break down in realistic scenarios, then develop new theoretical frameworks that maintain the insights while solving the problems.

Key techniques include formalizing core agent concepts, developing mathematical frameworks that handle computational and embedding constraints, and stress-testing candidate definitions against theoretical desiderata.

Focus on mathematical rigor combined with practical relevance: your theoretical work should eventually inform how we build aligned AI systems.