Welcome to the Improved Preference Optimization research guide!

This guide is designed to help you develop innovative methods for aligning AI preferences with human values. It's divided into two phases: Ideation (brainstorming and conceptualizing) and Implementation & Evaluation (building, testing, and refining).

We'll use a prompt-led approach: answer the questions in each phase one at a time, keeping your responses concise so that ideas build step by step. This structured questioning fosters creativity while keeping you focused.

At the end, you'll find general resources to support your work.


Phase 1: Prompt-Led Ideation for a New Preference Optimization Method

In this phase, focus on ideation: explore internal model signals that could reflect AI preferences, brainstorm ways to interpret and intervene on them, and anticipate potential pitfalls. Answer the questions sequentially to build a solid conceptual foundation.