A statistical framework for weak-to-strong generalization

  • With Felipe Maia Polo, Moulinath Banerjee, Ya’acov Ritov, Mikhail Yurochkin, Yuekai Sun

  • In Submission, 2024

  • Abstract: Modern large language model (LLM) alignment techniques rely on human feedback, but it is unclear whether the techniques fundamentally limit the capabilities of aligned LLMs. In particular, it is unclear whether it is possible to align (stronger) LLMs with superhuman capabilities with (weaker) human feedback without degrading their capabilities. This is an instance of the weak-to-strong generalization problem: using weaker (less capable) feedback to train a stronger (more capable) model. We prove that weak-to-strong generalization is possible by eliciting latent knowledge from pre-trained LLMs. In particular, we cast the weak-to-strong generalization problem as a transfer learning problem in which we wish to transfer a latent concept from a weak model to a strong pre-trained model. We prove that a naive fine-tuning approach suffers from fundamental limitations, but an alternative refinement-based approach suggested by the problem structure provably overcomes the limitations of fine-tuning. Finally, we demonstrate the practical applicability of the refinement approach with three LLM alignment tasks.

All the lambda one’s on cyclic admissible covers

  • With Bryson Owens, Renzo Cavalieri

  • To Appear, Proceedings of the American Mathematical Society, 2024

  • Abstract: We compute the degree of Hurwitz-Hodge classes on one-dimensional moduli spaces of cyclic admissible covers of the projective line. We also compute the degree of the first Chern class of the Hodge bundle, lambda one, for all one-dimensional moduli spaces. In higher dimension, we express the divisor class lambda one as a linear combination of psi classes and boundary strata.

Algorithmic Fairness in Performative Policy Learning: Escaping the Impossibility of Group Fairness

  • With Ya’acov Ritov, Yuekai Sun

  • In ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2024

  • Abstract: In many prediction problems, the predictive model affects the distribution of the prediction target. This phenomenon is known as performativity and is often caused by the behavior of individuals with vested interests in the outcome of the predictive model. Although performativity is generally problematic because it manifests as distribution shifts, we develop algorithmic fairness practices that leverage performativity to achieve stronger group fairness guarantees in social classification problems (compared to what is achievable in non-performative settings). In particular, we leverage the policymaker’s ability to steer the population to remedy inequities in the long term. A crucial benefit of this approach is that it is possible to resolve the incompatibilities between conflicting group fairness definitions.

Learning in Reverse Causal Strategic Environments with Ramifications on Two Sided Markets

  • With Yuekai Sun, Ya’acov Ritov

  • In International Conference on Learning Representations (ICLR), 2024

  • Abstract: Motivated by equilibrium models of labor markets, we develop a formulation of causal strategic classification in which strategic agents can directly manipulate their outcomes. As an application, we compare employers that anticipate the strategic response of a labor force with employers that do not. We show through a combination of theory and experiment that employers with performatively optimal hiring policies improve employer reward, labor force skill level, and in some cases labor force equity. On the other hand, we demonstrate that performative employers harm labor force utility and fail to prevent discrimination in other cases.

Boundary Expression for Chern Classes of the Hodge Bundle on Spaces of Cyclic Covers

  • With Bryson Owens

  • In Involve: A journal of mathematics, 2019

  • Abstract: We compute an explicit formula for the first Chern class of the Hodge bundle over the space of admissible cyclic degree three covers of n-pointed rational stable curves as a linear combination of boundary strata. We then apply this formula to give a recursive formula for calculating certain Hodge integrals containing lambda one. We also consider degree two covers, for which we compute the divisor class lambda two as a linear combination of codimension two boundary strata.

The Supersingularity of Hurwitz Curves

  • With Dean Bisogno, Erin Dawson, Henry Frauenhoff, Michael Lynch, Amethyst Price, Rachel Pries, Eric Work

  • In Involve: A journal of mathematics, 2018

  • Abstract: We study when Hurwitz curves are supersingular. Specifically, we find sufficient conditions for the Hurwitz curve, with n and l relatively prime, to be supersingular over the finite field of size p. Further, we provide a complete table of supersingular Hurwitz curves of genus less than 5 for characteristic less than 37.