Adapting, Fast and Slow: A Causal Approach to Few-Shot Sequence Learning

Kasra Jalaldoust1, Elias Bareinboim1
1Causal Artificial Intelligence Lab, Columbia University
May 2025

[Teaser figure: fast vs. slow adaptation]

Abstract

Generalization to unseen target domains is a fundamental challenge in machine learning [1-3]. Our work introduces a **causal framework for supervised domain adaptation**, specifically addressing few-shot sequence learning [1, 2]. We investigate scenarios where ample source data is complemented by limited target data, a common setting in supervised domain adaptation (DA) [1, 2, 4, 5]. By combining both **causal structure-informed and structure-agnostic procedures**, we precisely characterize the conditions under which zero-shot or few-shot generalization becomes feasible [1, 2, 6, 7].

A key insight is that **generalization to an unseen target domain is inherently impossible without asserting a causal structure** that constrains the relationship between source and target domains [1, 2, 5, 8]. We extend our findings to sequential prediction tasks, demonstrating how knowledge of complex causal structure allows our structure-informed procedure to learn modular predictors from diverse source domains and systematically recompose them for faster adaptation in the target domain [1, 2, 7, 9]. Notably, we show that our structure-agnostic approach can achieve similarly fast rates in these scenarios [1, 2, 7, 9]. Our results provide a **causal theoretical basis for data-driven domain adaptation** and empirically corroborate these findings [1, 2].

The Challenge of Generalization

Traditional machine learning performance guarantees assume that the target domain, where a solution is evaluated, has an identical data distribution to the source domain used for training [3, 4]. However, even minor qualitative differences between source and target domains can severely impact performance, a problem broadly known as **distribution shift** or, in a scientific context, generalizability or external validity [3, 4].

In this context, **domain generalization** refers to situations where the learner only has access to large data from source domains and no target data [3, 4]. Our work focuses on the **domain adaptation (DA) problem**, a less extreme case where a small amount of target data is also available [4, 5]. The theoretical challenge in DA is not merely whether learning is possible (one could always discard the source data and rely solely on the target data), but rather **how fast learning can occur and how best to leverage data from the source domains** [10, 11].
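To make the "how fast" question concrete, here is a minimal toy sketch (our own illustration, not the paper's algorithm): source and target share a linear mechanism and differ only by an additive shift. A target-only learner must estimate all parameters from a handful of samples, while a source-assisted learner recovers the shared weights from ample source data and spends the few target samples on the single parameter that actually changed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: shared linear weights across domains, plus a
# domain-specific additive bias shift in the target.
d = 20
w_true = rng.normal(size=d)

def make_domain(n, bias):
    X = rng.normal(size=(n, d))
    y = X @ w_true + bias + 0.1 * rng.normal(size=n)
    return X, y

X_src, y_src = make_domain(5000, bias=0.0)  # ample source data
X_tgt, y_tgt = make_domain(25, bias=1.5)    # few target samples

# Target-only baseline: estimate all d + 1 parameters from 25 points.
A = np.hstack([X_tgt, np.ones((25, 1))])
theta = np.linalg.lstsq(A, y_tgt, rcond=None)[0]

# Source-assisted: learn the shared weights from the source domain,
# then use the target samples only for the 1-D bias correction.
w_src = np.linalg.lstsq(X_src, y_src, rcond=None)[0]
bias_hat = float(np.mean(y_tgt - X_tgt @ w_src))

X_te, y_te = make_domain(2000, bias=1.5)
err_target_only = float(np.mean(
    (np.hstack([X_te, np.ones((2000, 1))]) @ theta - y_te) ** 2))
err_assisted = float(np.mean((X_te @ w_src + bias_hat - y_te) ** 2))
print(f"target-only MSE: {err_target_only:.3f}  "
      f"source-assisted MSE: {err_assisted:.3f}")
```

The gap between the two errors is exactly the point of the DA question above: the structural assumption (shared weights) determines which part of the source data transfers, and hence how fast the target error shrinks.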

Arbitrary differences between domains pose a significant barrier, making source data potentially useless without a defined relationship or "structure" between domains [5, 8]. Humans excel at transferring knowledge across domains, and causality is widely recognized as central to human understanding and decision-making, especially in changing circumstances [10-18]. Principles of generalization to the unseen from a causal perspective have been extensively studied under "transportability" and "statistical invariances" rooted in an implicit causal structure [1, 10, 11, 19-27].

Our Core Idea: Causal Structure for Adaptation

We seek to characterize when and how certain aspects of source data are generalizable, enabling **fast adaptation** (zero-shot/few-shot learning) versus when source data might hinder learning, leading to **slow adaptation** [6, 11]. Our approach hinges on the fundamental role of an underlying causal structure [6, 7].

Key Contributions & What We Offer

Validating Our Framework: Empirical Evaluation

We evaluated the two-stage adaptation method in both multi-cause and sequential settings, using synthetic data where sequences represent functional programs [45, 52, 53]. Experiments, typically with a single source domain and sequences of length 10, corroborated our theoretical results [45, 52].
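The two-stage idea in the sequential setting can be sketched with a toy example (again our own illustration under assumed mechanics, not the paper's exact procedure): a length-10 binary sequence is generated by per-step mechanisms, only one of which shifts in the target domain. Stage 1 learns every per-step module from source data; Stage 2 uses the few target sequences to detect and refit only the module that changed, reusing the rest.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy mechanics: X_1..X_10 is a binary chain where each step
# either repeats or flips the previous symbol; in the target domain
# only the final mechanism shifts, so the other modules transfer.
T = 10

def sample(n, p_stay, p_last):
    X = np.zeros((n, T), dtype=int)
    X[:, 0] = rng.random(n) < 0.5
    for t in range(1, T):
        p = p_last if t == T - 1 else p_stay
        stay = rng.random(n) < p
        X[:, t] = np.where(stay, X[:, t - 1], 1 - X[:, t - 1])
    return X

src = sample(20000, p_stay=0.9, p_last=0.9)  # ample source sequences
tgt = sample(30, p_stay=0.9, p_last=0.2)     # few target sequences

def fit_stay_prob(X, t):
    # Module t: estimated probability that X_t repeats X_{t-1}.
    return float(np.mean(X[:, t] == X[:, t - 1]))

# Stage 1 (source): learn all per-step modules from source data.
modules = [fit_stay_prob(src, t) for t in range(1, T)]

# Stage 2 (target): refit only the modules the target data contradicts;
# here a naive deviation threshold stands in for a principled test.
for i, t in enumerate(range(1, T)):
    tgt_est = fit_stay_prob(tgt, t)
    if abs(tgt_est - modules[i]) > 0.3:
        modules[i] = tgt_est

print([round(m, 2) for m in modules])
```

Because eight of the nine modules are reused verbatim, the target data budget is spent on a single mechanism, which is the source of the fast adaptation rates the experiments corroborate.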

Key Takeaways

Our paper introduces a **causal framework** for supervised domain adaptation that offers both structure-informed and structure-agnostic algorithms [59, 60]. We demonstrate that **causal structure is critical** for identifying model components that can be reliably transported across domains [59, 60]. Even without explicit structural knowledge, our agnostic procedures can achieve **near-optimal performance** [59, 60]. Finally, the developed **two-stage learning procedure** provides a computationally tractable alternative that is theoretically equivalent to an exhaustive agnostic procedure [59, 60]. This work lays a causal theoretical foundation for data-driven domain adaptation through a unifying structure-agnostic scheme [1, 2].

Understanding with an Analogy: The LEGO Builder

Imagine you are a master LEGO builder. You have a huge collection of instruction manuals (source data) for many different LEGO sets (source domains), but each manual builds something slightly different. Now, someone gives you a very small, incomplete instruction manual for a new, unique LEGO model (target data), and asks you to build it as fast as possible.

Our work is like understanding the underlying principles of LEGOs (the **causal structure**). Instead of treating each manual as entirely separate, you realize that certain smaller sections of instructions (modular predictors) are actually identical or function in the same way, even if they appear in different manuals or build different parts of a model.

Ultimately, knowing the fundamental ways LEGOs connect (causal mechanisms) is key to building new and complex models quickly, even when you only have a few new instructions.

Acknowledgments

This research is supported in part by the NSF, ONR, AFOSR, DoE, Amazon, JP Morgan, and The Alfred P. Sloan Foundation [24, 62].

Additional Resources

For a more in-depth understanding of the theoretical underpinnings and empirical results, please refer to the full PDF of the paper, whose appendices include comprehensive supplementary material, detailed analyses, and experimental setups [63, 64].

This page was built using the Academic Project Page Template by Eliahu Horwitz, adopted from the Nerfies project page [70, 71].
