The Linear Tree Effect (Varying-Coefficient Models)¶

Formula Definition: Varying-Coefficient Networks¶

By definition, a standard TreeEffect constructs a piecewise-constant response surface. It captures spatial jumps and structural breaks well, but because the fit is constant inside each terminal leaf, it cannot represent a local gradient or a continuous linear slope within a leaf.

To build piecewise-linear regressions, the TAM framework formulates the “Linear Tree” concept as a Varying-Coefficient (VC) Model [Hastie and Tibshirani, 1993]. The LinearTreeEffect (lt) explicitly decouples the variables that define the partitions (\(x_{part}\), for example geography) from the continuous variables that define the local slopes (\(x_{slope}\)).

Mathematically, it combines a base tree (the local intercept \(\beta_0(x_{part})\)) with the tensor product of a slope tree and a linear effect (the varying gradient \(\beta_1(x_{part})\)), so that the fitted response has the form \(\beta_0(x_{part}) + \beta_1(x_{part})\, x_{slope}\). The explicit Primal feature mapping concatenates these two spaces:

\[\phi_{lt}(x_{part}, x_{slope}) = \left[ \phi_{tree_{base}}(x_{part}), \quad \phi_{tree_{slope}}(x_{part}) \otimes \phi_{lin}(x_{slope}) \right]^\top\]
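The feature mapping above can be sketched in NumPy. This is a minimal illustration, not TAM's implementation: it assumes the leaf index of every observation is already known, encodes each tree as a one-hot indicator matrix, and takes \(\phi_{lin}(x_{slope})\) to be the raw slope variable. The helper names `one_hot_leaves` and `linear_tree_features` are hypothetical.

```python
import numpy as np

def one_hot_leaves(leaf_ids, n_leaves):
    """Indicator matrix: row i has a single 1 in the column of its leaf."""
    phi = np.zeros((len(leaf_ids), n_leaves))
    phi[np.arange(len(leaf_ids)), leaf_ids] = 1.0
    return phi

def linear_tree_features(base_leaf, slope_leaf, x_slope, n_base, n_slope):
    """Concatenate [phi_base, phi_slope (x) phi_lin], row by row."""
    phi_base = one_hot_leaves(base_leaf, n_base)
    phi_slope = one_hot_leaves(slope_leaf, n_slope)
    phi_lin = np.asarray(x_slope, dtype=float).reshape(-1, 1)
    # Row-wise Kronecker product: row i is kron(phi_slope[i], phi_lin[i]).
    phi_cross = np.einsum("ij,ik->ijk", phi_slope, phi_lin)
    phi_cross = phi_cross.reshape(len(phi_lin), -1)
    return np.hstack([phi_base, phi_cross])

# Hypothetical leaf assignments (both trees share one partition here).
leaf = np.array([0, 0, 1, 2])
x_slope = np.array([0.5, -1.0, 2.0, 3.0])
Phi = linear_tree_features(leaf, leaf, x_slope, n_base=3, n_slope=3)
print(Phi)
```

Each row has one active intercept column and one active slope column, so a coefficient vector for `Phi` stores one intercept and one slope per leaf.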

Optimal Penalization¶

To regularize this composite structure, the penalty matrix is constructed as a block-diagonal encapsulation of its internal components.

The local intercepts are penalized by the Anisotropic Sparsity-Adaptive Ridge penalty of the base tree (\(P_{tree_{base}}\)), while the local slopes are penalized by the anisotropic Kronecker penalty of the tensor product (\(P_{cross}\)):

\[\begin{split}P_{lt} = \begin{bmatrix} P_{tree_{base}} & 0 \\ 0 & P_{cross} \end{bmatrix} = \text{diag}(P_{tree_{base}}, P_{cross})\end{split}\]
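A small sketch of how such a block-diagonal penalty could be assembled. The exact form of \(P_{tree_{base}}\) and \(P_{cross}\) is not given here, so the code makes assumptions: plain ridge (identity) penalties for the trees, and the common anisotropic Kronecker-sum form \(\lambda_{tree}(P_{tree} \otimes I) + \lambda_{lin}(I \otimes P_{lin})\) for the cross term. The weights `lam_base`, `lam_tree` and `lam_lin` are hypothetical.

```python
import numpy as np

def block_diag(*blocks):
    """Place square blocks along the diagonal, zeros elsewhere."""
    sizes = [b.shape[0] for b in blocks]
    out = np.zeros((sum(sizes), sum(sizes)))
    start = 0
    for block, size in zip(blocks, sizes):
        out[start:start + size, start:start + size] = block
        start += size
    return out

n_leaves = 3
lam_base, lam_tree, lam_lin = 1.0, 2.0, 0.5   # hypothetical weights

P_tree = np.eye(n_leaves)    # assumed ridge penalty on leaf coefficients
P_lin = np.eye(1)            # assumed ridge penalty on the single slope

P_base = lam_base * P_tree
# Assumed anisotropic Kronecker-sum penalty for the tree-by-linear term.
P_cross = (lam_tree * np.kron(P_tree, np.eye(1))
           + lam_lin * np.kron(np.eye(n_leaves), P_lin))

P_lt = block_diag(P_base, P_cross)
print(P_lt)
```

Because the off-diagonal blocks are zero, the intercept and slope coefficients are regularized independently of each other.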

This nested anisotropic structure has two practical consequences. First, the varying coefficients (\(\beta_1(x_{part})\)) can be penalized independently of the local intercepts (\(\beta_0(x_{part})\)). Second, when the sparsity hyperparameter is activated (\(\alpha_{sp} > 0\)), the tree penalties adapt to the empirical data density (\(C_i\)) of each leaf: empty or sparsely populated regions receive much larger penalties on both their local intercept and their local slope, which shrinks unstable local linear extrapolations toward zero.
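To make the density adaptation concrete, the sketch below computes a per-leaf ridge weight from the leaf counts \(C_i\). The functional form \(\lambda \,(n / (1 + C_i))^{\alpha_{sp}}\) is an assumption chosen only for illustration; the actual Sparsity-Adaptive Ridge formula is not stated in this section, and `sparsity_adaptive_diag` is a hypothetical helper.

```python
import numpy as np

def sparsity_adaptive_diag(leaf_ids, n_leaves, lam=1.0, alpha_sp=1.0):
    """Per-leaf ridge weights that grow as the leaf count C_i shrinks.

    Assumed form (illustration only): lam * (n / (1 + C_i)) ** alpha_sp.
    """
    counts = np.bincount(leaf_ids, minlength=n_leaves)
    weights = (len(leaf_ids) / (1.0 + counts)) ** alpha_sp
    return counts, lam * weights

# Leaf 2 is starved (5 points) and leaf 3 is empty.
leaf_ids = np.array([0] * 50 + [1] * 45 + [2] * 5)
counts, penalties = sparsity_adaptive_diag(leaf_ids, n_leaves=4)
_, flat = sparsity_adaptive_diag(leaf_ids, n_leaves=4, alpha_sp=0.0)

print(counts)      # observations per leaf
print(penalties)   # larger where the leaf has fewer observations
print(flat)        # alpha_sp = 0 gives the same weight for every leaf
```

Under this assumed form, setting `alpha_sp=0` recovers a uniform ridge, while larger values penalize sparse leaves more strongly.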

Theoretical Critique: Resolving MOB Singularities¶

In the classical statistical literature, “Linear Trees” are built by algorithms such as M5 or MOB, Model-Based Recursive Partitioning [Zeileis et al., 2008]. These algorithms fit a separate local regression, typically an unpenalized Ordinary Least Squares (OLS) model, inside each individual leaf.

The Singularity Flaw: Fitting an independent model in every leaf is prone to matrix singularities. If a leaf contains too few observations or has no variance in the \(x_{slope}\) dimension, its local Gram matrix (\(X^\top X\)) is not invertible, so the local OLS estimate is not unique and the fit either fails or becomes numerically unstable [Zeileis et al., 2008].

The TAM Solution: By formulating the varying-coefficient model globally via the Primal tensor product (\(\Phi_{tree} \otimes \Phi_{lin}\), taken row-wise over observations), TAM avoids solving a separate OLS problem per leaf. The structural penalty \(P_{cross}\) controls any empty or low-variance subspace: if a geographic region lacks sufficient data to resolve a local slope, the solver shrinks that unstable local gradient toward zero (exactly to zero for an empty leaf), and the prediction in that region falls back on the remaining terms of the model, such as a global linear trend if one is included. Provided the penalty is positive definite, the penalized system \(\Phi^\top \Phi + P_{lt}\) is full rank regardless of how the data are distributed across leaves.
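The contrast can be reproduced on toy data. The sketch below is an illustration under assumptions, not TAM's solver: it uses a dense `np.linalg.solve` instead of a sparse conjugate gradient, and the diagonal ridge penalty `P` is a hypothetical stand-in for \(P_{lt}\). Leaf 1 has a constant slope variable, so its local OLS design is rank deficient, while the penalized global system remains full rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n_leaves = 2
leaf = np.array([0] * 20 + [1] * 5)
x = rng.uniform(-1, 1, size=leaf.size)
x[leaf == 1] = 0.3                     # no variance of x_slope in leaf 1
y = np.where(leaf == 0, 1.0 + 2.0 * x, 3.0) + 0.1 * rng.normal(size=leaf.size)

# Local OLS design in leaf 1: intercept and slope columns are collinear.
X_loc = np.column_stack([np.ones(5), x[leaf == 1]])
rank_local = np.linalg.matrix_rank(X_loc)

# Global primal design: [leaf indicators, leaf indicators * x].
D = np.eye(n_leaves)[leaf]
Phi = np.hstack([D, D * x[:, None]])

# Hypothetical positive-definite ridge penalty (weak on intercepts).
P = np.diag([1e-2] * n_leaves + [1.0] * n_leaves)
A = Phi.T @ Phi + P
beta = np.linalg.solve(A, Phi.T @ y)

print("local rank:", rank_local, "of", X_loc.shape[1])
print("global rank:", np.linalg.matrix_rank(A), "of", A.shape[0])
print("coefficients [b0_0, b0_1, b1_0, b1_1]:", beta)
```

The well-populated leaf keeps a slope close to the one that generated the data, while the degenerate leaf receives a small, penalized slope instead of an undefined one.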

RKHS Eligibility¶

The theoretical validity of this effect relies on two closure properties of reproducing kernels established by Aronszajn [Aronszajn, 1950].

First, the tensor product (\(\otimes\)) of the slope tree and the linear projection corresponds to the pointwise product of their kernels, which is again a valid kernel and generates an RKHS of piecewise-linear functions. Second, concatenating this joint feature space with the base tree space corresponds to the sum of the two kernels (\(\oplus\)), which preserves positive semi-definiteness. The LinearTreeEffect therefore defines a valid, finite-dimensional RKHS.
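Both closure properties can be checked numerically on a toy example. The sketch assumes one-hot leaf indicators for both trees (sharing one partition) and the raw slope variable as the linear feature; it verifies that the sum-and-product kernel equals the Gram matrix of the concatenated Primal features and that its eigenvalues are non-negative up to rounding.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_leaves = 30, 4
leaf = rng.integers(0, n_leaves, size=n)
x = rng.uniform(-1, 1, size=n)

D = np.eye(n_leaves)[leaf]        # one-hot leaf indicators
K_tree = D @ D.T                  # 1 if two points share a leaf, else 0
K_lin = np.outer(x, x)            # linear kernel on x_slope

# Sum of the base kernel and the pointwise (Hadamard) product kernel.
K_lt = K_tree + K_tree * K_lin

# Same kernel obtained from the explicit primal feature map.
Phi = np.hstack([D, D * x[:, None]])
same = np.allclose(K_lt, Phi @ Phi.T)
min_eig = np.linalg.eigvalsh(K_lt).min()

print("kernel matches primal Gram matrix:", same)
print("smallest eigenvalue:", min_eig)
```

With 30 observations and only 8 features, the Gram matrix is rank deficient, so its smallest eigenvalue is zero up to floating-point error rather than strictly positive.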

Architectural Guardrails¶

When declaring a Linear Tree, the framework enforces several architectural guardrails to keep the Primal Sparse Conjugate Gradient solve numerically stable:

Collinearity Prevention (n_trees=1): The framework forces the number of trees to 1. Fitting localized linear slopes on the overlapping leaves of a large random forest (\(B \gg 1\)) would create severe multicollinearity inside the solver.
Anti-Starvation Protocol (max_leaves & split_strategy): For 1D piecewise regressions (e.g., lt(x, max_leaves=8)), purely random binary splits can create leaves that contain very few observations. When max_leaves is set, the framework bypasses Monte Carlo sampling and uses deterministic intervals instead. Depending on split_strategy, these are either equal-width intervals (uniform) or density-adaptive, quantile-based intervals (quantile); the latter balance the number of observations per leaf, which helps keep the system well-conditioned.
Spatial Local Slopes (max_depth): For multi-dimensional interactions (e.g., lt(lat, others='lon', slope='temp', max_depth=4)), the framework uses an oblivious tree on the coordinates to construct a spatial grid, then applies the Kronecker product to fit a separate, regularized varying coefficient for temperature inside every geographic zone.
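The difference between the two 1D split strategies can be illustrated with plain NumPy. This is a sketch under assumptions: `leaf_edges` and `assign_leaves` are hypothetical helpers, and the framework's actual interval construction may differ. On skewed data, equal-width intervals leave some leaves nearly empty, whereas quantile intervals give every leaf about the same number of observations.

```python
import numpy as np

def leaf_edges(x, max_leaves, split_strategy="quantile"):
    """Deterministic 1D interval edges (max_leaves + 1 values)."""
    if split_strategy == "uniform":
        return np.linspace(x.min(), x.max(), max_leaves + 1)
    if split_strategy == "quantile":
        return np.quantile(x, np.linspace(0.0, 1.0, max_leaves + 1))
    raise ValueError(f"unknown split_strategy: {split_strategy!r}")

def assign_leaves(x, edges):
    """Leaf index of each point, using only the interior edges."""
    return np.searchsorted(edges[1:-1], x, side="right")

rng = np.random.default_rng(2)
x = rng.exponential(size=1000)          # strongly right-skewed sample

counts = {}
for strategy in ("uniform", "quantile"):
    edges = leaf_edges(x, max_leaves=8, split_strategy=strategy)
    counts[strategy] = np.bincount(assign_leaves(x, edges), minlength=8)
    print(strategy, counts[strategy])
```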

Universal Extrapolation¶

Because TAM separates the partition logic from the continuous geometric space by embedding the discrete leaves as finite Primal blocks, the LinearTreeEffect can extrapolate outside its training range.

With the extrapolate parameter (e.g., 'linear' or 'saturation'), the framework can make the piecewise-linear tree project a stable slope beyond the \([-1, 1]\) bounding box, which addresses a well-known limitation of standard decision trees: predictions that stay constant outside the training range.
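One possible reading of the two modes is sketched below for a 1D piecewise-linear fit. The semantics are assumptions, since this section does not define them: 'linear' is taken to continue the line of the boundary leaf, and 'saturation' is taken to hold the prediction at its value on the edge of the bounding box. The function `predict_lt` and its arguments are hypothetical.

```python
import numpy as np

def predict_lt(x, edges, intercepts, slopes, extrapolate="linear"):
    """Evaluate b0[leaf] + b1[leaf] * x with an assumed extrapolation rule."""
    x = np.asarray(x, dtype=float)
    x_clipped = np.clip(x, edges[0], edges[-1])
    # Points outside the box are assigned to the nearest boundary leaf.
    leaf = np.searchsorted(edges[1:-1], x_clipped, side="right")
    if extrapolate == "linear":
        x_eval = x             # assumed: keep following the boundary slope
    elif extrapolate == "saturation":
        x_eval = x_clipped     # assumed: freeze at the boundary value
    else:
        raise ValueError(f"unknown extrapolate mode: {extrapolate!r}")
    return intercepts[leaf] + slopes[leaf] * x_eval

edges = np.array([-1.0, 0.0, 1.0])      # two leaves on [-1, 1]
intercepts = np.array([0.0, 0.0])
slopes = np.array([1.0, 2.0])

x_new = np.array([-3.0, 0.5, 2.0])
lin = predict_lt(x_new, edges, intercepts, slopes, "linear")
sat = predict_lt(x_new, edges, intercepts, slopes, "saturation")
print(lin)   # [-3.  1.  4.]
print(sat)   # [-1.  1.  2.]
```

Inside the bounding box both modes agree; they differ only in how the boundary leaf's line is continued.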