Documentation Generation Guide (TAM)
This guide explains how to build the Time series Additive Model (TAM) technical documentation from scratch on a local machine.
Our "Docs as Code" pipeline generates three distinct products from a single source of truth:
A Comprehensive HTML Website (Includes everything: API, Theory, and Code).
A Mathematical Theory PDF (Tailored for researchers and academic publications).
An Architecture & Code PDF (Tailored for software engineers and auditors).
Prerequisites
Before generating the documentation, ensure you have the following installed:
Python 3.10+
A LaTeX Distribution (Required for PDFs):
Windows: Download and install MiKTeX for Windows.
Important: During MiKTeX installation, ensure the option "Install missing packages on-the-fly" is set to "Yes". The Sphinx LaTeX build needs several LaTeX packages, and MiKTeX will then attempt to download them automatically during the build.
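The two prerequisites above can be checked from Python before starting. This is a minimal sketch, not part of the TAM tooling: the function name `check_prerequisites` is hypothetical, and it assumes the LaTeX compiler is invoked as `pdflatex`.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisites


def check_prerequisites(latex_command: str = "pdflatex") -> dict:
    """Report whether Python is recent enough and a LaTeX compiler is on PATH."""
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        # shutil.which returns None when the executable cannot be found on PATH.
        "latex_on_path": shutil.which(latex_command) is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

If `latex_on_path` is reported as missing right after installing a LaTeX distribution, restarting the terminal is usually worth trying first.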
Creating the Workspace
To avoid conflicts with other Python projects, we will create an isolated virtual environment.
Open a terminal (Command Prompt cmd, PowerShell, or your OS equivalent) at the root of the project and run:
python -m venv .venv
(Note: If python is not in your PATH, provide the full path to your Python executable, e.g., C:/Users/USER/AppData/Local/Programs/Python/Python312/python.exe -m venv .venv)
This creates a .venv/ folder containing a clean Python installation.
Activating the Environment
Activate the environment to work inside it.
On Windows (CMD):
.venv\Scripts\activate.bat
On Windows (PowerShell):
.\.venv\Scripts\Activate.ps1
On Mac/Linux:
source .venv/bin/activate
Once activated, you should see (.venv) displayed at the beginning of your command-line prompt.
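If the prompt prefix is not visible (some shells hide it), the interpreter itself can confirm whether it is running inside a virtual environment. The sketch below relies only on the standard library; the function name is illustrative.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("interpreter:", sys.executable)
```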
Installing Dependencies
We need to install Sphinx (the documentation generator), the MyST Parser (for Markdown support), sphinxcontrib-bibtex (for BibTeX academic citations), and the TAM package itself (so Sphinx can read the Python docstrings).
Run this command at the root of the project:
pip install -r requirements.txt
(Note: The repository's requirements.txt is already configured with sphinx, myst-parser, sphinxcontrib-bibtex, sphinx-autodoc-typehints, and all necessary TAM dependencies.)
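After installation, a quick way to confirm that the documentation packages landed in the active environment is to query their installed versions. This is a sketch under the assumption that the distribution names match those listed in the note above; `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed for requirements.txt in this guide.
DOC_PACKAGES = (
    "sphinx",
    "myst-parser",
    "sphinxcontrib-bibtex",
    "sphinx-autodoc-typehints",
)


def installed_versions(packages=DOC_PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```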
Building the Documentation
Once the installation is complete, run the automation script located in your working directory:
On Windows:
.\build_docs.bat
On Mac/Linux (assuming you have an equivalent shell script):
./build_docs.sh
This script will automatically:
Clean previous builds to prevent caching issues.
Scan the Python source code (src/tam) to auto-generate the API reference.
Build the global HTML website.
Build the Mathematical Theory LaTeX files and compile them into a PDF.
Build the Architecture & Code LaTeX files and compile them into a PDF.
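The contents of build_docs.bat are not reproduced in this guide, so the following is only a rough sketch of what the steps above could look like when driven from Python. The `docs/source/api` output folder and the function names are assumptions; the way the two PDF variants are selected is not described here, so only one LaTeX build is shown, and the final PDF compilation of the generated .tex files is left out.

```python
import shutil
import subprocess
from pathlib import Path


def plan_build(source="docs/source", build="docs/build", package="src/tam"):
    """Return the command lines for an API scan, an HTML build, and a LaTeX build."""
    source, build = Path(source), Path(build)
    return [
        # -f overwrites previously generated stubs; "api" is a hypothetical folder.
        ["sphinx-apidoc", "-f", "-o", str(source / "api"), package],
        # -E ignores the saved environment so every source file is re-read.
        ["sphinx-build", "-E", "-b", "html", str(source), str(build / "html")],
        ["sphinx-build", "-E", "-b", "latex", str(source), str(build / "latex_theory")],
    ]


def run_build(source="docs/source", build="docs/build", package="src/tam"):
    """Clean the build folder, then run each planned command in order."""
    shutil.rmtree(build, ignore_errors=True)  # remove stale output first
    for command in plan_build(source, build, package):
        subprocess.run(command, check=True)  # stop at the first failing step
```

Keeping the command list separate from its execution makes it easy to print the plan and compare it with the real script before running anything.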
Viewing the Results
If the generation is successful, you will find your two distinct documentation formats in the docs/build/ directory. You can open them directly:
HTML Interactive Website:
docs/build/html/index.html
PDF Theory (Researchers) & Architecture (Engineers):
docs/build/latex_theory/TAM_documentation.pdf
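To open the generated website without hunting for the file, the HTML entry page can be turned into a file URI and handed to the default browser. This is a small convenience sketch using the standard library; the helper names are illustrative, and the path is the one given above.

```python
import webbrowser
from pathlib import Path


def html_index_uri(build_dir="docs/build"):
    """Return a file:// URI for the generated HTML entry page."""
    # as_uri() requires an absolute path, hence resolve().
    return (Path(build_dir) / "html" / "index.html").resolve().as_uri()


def open_html_docs(build_dir="docs/build"):
    """Open the generated website in the default browser."""
    return webbrowser.open(html_index_uri(build_dir))
```

Calling `open_html_docs()` from the project root should open the site, provided the build has completed.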
Troubleshooting PDF Generation (Corporate Environments)
Building the interactive HTML website is handled entirely by Python and Sphinx. Generating the academic and architectural PDFs, however, involves a more complex pipeline that relies on a LaTeX distribution (such as MiKTeX).
In a corporate IT environment, strict network firewalls and restricted user privileges can disrupt this pipeline. To successfully generate the PDFs, your system requires three key elements:
1. System PATH Recognition
The terminal running the build script must be able to locate the LaTeX compiler. If you just installed your LaTeX distribution, you must completely restart your terminal or IDE (such as VS Code) so it can recognize commands like pdflatex.
2. Dynamic Package Downloading ("On-the-fly")
To format the PDFs correctly, Sphinx relies on specific LaTeX styling packages (such as cmap, times, fncychap, and tabulary). Your LaTeX distribution must be configured to download and install missing packages automatically in the background during the build process.
3. Network Access and User Permissions
This is the most common bottleneck in enterprise environments. Automatic background downloads often fail because:
Admin Rights: Background installations might be blocked if you lack administrator privileges. You can work around this by ensuring your LaTeX package manager (e.g., MiKTeX Console) operates strictly in "User mode", which installs packages locally in your AppData folder.
Firewalls & Proxies: Corporate firewalls may block connections to default LaTeX servers, resulting in "Timeout" errors. If this happens, open your LaTeX console and either change the remote package repository to an alternative mirror (e.g., an HTTPS server in your country) or configure your company's proxy settings directly within the console to allow manual package installations.
Quick Guide: How to Contribute
To stay clear for two very different audiences (researchers/mathematicians vs. software engineers), TAM uses a strict Mirror Architecture for its documentation.
We divide the documentation into two sealed worlds. When you add a new feature, you must write a pair of files:
math/ (The Brain): Explains why the formula is exact. Contains theory, LaTeX equations, theorems, and academic citations.
architecture/ (The Hands): Explains how the Python script calculates the formula without crashing. Contains PyTorch implementation details, OOP structure, VRAM management, and {literalinclude} code extraction.
The Golden Rules of Writing (Sphinx/MyST)
To ensure the pipelines compile our PDFs and HTML flawlessly, all contributors must adhere to these strict writing rules:
Separation of Code Comments vs. Markdown Theory: Because the architecture/ Markdown files dynamically pull source code via {literalinclude}, your Python docstrings (r"""...""") and inline comments must focus strictly on software engineering (e.g., tensor shapes, VRAM allocation, OOM prevention, PyTorch workarounds). Do not write LaTeX mathematical proofs or academic citations inside the .py files. Let the math/ Markdown files carry that burden.
Zero Redundancy: The architecture/ files must never re-demonstrate the math. Instead, use clean relative links to point to the theory (e.g., [See the theory](../../math/core/01_primal_model.md)).
Code Extraction: Do not copy and paste PyTorch code into Markdown files. Exclusively use the Sphinx {literalinclude} directive with exact relative paths (e.g., ../../../../src/tam/...) and Python comment tags (#: <tag> and #: </tag>) to pull code dynamically.
Academic Citations: Any bibliographic reference used to justify scientific work must use the MyST role {cite:p}`bibtex_key`. Add the corresponding entry to references.bib at the root of the project so it compiles in the PDFs.
No Emojis in PDFs: Do not use emojis in titles or body text intended for the PDFs (math/ and architecture/ folders), to avoid fatal pdflatex compilation errors. Emojis are only permitted in the global README.md.
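To make the Code Extraction rule concrete, the sketch below imitates in plain Python what a tag-delimited extraction does: it keeps only the lines between `#: <tag>` and `#: </tag>`. In the real pipeline this job is done by Sphinx's `{literalinclude}` directive (its `start-after` and `end-before` options select text between two marker strings); the `extract_tagged` helper, the `scale_rows` function, and the tag name are all hypothetical and exist only for illustration.

```python
def extract_tagged(source: str, tag: str) -> str:
    """Return the lines strictly between '#: <tag>' and '#: </tag>'."""
    start, end = f"#: <{tag}>", f"#: </{tag}>"
    kept, inside = [], False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == start:
            inside = True
        elif stripped == end:
            break
        elif inside:
            kept.append(line)
    return "\n".join(kept)


# Hypothetical source file content; note the engineering-only comment.
EXAMPLE = '''\
import math

#: <scale_rows>
def scale_rows(values, factor):
    # Shape: values is a flat list of N floats; returns a new list of N floats.
    return [v * factor for v in values]
#: </scale_rows>
'''

print(extract_tagged(EXAMPLE, "scale_rows"))
```

The extracted snippet contains the function and its engineering-focused comment but neither the tag lines nor the surrounding imports, which is the effect the tags are meant to achieve in the rendered architecture pages.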
Annotated Directory Structure (Mapping .md to .py)
TAM/
│
├── README.md                                   # Home (Auto-copied by Sphinx)
├── paper.md                                    # JOSS Paper (Independent from Sphinx)
│
├── src/tam/                                    # SOURCE CODE
│
└── docs/source/                                # DOCUMENTATION (Sphinx)
    │
    ├── math/                                   # THE "WHY" (Theory & Equations)
    │   │
    │   ├── core/                               # -> Fundamental equations of the solver
    │   │   ├── 01_primal_model.md              # Representer Theorem, Aronszajn.
    │   │   │                                   # Scope scripts: _base.py, additive.py
    │   │   ├── 02_tensorization.md             # N-Dim Broadcasting, temporal/group independence.
    │   │   │                                   # Scope scripts: _data.py, _math.py
    │   │   ├── 03_linear_system.md             # Linear algebra (Cholesky vs Conjugate Gradient).
    │   │   │                                   # Scope scripts: _math.py, _dispatcher.py
    │   │   ├── 04_complexity.md                # Proof of O(N D²) vs O(N³) complexity.
    │   │   │                                   # Scope scripts: _math.py, _dispatcher.py
    │   │   └── 05_gcv_theory.md                # Golub's trace, Tikhonov regularization.
    │   │                                       # Scope scripts: _dispatcher_gcv.py
    │   │
    │   ├── spectrum/                           # -> Mathematical definition of Bases (Formulas for Φ and P)
    │   │   ├── LINEAR.md                       # Scope scripts: _linear.py
    │   │   ├── SPLINES.md                      # Scope scripts: _spline.py
    │   │   ├── FOURIER.md                      # Scope scripts: _fourier.py
    │   │   ├── WAVELETS.md                     # Scope scripts: _wavelet.py
    │   │   ├── NEURAL.md                       # Scope scripts: _neural.py
    │   │   ├── PHYSICS_PIKL.md                 # Scope scripts: _physics.py
    │   │   ├── RBF.md                          # Scope scripts: _rbf.py
    │   │   ├── CATEGORICAL.md                  # Scope scripts: _categorical.py
    │   │   ├── CHEBYSHEV.md                    # Scope scripts: _chebyshev.py
    │   │   ├── TREE.md                         # Scope scripts: _tree.py
    │   │   ├── LINEAR_TREE.md                  # Scope scripts: _linear_tree.py
    │   │   ├── PID.md                          # Scope scripts: _pid.py and model/bode.py
    │   │   └── CROSS_TENSOR.md                 # Scope scripts: _tensor.py
    │   │
    │   └── meta/                               # -> Theory of Meta-Learning algorithms
    │       ├── 01_adaptive_online.md           # Sliding windows theory and concept drift.
    │       │                                   # Scope scripts: adaptive.py
    │       ├── 02_kalman_filter.md             # Extended Kalman Filtering, Woodbury matrix identity.
    │       │                                   # Scope scripts: kalman.py
    │       ├── 03_hierarchical_joint.md        # Joint optimization under constraints (Parent = Sum).
    │       │                                   # Scope scripts: hierarchical.py
    │       ├── 04_conformal_safety.md          # Conformal prediction (Split, ACI by Gibbs & Candès).
    │       │                                   # Scope scripts: safety.py
    │       ├── 05_opera_aggregation.md         # Expert aggregation, regret bounds, Cesa-Bianchi.
    │       │                                   # Scope scripts: opera.py
    │       ├── 06_deep_gam_backfitting.md      # Orthogonal backfitting per group (Hybridization).
    │       │                                   # Scope scripts: neural.py (DeepGAM)
    │       ├── 07_statistical_diagnostics.md   # T-tests, Bootstrap, Degrees of freedom.
    │       │                                   # Scope scripts: diagnostics.py
    │       ├── 08_auto_orchestrator.md         # Evolutionary AutoML, EDA, Hub-and-Spoke, Parsimony.
    │       │                                   # Scope: auto_tam.py, drag_tam.py, knowledge_graph.py, population_nodes.py, evolution_reporter.py
    │       │                                   # Pipeline Scope: context.py, data_manager.py, base_discoverer.py, expert_expander.py, ensemble_selector.py
    │       ├── 09_auto_data_topology.md        # Data topology, Krylov stability, Covariate Lock, Panel Data bounds.
    │       │                                   # Scope: data_profiler.py, feature_engineer.py, effect_selector.py, parser.py
    │       └── 10_mlops_evaluation.md          # Theory of empirical metrics, SMAPE, and Temporal Degradation.
    │                                           # Scope scripts: metrics.py, performance_analyzer.py
    │
    └── architecture/                           # THE "HOW" (Code, PyTorch & API)
        │
        ├── core/                               # -> Implementation of the core engine
        │   ├── 01_additive_api.md              # Main class and OOP construction.
        │   │                                   # Scope scripts: additive.py, _base.py, _factory.py, _base_effects.py, utils.py
        │   ├── 02_data_pipeline.md             # Normalization, Padding, transformations.
        │   │                                   # Scope scripts: _data.py
        │   ├── 03_math_dispatcher.md           # PyTorch routing (Direct Cholesky vs Sparse CG).
        │   │                                   # Scope scripts: _dispatcher.py, _math.py
        │   ├── 04_hardware_memory.md           # Anti-OOM systems, RAM/VRAM estimation.
        │   │                                   # Scope scripts: hardware.py, _memory.py, _dispatcher.py (catch OOM), _tree.py (sparse COO), utils.py
        │   ├── 05_gcv_implementation.md        # Discrete coordinate descent and block matrices.
        │   │                                   # Scope scripts: _dispatcher_gcv.py
        │   └── 06_the_spectrum_api.md          # Spectrum of core mathematical projection bases.
        │                                       # Scope scripts contained in model/spectrum folder
        │
        └── meta/                               # -> Implementation of Wrappers / Meta-Models
            ├── 01_adaptive_code.md             # Vectorized sliding windows.
            │                                   # Scope scripts: adaptive.py, _data.py (_transform_data_adaptive)
            ├── 02_kalman_torchscript.md        # @torch.jit.script optimization and block updates.
            │                                   # Scope scripts: kalman.py
            ├── 03_hierarchical_code.md         # Creation of global sparse L^T L loss matrices.
            │                                   # Scope scripts: hierarchical.py
            ├── 04_safety_code.md               # Tensor calculation of quantiles and residual tracker.
            │                                   # Scope scripts: safety.py
            ├── 05_opera_gpu.md                 # 3D Tensor Batching (Groups, Time, Experts) on GPU.
            │                                   # Scope scripts: opera.py
            ├── 06_neural_hybrid.md             # Integration of nn.Sequential in the backfitting loop.
            │                                   # Scope scripts: neural.py (DeepGAM)
            ├── 07_diagnostics_utils.md         # Effect plots, statistical tests, and Pandas formatting.
            │                                   # Scope scripts: diagnostics.py, plotting.py, utils.py
            ├── 08_auto_orchestrator_code.md    # 7-Step Pipeline, Knowledge Graph, Dynamic Annealing, OPERA.
            │                                   # Scope: auto_tam.py, drag_tam.py, knowledge_graph.py, population_nodes.py, evolution_reporter.py, autotam_report_generator.py
            │                                   # Pipeline Scope: context.py, data_manager.py, base_discoverer.py, expert_expander.py, ensemble_selector.py
            ├── 09_auto_data_topology_code.md   # Stateful Bounds, Collinearity Filter, Regex Parser.
            │                                   # Scope: data_profiler.py, feature_engineer.py, effect_selector.py, parser.py
            └── 10_mlops_tracking_code.md       # BenchmarkTracker OOP, NaN-safe metrics, Matplotlib dashboards.
                                                # Scope scripts: tracker.py, metrics.py, plotting.py (evaluation)
Practical Example: Adding the Kalman Filter
If you are tasked with adding documentation for the Extended Kalman Filter meta-learner, your contribution must be split exactly like this:
1. The Math File (docs/source/math/meta/02_kalman_filter.md)
Target Audience: Researchers.
Scope: Focus entirely on the Markov equations and the Woodbury matrix identity.
Requirements: Use standard LaTeX blocks for formulas ($$...$$). Cite academic papers justifying the Extended Kalman Filter approach using {cite:p}.
2. The Architecture File (docs/source/architecture/meta/02_kalman_torchscript.md)
Target Audience: Engineers.
Scope: Focus entirely on GPU optimization and block updates.
Requirements: Explain how the @torch.jit.script decorator is used to compile the inference loop to TorchScript, which runs in PyTorch's C++ runtime and so avoids Python interpreter overhead and GIL bottlenecks. Use {literalinclude} to extract the specific decorated function from src/tam/model/kalman.py (which should only contain engineering-focused comments). Link back to the Math file for the theory.
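As an illustration of the display-math convention required for the math file in item 1, the Woodbury matrix identity could be written as the block below. The symbols are generic placeholders rather than TAM's own notation: A and C are assumed to be invertible square matrices, and U and V have conformable shapes.

```latex
$$
(A + U C V)^{-1} = A^{-1} - A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1}
$$
```

In the math file, a formula like this would typically be accompanied by a {cite:p} reference, while the matching architecture file would only link back to it.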