# 📘 Documentation Generation Guide (TAM)

[⬅️ `README`](README.md)

This guide explains how to build the Time series Additive Model (TAM) technical documentation from scratch on a local machine.

Our "Docs as Code" pipeline generates three distinct products from a single source of truth:

1. 🌐 **A Comprehensive HTML Website** (Includes everything: API, Theory, and Code).
2. 🧠 **A Mathematical Theory PDF** (Tailored for researchers and academic publications).
3. 💻 **An Architecture & Code PDF** (Tailored for software engineers and auditors).

---

## Prerequisites

Before generating the documentation, ensure you have the following installed:

1. **Python 3.10+**
2. **A LaTeX Distribution (Required for PDFs):**
   * **Windows:** Download and install [MiKTeX for Windows](https://miktex.org/download).
     * *Important:* During MiKTeX installation, ensure the option **"Install missing packages on-the-fly"** is set to **"Yes"**. MiKTeX can then automatically download the LaTeX packages that the Sphinx build needs.
   * **Mac/Linux:** Install [MacTeX](https://tug.org/mactex/) (macOS) or [TeX Live](https://tug.org/texlive/) (Linux).

---

## Creating the Workspace

To avoid conflicts with other Python projects, we will create an isolated virtual environment.

Open a terminal (Command Prompt `cmd`, PowerShell, or your OS equivalent) at the root of the project and run:

```bash
python -m venv .venv
```

*(Note: If `python` is not in your PATH, provide the full path to your Python executable, e.g., `C:/Users/USER/AppData/Local/Programs/Python/Python312/python.exe -m venv .venv`)*

This creates a `.venv/` folder containing a clean Python installation.

---

## Activating the Environment

Activate the environment to work inside it.

**On Windows (CMD):**

```bat
.venv\Scripts\activate.bat
```

**On Windows (PowerShell):**

```powershell
.\.venv\Scripts\Activate.ps1
```

**On Mac/Linux:**

```bash
source .venv/bin/activate
```

> ✅ Once activated, you should see `(.venv)` displayed at the beginning of your command line prompt.

---

## Installing Dependencies

We need to install **Sphinx** (the documentation generator), the **MyST Parser** (for Markdown support), **BibTeX** (for academic citations), and the TAM package itself (so Sphinx can read the Python docstrings).

Run this command at the root of the project:

```bash
pip install -r requirements.txt
```

*(Note: The repository's `requirements.txt` is already configured with `sphinx`, `myst-parser`, `sphinxcontrib-bibtex`, `sphinx-autodoc-typehints` and all necessary TAM dependencies.)*

---

## Building the Documentation

Once the installation is complete, run the automation script located in your working directory:

**On Windows:**

```bat
.\build_docs.bat
```

**On Mac/Linux:** *(Assuming you have an equivalent shell script)*

```bash
./build_docs.sh
```

This script will automatically:

1. **Clean** previous builds to prevent caching issues.
2. **Scan** the Python source code (`src/tam`) to auto-generate the API reference.
3. **Build** the global HTML website.
4. **Build** the Mathematical Theory LaTeX files and compile them into a PDF.
5. **Build** the Architecture & Code LaTeX files and compile them into a PDF.

---

## Viewing the Results

If the generation is successful, you will find your two distinct documentation formats in the `docs/build/` directory. You can open them directly:

* HTML 🌐 **Interactive Website:** `docs/build/html/index.html`
* PDF 🧠 **Theory (Researchers)** & 💻 **Architecture (Engineers):** `docs/build/latex_theory/TAM_documentation.pdf`

---

## Troubleshooting PDF Generation (Corporate Environments)

Building the interactive HTML website is entirely handled by Python and Sphinx.
However, generating the academic and architectural PDFs involves a more complex pipeline that relies on a LaTeX distribution (such as MiKTeX). In a corporate IT environment, strict network firewalls and restricted user privileges can disrupt this pipeline.

To successfully generate the PDFs, your system requires three key elements:

**1. System PATH Recognition**

The terminal running the build script must be able to locate the LaTeX compiler. If you just installed your LaTeX distribution, you must completely restart your terminal or IDE (like VS Code) so it can recognize commands like `pdflatex`.

**2. Dynamic Package Downloading ("On-the-fly")**

To format the PDFs correctly, Sphinx relies on specific LaTeX styling packages (such as `cmap.sty`, `times`, `fncychap`, and `tabulary`). Your LaTeX distribution must be configured to download and install missing packages automatically in the background during the build process.

**3. Network Access and User Permissions**

This is the most common bottleneck in enterprise environments. Automatic background downloads often fail because:

* **Admin Rights:** Background installations might be blocked if you lack administrator privileges. You can bypass this by ensuring your LaTeX package manager (e.g., MiKTeX Console) is set to operate strictly in **"User mode"**, which installs packages locally in your `AppData` folder.
* **Firewalls & Proxies:** Corporate firewalls may block connections to default LaTeX servers, resulting in "Timeout" errors. If this happens, open your LaTeX console and either change the remote package repository to an alternative mirror (e.g., an HTTPS server in your country) or configure your company's proxy settings directly within the console so that packages can be installed manually.

---

## 🏗️ Quick Guide: How to Contribute

To stay clear for two very different audiences (researchers/mathematicians vs.
software engineers), TAM uses a strict **Mirror Architecture** for its documentation.

We divide the documentation into two sealed worlds. When you add a new feature, you must write a pair of files:

1. 🧠 **`math/` (The Brain):** Explains *why* the formula is exact. Contains theory, LaTeX equations, theorems, and academic citations.
2. 💻 **`architecture/` (The Hands):** Explains *how* the Python script calculates the formula without crashing. Contains PyTorch implementation details, OOP structure, VRAM management, and `{literalinclude}` code extraction.

### 🏛️ The Golden Rules of Writing (Sphinx/MyST)

To ensure the pipelines compile our PDFs and HTML flawlessly, all contributors must adhere to these strict writing rules:

* **Separation of Code Comments vs. Markdown Theory:** Because the `architecture/` Markdown files dynamically pull source code via `{literalinclude}`, your Python docstrings (`r"""..."""`) and inline comments must focus *strictly* on software engineering (e.g., tensor shapes, VRAM allocation, OOM prevention, PyTorch workarounds). Do **not** write LaTeX mathematical proofs or academic citations inside the `.py` files. Let the `math/` Markdown files carry that burden.
* **Zero Redundancy:** The `architecture/` files must *never* re-derive the math. Instead, use clean relative links to point to the theory (e.g., `[See the theory](../../math/core/01_primal_model.md)`).
* **Code Extraction:** Do **not** copy and paste PyTorch code into Markdown files. Exclusively use the Sphinx `{literalinclude}` directive with exact relative paths (e.g., `../../../../src/tam/...`) and Python comment tags (`#: ` and `#: `) to pull code dynamically.
* **Academic Citations:** Any bibliographic reference used to justify scientific work must use the MyST syntax ``{cite:p}`bibtex_key` ``. Ensure you add the corresponding entry to `references.bib` at the root of the project so it compiles in the PDFs.
* **No Emojis in PDFs:** Do not use emojis in titles or body text intended for the PDFs (`math/` and `architecture/` folders); they cause fatal `pdflatex` compilation errors. Emojis are only permitted in the global `README.md`.

### 📂 Annotated Directory Structure (Mapping `.md` ↔ `.py`)

```text
TAM/
│
├── README.md                                  # Home (Auto-copied by Sphinx)
├── paper.md                                   # JOSS Paper (Independent from Sphinx)
│
├── src/tam/                                   # 🐍 SOURCE CODE
│
└───                                           # 📚 DOCUMENTATION (Sphinx)
    │
    ├── math/                                  # 🧠 THE "WHY" (Theory & Equations)
    │   │
    │   ├── core/                              # -> Fundamental equations of the solver
    │   │   ├── 01_primal_model.md             # Representer Theorem, Aronszajn.
    │   │   │                                  # Scope scripts: _base.py, additive.py
    │   │   ├── 02_tensorization.md            # N-Dim Broadcasting, temporal/group independence.
    │   │   │                                  # Scope scripts: _data.py, _math.py
    │   │   ├── 03_linear_system.md            # Linear algebra (Cholesky vs Conjugate Gradient).
    │   │   │                                  # Scope scripts: _math.py, _dispatcher.py
    │   │   ├── 04_complexity.md               # Proof of O(N D²) vs O(N³) complexity.
    │   │   │                                  # Scope scripts: _math.py, _dispatcher.py
    │   │   └── 05_gcv_theory.md               # Golub's trace, Tikhonov regularization.
    │   │                                      # Scope scripts: _dispatcher_gcv.py
    │   │
    │   ├── spectrum/                          # -> Mathematical definition of Bases (Formulas for Φ and P)
    │   │   ├── LINEAR.md                      # Scope scripts: _linear.py
    │   │   ├── SPLINES.md                     # Scope scripts: _spline.py
    │   │   ├── FOURIER.md                     # Scope scripts: _fourier.py
    │   │   ├── WAVELETS.md                    # Scope scripts: _wavelet.py
    │   │   ├── NEURAL.md                      # Scope scripts: _neural.py
    │   │   ├── PHYSICS_PIKL.md                # Scope scripts: _physics.py
    │   │   ├── RBF.md                         # Scope scripts: _rbf.py
    │   │   ├── CATEGORICAL.md                 # Scope scripts: _categorical.py
    │   │   ├── CHEBYSHEV.md                   # Scope scripts: _chebyshev.py
    │   │   ├── TREE.md                        # Scope scripts: _tree.py
    │   │   ├── LINEAR_TREE.md                 # Scope scripts: _linear_tree.py
    │   │   ├── PID.md                         # Scope scripts: _pid.py and model/bode.py
    │   │   └── CROSS_TENSOR.md                # Scope scripts: _tensor.py
    │   │
    │   └── meta/                              # -> Theory of Meta-Learning algorithms
    │       ├── 01_adaptive_online.md          # Sliding windows theory and concept drift.
    │       │                                  # Scope scripts: adaptive.py
    │       ├── 02_kalman_filter.md            # Extended Kalman Filtering, Woodbury matrix identity.
    │       │                                  # Scope scripts: kalman.py
    │       ├── 03_hierarchical_joint.md       # Joint optimization under constraints (Parent = Sum).
    │       │                                  # Scope scripts: hierarchical.py
    │       ├── 04_conformal_safety.md         # Conformal prediction (Split, ACI by Gibbs & Candès).
    │       │                                  # Scope scripts: safety.py
    │       ├── 05_opera_aggregation.md        # Expert aggregation, regret bounds, Cesa-Bianchi.
    │       │                                  # Scope scripts: opera.py
    │       ├── 06_deep_gam_backfitting.md     # Orthogonal backfitting per group (Hybridization).
    │       │                                  # Scope scripts: neural.py (DeepGAM)
    │       ├── 07_statistical_diagnostics.md  # T-tests, Bootstrap, Degrees of freedom.
    │       │                                  # Scope scripts: diagnostics.py
    │       ├── 08_auto_orchestrator.md        # Evolutionary AutoML, EDA, Hub-and-Spoke, Parsimony.
    │       │                                  # Scope: auto_tam.py, drag_tam.py, knowledge_graph.py, population_nodes.py, evolution_reporter.py
    │       │                                  # Pipeline Scope: context.py, data_manager.py, base_discoverer.py, expert_expander.py, ensemble_selector.py
    │       ├── 09_auto_data_topology.md       # Data topology, Krylov stability, Covariate Lock, Panel Data bounds.
    │       │                                  # Scope: data_profiler.py, feature_engineer.py, effect_selector.py, parser.py
    │       └── 10_mlops_evaluation.md         # Theory of empirical metrics, SMAPE, and Temporal Degradation.
    │                                          # Scope scripts: metrics.py, performance_analyzer.py
    │
    └── architecture/                          # 💻 THE "HOW" (Code, PyTorch & API)
        │
        ├── core/                              # -> Implementation of the core engine
        │   ├── 01_additive_api.md             # Main class and OOP construction.
        │   │                                  # Scope scripts: additive.py, _base.py, _factory.py, _base_effects.py, utils.py
        │   ├── 02_data_pipeline.md            # Normalization, Padding, transformations.
        │   │                                  # Scope scripts: _data.py
        │   ├── 03_math_dispatcher.md          # PyTorch routing (Direct Cholesky vs Sparse CG).
        │   │                                  # Scope scripts: _dispatcher.py, _math.py
        │   ├── 04_hardware_memory.md          # Anti-OOM systems, RAM/VRAM estimation.
        │   │                                  # Scope scripts: hardware.py, _memory.py, _dispatcher.py (catch OOM), _tree.py (sparse COO), utils.py
        │   ├── 05_gcv_implementation.md       # Discrete coordinate descent and block matrices.
        │   │                                  # Scope scripts: _dispatcher_gcv.py
        │   └── 06_the_spectrum_api.md         # Spectrum of core mathematical projection bases.
        │                                      # Scope scripts contained in model/spectrum folder
        │
        └── meta/                              # -> Implementation of Wrappers / Meta-Models
            ├── 01_adaptive_code.md            # Vectorized sliding windows.
            │                                  # Scope scripts: adaptive.py, _data.py (_transform_data_adaptive)
            ├── 02_kalman_torchscript.md       # @torch.jit.script optimization and block updates.
            │                                  # Scope scripts: kalman.py
            ├── 03_hierarchical_code.md        # Creation of global sparse L^T L loss matrices.
            │                                  # Scope scripts: hierarchical.py
            ├── 04_safety_code.md              # Tensor calculation of quantiles and residual tracker.
            │                                  # Scope scripts: safety.py
            ├── 05_opera_gpu.md                # 3D Tensor Batching (Groups, Time, Experts) on GPU.
            │                                  # Scope scripts: opera.py
            ├── 06_neural_hybrid.md            # Integration of nn.Sequential in the backfitting loop.
            │                                  # Scope scripts: neural.py (DeepGAM)
            ├── 07_diagnostics_utils.md        # Effect plots, statistical tests, and Pandas formatting.
            │                                  # Scope scripts: diagnostics.py, plotting.py, utils.py
            ├── 08_auto_orchestrator_code.md   # 7-Step Pipeline, Knowledge Graph, Dynamic Annealing, OPERA.
            │                                  # Scope: auto_tam.py, drag_tam.py, knowledge_graph.py, population_nodes.py, evolution_reporter.py, autotam_report_generator.py
            │                                  # Pipeline Scope: context.py, data_manager.py, base_discoverer.py, expert_expander.py, ensemble_selector.py
            ├── 09_auto_data_topology_code.md  # Stateful Bounds, Collinearity Filter, Regex Parser.
            │                                  # Scope: data_profiler.py, feature_engineer.py, effect_selector.py, parser.py
            └── 10_mlops_tracking_code.md      # BenchmarkTracker OOP, NaN-safe metrics, Matplotlib dashboards.
                                               # Scope scripts: tracker.py, metrics.py, plotting.py (evaluation)
```

### Practical Example: Adding the Kalman Filter

If you are tasked with adding documentation for the Extended Kalman Filter meta-learner, your contribution must be split exactly like this:

**1. The Math File (`docs/source/math/meta/02_kalman_filter.md`)**

* **Target Audience:** Researchers.
* **Scope:** Focus entirely on the Markov equations and the Woodbury matrix identity.
* **Requirements:** Use standard LaTeX blocks for formulas (`$$...$$`). Cite academic papers justifying the Extended Kalman Filter approach using `{cite:p}`.

**2. The Architecture File (`docs/source/architecture/meta/02_kalman_torchscript.md`)**

* **Target Audience:** Engineers.
* **Scope:** Focus entirely on GPU optimization and block updates.
* **Requirements:** Explain how the `@torch.jit.script` decorator is used to compile the inference loop to TorchScript, which runs in PyTorch's C++ runtime and avoids Python interpreter and GIL bottlenecks. Use `{literalinclude}` to extract the specific decorated function from `src/tam/model/kalman.py` (which should only contain engineering-focused comments). Link back to the Math file for the theory.
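
The split described in this example can be checked roughly before running a build. The following is a minimal sketch and not part of the TAM repository: `check_pair`, `EMOJI`, and `MATH_LINK` are hypothetical names introduced here, `some_key` is a placeholder citation key, and the checks are simple string heuristics derived from the golden rules (a `{cite:p}` citation in the math file; a `{literalinclude}` directive, a relative link to `math/`, and no display math in the architecture file; no emojis in either).

```python
import re

# Hypothetical pre-build check; not part of the TAM repository.
# Rough emoji heuristic: common pictograph, symbol and dingbat ranges only.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# A Markdown link whose target climbs with ../ into the math/ folder.
MATH_LINK = re.compile(r"\]\((?:\.\./)+math/[^)]*\.md\)")


def check_pair(math_text: str, arch_text: str) -> list[str]:
    """Return rule violations for one math/architecture pair of files."""
    problems = []
    for name, text in (("math", math_text), ("architecture", arch_text)):
        if EMOJI.search(text):
            problems.append(f"{name}: emoji found")
    if "{cite:p}" not in math_text:
        problems.append("math: no {cite:p} citation")
    if "{literalinclude}" not in arch_text:
        problems.append("architecture: no {literalinclude} directive")
    if "$$" in arch_text:
        problems.append("architecture: display math found")
    if not MATH_LINK.search(arch_text):
        problems.append("architecture: no relative link to math/")
    return problems


if __name__ == "__main__":
    # Placeholder contents standing in for the two Kalman files.
    math_md = "$$ x = y $$ as shown in {cite:p}`some_key`."
    arch_md = (
        "[See the theory](../../math/meta/02_kalman_filter.md)\n"
        "{literalinclude} ../../../../src/tam/model/kalman.py\n"
    )
    print(check_pair(math_md, arch_md))  # prints []
```

In practice the two strings would be read from the pair of Markdown files. An empty list only means that none of these heuristics fired; it does not guarantee that the Sphinx or `pdflatex` build will succeed.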