References

[1]

Simon Wood. Package ‘mgcv’ v1.9-4: mixed GAM computation vehicle with automatic smoothness estimation. 2025.

[2]

Arthur E Hoerl and Robert W Kennard. Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67, 1970.

[3]

George S Kimeldorf and Grace Wahba. A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. The Annals of Mathematical Statistics, 41(2):495–502, 1970.

[4]

Claire Boyer, Antonin Chambolle, Yohann De Castro, Vincent Duval, Frédéric De Gournay, and Pierre Weiss. On representer theorems and convex regularization. SIAM Journal on Optimization, 29(2):1260–1281, 2019.

[5]

Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine learning, 20(3):273–297, 1995.

[6]

Vladimir Vapnik. The nature of statistical learning theory. Springer science & business media, 2013.

[7]

Francis Bach. Learning theory from first principles. MIT press, 2024.

[8]

Nathan Doumèche, Francis Bach, Éloi Bedek, Gérard Biau, Claire Boyer, and Yannig Goude. Forecasting time series with constraints. arXiv preprint arXiv:2502.10485, 2025.

[9]

Andrey Nikolayevich Tikhonov and others. On the stability of inverse problems. Dokl. Akad. Nauk SSSR, 39(5):195–198, 1943.

[10]

Bernhard Schölkopf, Ralf Herbrich, and Alex J Smola. A generalized representer theorem. In International conference on computational learning theory, 416–426. Springer, 2001.

[11]

Nachman Aronszajn. Theory of reproducing kernels. Transactions of the American mathematical society, 68(3):337–404, 1950.

[12]

Gene H Golub and Charles F Van Loan. Matrix computations. The Johns Hopkins University Press, 1996.

[13]

Gene H Golub, Michael Heath, and Grace Wahba. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21(2):215–223, 1979.

[14]

Stephen J Wright. Coordinate descent algorithms. Mathematical programming, 151(1):3–34, 2015.

[15]

Theodore J. Rivlin. The Chebyshev Polynomials. John Wiley & Sons, 2nd edition, 1990.

[16]

Yann LeCun, Léon Bottou, Genevieve B Orr, and Klaus-Robert Müller. Efficient backprop. In Neural Networks: Tricks of the Trade, pages 9–50. Springer, 1998.

[17]

Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems, volume 20. Curran Associates, Inc., 2007. URL: https://proceedings.neurips.cc/paper_files/paper/2007/file/013a006f03dbc5392effeb8f18fda755-Paper.pdf.

[18]

Puwasala Gamakumara, Edgar Santos-Fernandez, Priyanga Dilini Talagala, Rob J. Hyndman, Kerrie Mengersen, and Catherine Leigh. Conditional normalization in time series analysis. arXiv preprint arXiv:2305.12651, 2023.

[19]

Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media, 2009.

[20]

Lingfei Wu, Ian EH Yen, Jie Chen, and Rui Yan. Revisiting random binning features: fast convergence and strong parallelizability. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1265–1274. 2016.

[21]

Alex J Smola and Bernhard Schölkopf. A tutorial on support vector regression. Statistics and computing, 14(3):199–222, 2004.

[22]

Young-Ju Kim and Chong Gu. Smoothing spline Gaussian regression: more scalable computation via efficient approximation. Journal of the Royal Statistical Society Series B: Statistical Methodology, 66(2):337–356, 2004.

[23]

Simon N Wood. Stable and efficient multiple smoothing parameter estimation for generalized additive models. Journal of the American Statistical Association, 99(467):673–686, 2004.

[24]

Carl Edward Rasmussen and Christopher KI Williams. Gaussian processes for machine learning. Volume 2. MIT press Cambridge, MA, 2006.

[25]

Trevor J Hastie. Generalized additive models. Statistical models in S, pages 249–307, 2017.

[26]

Grace Wahba. Spline models for observational data. SIAM, 1990.

[27]

Paul HC Eilers and Brian D Marx. Flexible smoothing with B-splines and penalties. Statistical science, 11(2):89–121, 1996.

[28]

Simon N Wood. Generalized additive models: an introduction with R. Chapman and Hall/CRC, 2017.

[29]

Carl De Boor. A practical guide to splines. Volume 27. Springer New York, 1978.

[30]

Andrew C Harvey. Forecasting, structural time series models and the Kalman filter. Cambridge university press, 1989.

[31]

Bernhard Schölkopf and Alexander J Smola. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT press, 2002.

[32]

John P Boyd. Chebyshev and Fourier spectral methods. Courier Corporation, 2001.

[33]

Umberto Amato, Anestis Antoniadis, Italia De Feis, Yannig Goude, and Audrey Lagache. Forecasting high resolution electricity demand data with additive models including smooth and jagged components. International Journal of Forecasting, 37(1):171–185, 2021.

[34]

Christopher Torrence and Gilbert P Compo. A practical guide to wavelet analysis. Bulletin of the American Meteorological society, 79(1):61–78, 1998.

[35]

Stéphane Mallat. A wavelet tour of signal processing. Elsevier, 1999.

[36]

Norman Ricker. The form and laws of propagation of seismic wavelets. In World Petroleum Congress, WPC–4039. WPC, 1951.

[37]

David L Donoho and Iain M Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425–455, 1994.

[38]

Lloyd N Trefethen. Approximation theory and approximation practice, extended edition. SIAM, 2019.

[39]

Nicholas J Higham. Accuracy and stability of numerical algorithms. SIAM, 2002.

[40]

Leo Breiman, Jerome Friedman, Richard A Olshen, and Charles J Stone. CART Classification and regression trees. Chapman and Hall/CRC, 2017.

[41]

Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. CatBoost: unbiased boosting with categorical features. Advances in neural information processing systems, 2018.

[42]

Erwan Scornet. Random forests and kernel methods. IEEE Transactions on Information Theory, 62(3):1485–1500, 2016.

[43]

Gérard Biau. Analysis of a random forests model. The Journal of Machine Learning Research, 13:1063–1095, 2012.

[44]

Nicolas Le Roux and Yoshua Bengio. Continuous neural networks. In Artificial Intelligence and Statistics, 404–411. PMLR, 2007.

[45]

Amit Daniely, Roy Frostig, and Yoram Singer. Toward deeper understanding of neural networks: the power of initialization and a dual view on expressivity. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016. URL: https://proceedings.neurips.cc/paper_files/paper/2016/file/abea47ba24142ed16b7d8fbf2c740e0d-Paper.pdf.

[46]

Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S Schoenholz, Jeffrey Pennington, and Jascha Sohl-Dickstein. Deep neural networks as Gaussian processes. arXiv preprint arXiv:1711.00165, 2017.

[47]

Arthur Jacot, Franck Gabriel, and Clément Hongler. Neural tangent kernel: convergence and generalization in neural networks. Advances in neural information processing systems, 2018.

[48]

Rishabh Agarwal, Levi Melnick, Nicholas Frosst, Xuezhou Zhang, Ben Lengerich, Rich Caruana, and Geoffrey E Hinton. Neural additive models: interpretable machine learning with neural nets. Advances in neural information processing systems, 34:4699–4711, 2021.

[49]

Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. N-BEATS: neural basis expansion analysis for interpretable time series forecasting. In International Conference on Learning Representations. 2020.

[50]

Yoh-Han Pao, Gwang-Hoon Park, and Dejan J Sobajic. Learning and generalization characteristics of the random vector functional-link net. Neurocomputing, 6(2):163–180, 1994.

[51]

Guang-Bin Huang, Qin-Yu Zhu, and Chee-Kheong Siew. Extreme learning machine: theory and applications. Neurocomputing, 70(1-3):489–501, 2006.

[52]

Peter Guttorp and Tilmann Gneiting. Studies in the history of probability and statistics XLIX: on the Matérn correlation family. Biometrika, 93(4):989–995, 2006.

[53]

Michael L Stein. The screening effect in kriging. The Annals of Statistics, 30(1):298–323, 2002.

[54]

Jan Gertheiss and Gerhard Tutz. Sparse modeling of categorial explanatory variables. 2010.

[55]

John M Chambers and Trevor J Hastie. Statistical models. In Statistical models in S, pages 13–44. Routledge, 2017.

[56]

Brian D Marx and Paul HC Eilers. Multidimensional penalized signal regression. Technometrics, 47(1):13–22, 2005.

[57]

Simon N Wood. Low-rank scale-invariant tensor product smooths for generalized additive mixed models. Biometrics, 62(4):1025–1036, 2006.

[58]

Thomas Hofmann, Bernhard Schölkopf, and Alexander J Smola. Kernel methods in machine learning. 2008.

[59]

Zebin Yang, Aijun Zhang, and Agus Sudjianto. GAMI-Net: an explainable neural network based on generalized additive models with structured interactions. Pattern Recognition, 120:108192, 2021.

[60]

Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378:686–707, 2019.

[61]

Nathan Doumèche, Francis Bach, Gérard Biau, and Claire Boyer. Physics-informed kernel learning. Journal of Machine Learning Research, 26(124):1–39, 2025.

[62]

Robert Schaback and Holger Wendland. Kernel techniques: from machine learning to meshless methods. Acta numerica, 15:543–639, 2006.

[63]

George EP Box, Gwilym M Jenkins, Gregory C Reinsel, and Greta M Ljung. Time series analysis: forecasting and control. John Wiley & Sons, 2015.

[64]

Nicolas Minorsky. Directional stability of automatically steered bodies. Journal of the American Society for Naval Engineers, 34(2):280–309, 1922.

[65]

Karl Johan Åström and Richard Murray. Feedback systems: an introduction for scientists and engineers. Princeton university press, 2021.

[66]

Hendrik W Bode. Network analysis and feedback amplifier design. 1945.

[67]

Trevor Hastie and Robert Tibshirani. Varying-coefficient models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 55(4):757–779, 1993.

[68]

Achim Zeileis, Torsten Hothorn, and Kurt Hornik. Model-based recursive partitioning. Journal of Computational and Graphical Statistics, 17(2):492–514, 2008.

[69]

Pierre Gaillard, Yannig Goude, Laurent Plagne, Thibaut Dubois, and Benoit Thieurmel. Opera, an R package, online prediction by expert aggregation (v1.1). 2016.

[70]

Nicolo Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge university press, 2006.

[71]

Rob J Hyndman and Anne B Koehler. Another look at measures of forecast accuracy. International journal of forecasting, 22(4):679–688, 2006.

[72]

Tianfeng Chai and Roland R Draxler. Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE in the literature. Geoscientific model development, 7(3):1247–1250, 2014.

[73]

James Durbin and Geoffrey S Watson. Testing for serial correlation in least squares regression: I. Biometrika, 37(3/4):409–428, 1950.

[74]

Manuel Arellano. Panel data econometrics. OUP Oxford, 2003.

[75]

Keshav Das, Julie Keisler, Margaux Brégère, and Amaury Durand. AutoML algorithms for online generalized additive model selection: application to electricity demand forecasting. arXiv preprint arXiv:2503.24019, 2025.

[76]

Yann Allioux, Nathan Doumèche, and Éloi Bedek. TAM: time series additive model (v1.2.3). 2026. doi:10.5281/zenodo.20543272.

[77]

Yann Allioux, Athir Hamadieh, Louis Viennot, Ismaïl El Azzaoui, Antoine Gourbilleau, and Joseph de Vilmarest. viking_kalman: variational Bayesian variance tracking in Python, a fast implementation of the R viking package (v0.1.0). 2026. doi:10.5281/zenodo.20171756.

[78]

Joseph de Vilmarest and Yannig Goude. State-space models for online post-COVID electricity load forecasting competition. IEEE Open Access Journal of Power and Energy, 9:192–201, 2022.

[79]

Joseph de Vilmarest and Olivier Wintenberger. Viking: variational Bayesian variance tracking. Statistical Inference for Stochastic Processes, 27(3):839–860, 2024.

[80]

Nathan Doumèche. Weakl: weak kernel learner (v0.0.6). 2026.

[81]

Nathan Doumèche, Yannig Goude, Stefania Rubrichi, and Yann Allioux. Human spatial dynamics for electricity demand forecasting. IEEE Transactions on Power Systems, 2025.

[82]

Nathan Doumèche, Yannig Goude, and Yann Allioux. Human spatial dynamics for electricity demand forecasting: the case of France during the 2022 energy crisis. October 2023. URL: https://doi.org/10.5281/zenodo.10041368, doi:10.5281/zenodo.10041368.

[83]

Mostafa Farrokhabadi, Jethro Browell, Yi Wang, Stephen Makonin, Wencong Su, and Hamidreza Zareipour. Day-ahead electricity demand forecasting competition: post-COVID paradigm. IEEE Open Access Journal of Power and Energy, 9:185–191, 2022.

[84]

Pierre Gaillard, Yannig Goude, and Raphaël Nedellec. Additive models and robust aggregation for GEFCom2014 probabilistic electric load and electricity price forecasting. International Journal of Forecasting, 32(3):1038–1050, 2016.

[85]

R. E. Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35–45, 1960. URL: https://doi.org/10.1115/1.3662552, doi:10.1115/1.3662552.

[86]

Jingang Qu, David Holzmüller, Gaël Varoquaux, and Marine Le Morvan. TabICL: a tabular foundation model for in-context learning on large data. arXiv preprint arXiv:2502.05564, 2025.

[87]

Skipper Seabold, Josef Perktold, and others. Statsmodels: econometric and statistical modeling with Python. scipy, 7(1):92–96, 2010.

[88]

Shanika L Wickramasuriya, George Athanasopoulos, and Rob J Hyndman. Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, 114(526):804–819, 2019.

[89]

Simon N Wood, Zheyuan Li, Gavin Shaddick, and Nicole H Augustin. Generalized additive models for gigadata: modeling the UK black smoke network daily data. Journal of the American Statistical Association, 112(519):1199–1210, 2017.

[90]

Joseph de Vilmarest and Olivier Wintenberger. Stochastic online optimization using Kalman recursion. Journal of Machine Learning Research, 22(223):1–55, 2021.

[91]

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, and others. PyTorch: an imperative style, high-performance deep learning library. Advances in neural information processing systems, 2019.

[92]

Daniel Servén, Charlie Brummitt, Franz Király, JodesL, Denise Schmitz, Zachariah Carmichael, Alejandro Ouslan, Fräntz Miccoli, IlyesD, Jonathan Taylor, Joseph Egan, Luc Lapenta, Lucas Servén Marín, Shishin Mo, Shushank Ranjan, Stephen Anthony Rose, Thomas Kraxner, Umberto Fasci, Viktor Szépe, Yasser El Haddar, hlink, Hassan Abedi, and nicoleta-kyo. dswah/pyGAM: v0.12.0. December 2025. URL: https://doi.org/10.5281/zenodo.17981758, doi:10.5281/zenodo.17981758.

[93]

Amandine Pierrot and Yannig Goude. Short-term electricity load forecasting with generalized additive models. Proceedings of ISAP power, 2011.

[94]

James Durbin and Siem Jan Koopman. Time series analysis by state space methods. Oxford University Press (UK), 2012.

[95]

Max A Woodbury. Inverting modified matrices. Department of Statistics, Princeton University, 1950. Mathematical foundation of the Woodbury matrix identity (Sherman-Morrison-Woodbury formula), justifying the low-rank update inversion optimization.

[96]

Harold V Henderson and Shayle R Searle. On deriving the inverse of a sum of matrices. SIAM review, 23(1):53–60, 1981. Modern review and formal proof of the Woodbury identity used to stabilize the Kalman gain matrix inversion.

[97]

William W Hager. Updating the inverse of a matrix. SIAM review, 31(2):221–239, 1989.

[98]

Ali H Sayed. Adaptive filters. John Wiley & Sons, 2011.

[99]

Margaux Brégère and Malo Huard. Online hierarchical forecasting for power consumption data. International Journal of Forecasting, 38(1):339–351, 2022. Key reference for online hierarchical forecasting in the energy sector. Illustrates the post-hoc reconciliation approach, in contrast to the joint learning proposed here.

[100]

Guillaume Principato, Gilles Stoltz, Yvenn Amara-Ouali, Yannig Goude, Bachir Hamrouche, and Jean-Michel Poggi. Conformal prediction for hierarchical data. arXiv preprint arXiv:2411.13479, 2024. Demonstrates that naive application of conformal prediction level-by-level breaks hierarchical coherence and proposes interval reconciliation algorithms.

[101]

Anastasios N Angelopoulos and Stephen Bates. Conformal prediction: a gentle introduction. Foundations and Trends in Machine Learning, 16(4):494–591, 2023. Standard pedagogical reference for the "Split Conformal" method implemented in the base version of SafetyTAM.

[102]

Guillaume Principato and Gilles Stoltz. Blackwell's approachability for sequential conformal inference. arXiv preprint arXiv:2510.15824, 2025. Addresses the issue of exchangeability violation in time series (data drift) and proposes dynamic coverage rate adjustment methods.

[103]

Isaac Gibbs and Emmanuel Candes. Adaptive conformal inference under distribution shift. Advances in Neural Information Processing Systems, 34:1660–1672, 2021. Introduces the ACI (Adaptive Conformal Inference) algorithm implemented in SafetyTAM for online risk level adjustment.

[104]

Christopher M Bishop. Pattern recognition and machine learning. Springer, 2006. Justifies the Bayesian interpretation of Ridge regression and uncertainty calculation via the inverse Hessian.

[105]

Dimitris N Politis and Joseph P Romano. The stationary bootstrap. Journal of the American Statistical association, 89(428):1303–1313, 1994. Theoretical foundation of Block Bootstrap for time series. Justifies contiguous block resampling to preserve residual autocorrelation.