Towards Automatic Synthesis of Statistical Data Analysis Programs

Bernd Fischer
Automated Software Engineering Group USRA/RIACS
NASA Ames Research Center, California, USA

Automatic program synthesis is a formal approach to software development in which efficiently executable programs are automatically derived from high-level specifications. It has been applied successfully in a number of domains, including celestial mechanics, transportation scheduling, and option pricing. In this talk I will discuss its application to machine learning, or more precisely, to statistical data analysis, and I will present the AutoBayes system currently under development at NASA Ames.

AutoBayes takes a specification in the form of a statistical model, extracts a graphical model (i.e., a Bayesian network) from it, and then derives code by a process called schema-based synthesis. Schemas are generic algorithms together with their applicability conditions. Schemas come in different "flavors": some are derived from decomposition theorems for graphical models, others implement generic machine-learning algorithms like EM. Schemas are applied recursively until irreducible subproblems occur, which are then solved by symbolic or numeric solvers. AutoBayes has been applied to a number of textbook and application problems, including clustering (using EM), changepoint detection, and software reliability estimation.
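
The recursive application of schemas can be illustrated with a small toy sketch in Python. It is not the AutoBayes implementation: the names Problem, Schema, SCHEMAS, synthesize, and solve_leaf, the problem kinds "independent", "iterative", and "gaussian_mean", and the emitted pseudo-code lines are all hypothetical and chosen only for illustration. Each schema pairs an applicability condition with a code-generating step, and problems that no schema matches are passed to a stand-in for a symbolic solver.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Problem:
    """Hypothetical problem description: a kind tag, a variable, subproblems."""
    kind: str
    var: str = ""
    parts: Tuple["Problem", ...] = ()


@dataclass(frozen=True)
class Schema:
    """A generic algorithm together with its applicability condition."""
    name: str
    applicable: Callable[[Problem], bool]
    apply: Callable[[Problem], List[str]]


def solve_leaf(p: Problem) -> List[str]:
    # Stand-in for a symbolic solver: closed-form estimate of a mean.
    if p.kind == "gaussian_mean":
        return [f"mu_{p.var} = sum({p.var}) / len({p.var})"]
    raise ValueError(f"no schema or solver for problem kind {p.kind!r}")


def decompose(p: Problem) -> List[str]:
    # Decomposition-style schema: solve each subproblem separately.
    lines: List[str] = []
    for part in p.parts:
        lines.extend(synthesize(part))
    return lines


def iterate(p: Problem) -> List[str]:
    # Iterative-style schema: wrap the code for the subproblems in a loop.
    body = decompose(p)
    return ["while not converged:"] + ["    " + line for line in body]


SCHEMAS = [
    Schema("decomposition", lambda p: p.kind == "independent", decompose),
    Schema("iteration", lambda p: p.kind == "iterative", iterate),
]


def synthesize(p: Problem) -> List[str]:
    # Apply the first applicable schema; otherwise the problem is
    # treated as irreducible and handed to the leaf solver.
    for schema in SCHEMAS:
        if schema.applicable(p):
            return schema.apply(p)
    return solve_leaf(p)


spec = Problem(
    "iterative",
    parts=(
        Problem(
            "independent",
            parts=(Problem("gaussian_mean", "x"), Problem("gaussian_mean", "y")),
        ),
    ),
)
program = synthesize(spec)
```

Here program is a list of generated pseudo-code lines: a loop header followed by one indented estimate per variable. The emitted lines are text only and are not meant to be executed.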

In the talk, I will discuss some examples and their derivation processes in more detail and demonstrate the system "live".

AutoBayes is joint work with W. Buntine, J. Schumann, and J. Whittle.


