Explaining the science
Despite advances in machine learning and probabilistic methods and tools, most analysts working in government and industry use spreadsheets (such as Excel) for their day-to-day work, and so do not have these advances readily available. As a result, most analysts, broadly construed, make no use of machine learning ideas.
The reasons for the widespread use of spreadsheets are clear. Spreadsheet models built for one-off tasks tend to become established models; spreadsheets are a convenient vehicle for communication; and they often form the glue between operational systems. Often a spreadsheet is the only tool available to an analyst working in a particular domain.
However, there are significant, well-known problems that arise when spreadsheets are one’s only tool. Two of the most powerful tools in the programmer’s repertoire, abstraction and modularity, are very difficult to implement in spreadsheets, resulting in repetitive, ‘low-level’ code. Lack of version control makes it difficult to trace changes to a model. Re-use of parts of existing spreadsheets in new models is essentially impossible, and updates to one’s models are error-prone and time-consuming.
In addition, and critically for this project, use of traditional spreadsheets means that the analyst cannot easily compute with uncertainty, and is locked out of some of the most promising advances in probabilistic reasoning.
This project’s long-term aim is to develop a simple, high-level programming language for creating the sort of models that are typically built in a spreadsheet in industry and government. The language will be extended with the capability for probabilistic modelling and inference. Models built using this language will then be ‘compiled’ to a well-structured spreadsheet, complete with formatting.
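To give a flavour of what ‘compiling’ a model to a spreadsheet could mean, here is a minimal sketch in Python. It is purely illustrative and is not the project's language or implementation: the function compile_to_cells, the dictionary-based model format and the example quantities (price, units, revenue, tax) are all assumptions made for this example. Named quantities are written once, and the sketch lays them out as labelled rows whose formulas refer to cells rather than names.

```python
import re


def compile_to_cells(model):
    """Lay out a named model as spreadsheet cells (hypothetical helper).

    `model` maps each name to a number or to a formula string that refers
    to other names. Labels go in column A, values or formulas in column B.
    """
    # Each name gets the column-B cell on its own row, in definition order.
    cell_of = {name: f"B{row}" for row, name in enumerate(model, start=1)}
    cells = {}
    for row, (name, definition) in enumerate(model.items(), start=1):
        cells[f"A{row}"] = name
        if isinstance(definition, str):
            # Rewrite every identifier that names a model quantity as its cell.
            formula = re.sub(
                r"[A-Za-z_]\w*",
                lambda m: cell_of.get(m.group(0), m.group(0)),
                definition,
            )
            cells[f"B{row}"] = "=" + formula
        else:
            cells[f"B{row}"] = definition
    return cells


# Example model (assumed for illustration only).
model = {
    "price": 12.5,
    "units": 400,
    "revenue": "price * units",
    "tax": "revenue * 0.2",
}
sheet = compile_to_cells(model)
for ref in sorted(sheet):
    print(ref, sheet[ref])
```

The point of the sketch is that the model text, rather than the generated sheet, is the thing kept under version control and re-used; a real compiler would also need to handle formatting, ranges and probabilistic quantities, none of which is attempted here.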
The hope is to bring the advantages of software re-use, version control, abstraction, and machine learning to real-world modelling.