Monday, December 13, 2010

Talk: MCMC and databases

Here are the slides of a keynote talk I gave in Spetember at the Scalable Uncertainty Management Conference (SUM 2010) in Toulouse. In this talk I sketched research challenges in creating probabilistic database management systems based on Markov Chain Monte Carlo, describing my own as well as other recent work on the topic. The abstract reads as follows:

Several currently ongoing research efforts aim to combine Markov Chain Monte Carlo (MCMC) with database management systems. The goal is to scale up the management of uncertain data in contexts where only MCMC is known to be applicable or where the range and flexibility of MCMC provides a compelling proposition for powerful and interesting systems. This talk surveys recent work in this area and identifies open research challenges.
The talk starts with a discussion of applications that call for the combination of MCMC with ideas from database management. This is followed by a brief discussion of the now somewhat maturing field of probabilistic databases not based on MCMC, and what can be learned from these. Next, the architecture of an MCMC-based database management system is sketched, and key technical and algorithmic challenges are discussed. For efficient MCMC, it is key to be able to quickly evaluate queries on a sequence of many sample databases among which consecutive samples differ only moderately. The talk discusses techniques for efficiently solving this problem by aggressive incremental query evaluation. The locality of changes between consecutive samples is also key to scaling MCMC beyond state sizes that fit conveniently into a computer’s main memory.
The second part of the talk addresses query languages beyond industry-standard languages such as SQL, which have limited appeal in the context of the scientific applications of MCMC. Computational problems to which MCMC is applied are often best expressed in terms of iteration and fixpoints. Database research knows languages centered around these principles, and it is interesting to understand how iteration as a query language construct interacts with MCMC sampling. The talk presents recent results in this space, including considerations of complexity and expressive power of query languages specifically designed for MCMC.

Please cite as

Christoph Koch: Markov Chain Monte Carlo and Databases. In Proc. SUM 2010, p.1.

0 comments:

Post a Comment