Thursday, June 9, 2011

ERC Starting Grant

It seems I won one. Wat do?




Thank you, Advice Dog!

Friday, June 3, 2011

Probabilistic databases book published online

Our Morgan & Claypool Synthesis Lecture on probabilistic databases

Probabilistic Databases
Dan Suciu, Dan Olteanu, Christopher Ré and Christoph Koch
Synthesis Lectures on Data Management, May 2011, Vol. 3, No. 2, Pages 1-180
(doi: 10.2200/S00362ED1V01Y201105DTM016)

is now available here!

Wednesday, December 15, 2010

DATA lab website online

The DATA lab now has a website: data.epfl.ch.

Monday, December 13, 2010

New paper: "DBToaster: Agile Views in a Dynamic Data Management System"

We just submitted the final version of our CIDR 2011 paper "DBToaster: Agile Views in a Dynamic Data Management System". This is the first paper to present the overall vision and goals of the DBToaster Project, and it is recommended reading for those who want to learn about the project but only want to read one paper.

Get the pdf here.

Abstract. This paper calls for a new breed of lightweight systems – dynamic data management systems (DDMS). In a nutshell, a DDMS manages large dynamic data structures with agile, frequently fresh views, and provides a facility for monitoring these views and triggering application-level events. We motivate DDMS with applications in large-scale data analytics, database monitoring, and high-frequency algorithmic trading. We compare DDMS to more traditional data management systems architectures. We present the DBToaster project, which is an ongoing effort to develop a prototype DDMS system. We describe its architecture design, techniques for high-frequency incremental view maintenance, storage, scaling up by parallelization, and the various key challenges to overcome to make DDMS a reality.

Please cite as

Oliver Kennedy, Yanif Ahmad, and Christoph Koch. "DBToaster: Agile Views in a Dynamic Data Management System". Proc. 5th Biennial Conference on Innovative Data Systems Research (CIDR ’11), January 9-12, 2011, Asilomar, California, USA.

Talk: MCMC and databases

Here are the slides of a keynote talk I gave in September at the Scalable Uncertainty Management Conference (SUM 2010) in Toulouse. In this talk I sketched research challenges in creating probabilistic database management systems based on Markov Chain Monte Carlo, describing my own as well as other recent work on the topic. The abstract reads as follows:

Several currently ongoing research efforts aim to combine Markov Chain Monte Carlo (MCMC) with database management systems. The goal is to scale up the management of uncertain data in contexts where only MCMC is known to be applicable or where the range and flexibility of MCMC provides a compelling proposition for powerful and interesting systems. This talk surveys recent work in this area and identifies open research challenges.
The talk starts with a discussion of applications that call for the combination of MCMC with ideas from database management. This is followed by a brief discussion of the now somewhat maturing field of probabilistic databases not based on MCMC, and what can be learned from these. Next, the architecture of an MCMC-based database management system is sketched, and key technical and algorithmic challenges are discussed. For efficient MCMC, it is key to be able to quickly evaluate queries on a sequence of many sample databases among which consecutive samples differ only moderately. The talk discusses techniques for efficiently solving this problem by aggressive incremental query evaluation. The locality of changes between consecutive samples is also key to scaling MCMC beyond state sizes that fit conveniently into a computer’s main memory.
The second part of the talk addresses query languages beyond industry-standard languages such as SQL, which have limited appeal in the context of the scientific applications of MCMC. Computational problems to which MCMC is applied are often best expressed in terms of iteration and fixpoints. Database research knows languages centered around these principles, and it is interesting to understand how iteration as a query language construct interacts with MCMC sampling. The talk presents recent results in this space, including considerations of complexity and expressive power of query languages specifically designed for MCMC.
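
To make the point about consecutive samples concrete, here is a minimal sketch (not taken from the talk or the slides). It assumes a toy "database" that maps tuple ids to integer values, a SUM query over those values, and a stand-in random walk that rewrites one tuple per step; a real MCMC sampler would add an acceptance test. The names `full_sum` and `run_chain` are made up for illustration. Since consecutive samples differ in a single tuple, the query result can be patched with the delta instead of being recomputed.

```python
import random


def full_sum(db):
    """Evaluate the query from scratch: SUM(value) over all tuples."""
    return sum(db.values())


def run_chain(db, steps, seed=0):
    """Toy chain: each step rewrites one tuple and patches the SUM by the delta.

    This is a plain random walk used as a stand-in for an MCMC sampler;
    every proposal is accepted, so it does not target any particular
    distribution.
    """
    rng = random.Random(seed)
    keys = sorted(db)
    view = full_sum(db)  # evaluate the query once on the initial sample
    results = []
    for _ in range(steps):
        key = rng.choice(keys)
        new_value = rng.randint(0, 9)  # hypothetical proposal
        view += new_value - db[key]    # delta update: O(1), not O(|db|)
        db[key] = new_value
        results.append(view)           # query result on this sample
    return view, results


if __name__ == "__main__":
    db = {i: i for i in range(10)}
    view, results = run_chain(db, steps=100, seed=1)
    # The incrementally maintained result agrees with full re-evaluation.
    print(view == full_sum(db))
```

The per-step cost here depends on the size of the change rather than the size of the database, which is the property the abstract relies on for both fast query evaluation and for states that do not fit in main memory.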

Please cite as

Christoph Koch: Markov Chain Monte Carlo and Databases. In Proc. SUM 2010, p.1.

Wednesday, December 1, 2010

Talk: Rethinking the foundations of databases

Contemporary database query languages and systems are ultimately founded on logic. Last week I gave a talk at EPFL (in the KTN seminar) arguing for an effort to rethink databases and query languages and to found them on the machinery of modern abstract algebra. This bears the promise of radically simplifying, and offering new angles of attack on, some of the hardest and most central problems in data management -- such as query equivalence testing, the view update problem, and data integration. It also heralds the convergence of database systems and computer algebra systems, pointing toward a new breed of systems that could revolutionize large-scale data analysis and scientific data management and computing.

The video of this talk is now online here, and here are the slides. Unfortunately, the final few minutes of the talk are missing from the video.

Sunday, November 21, 2010

Pictures of DATA Lab trip to Zermatt on Flickr

The members of the amazing DATA Lab (before it got that name) went to Zermatt in great weather and HERE are our pictures.
