Logical Foundations of Data Science

UCL, 16-17 November

Main organisers: Anuj Dawar, Leonid Libkin, Richard Mayr, Ranko Lazic, Boris Motik, Andrzej Murawski, Alexandra Silva

The relational database model that forms the basis of products from IBM, Oracle, Microsoft, and many others, is deeply rooted in logic (e.g., SQL is essentially a programming syntax for firstorder predicate calculus). Extensions like XML and graph data models are deeply rooted in logic as well. For instance, the language XPath is based on dynamic and temporal logics, such as PDL and CTL (originating in the field of verification of software and hardware). The key challenges of big data related to volume, incompleteness of sources, and variety of data models, require a new technical toolkit from logicians: efficient evaluation of formulae/queries, new modelling techniques for uncertainty, and new logical languages for different data models. Current problems include the scalability of declarative queries (Volume); modelling incompleteness that occur in different types of data (Veracity); handling new emerging models like graph data (Variety) and their speed of change (Velocity). Thus the workshop will focus on logical foundations of the V’s of big data.

1. Approximation and uncertainty.

For big data, we shift from data-based approximations to static ones, which approximate queries without looking at the data. A new toolset of techniques need to be developed, in order to provide some guarantees for query answers (Veracity) while staying computationally feasible.

2. Graph data.

Data have been increasingly converted into the XML format for data exchange.Also, we are witnessing an ever increasing demand for graph structured data, particularly in emerging applications such as social networks (e.g., Facebook, LinkedIn, Twitter) and Semantic Web (e.g., RDF).

3. Automata models.

Automata models serve as a technical bridge between logic and complexity theory, and constitute a standard tool with which to attack expressivity and evaluation issues in query languages (Volume). For emerging applications in data science, the rich theory of finite state systems needs to be broadened, e.g., by automata over infinite alphabets for XML data processing (Variety). Moreover, there are links between querying algorithms in complex databases and classic open problems in theoretical computer science, such as the complexity of reachability in vector addition systems.