Data Protection and Security at Scale

British Library, 3rd December 2015.

Main organisers: David Aspinall, Jonathan Cave, Richard Clayton, Andrew Martin, David Pym

Society has become utterly dependent on large, complex ecosystems that process vast amounts of data. Data protection regulation and technical security and privacy measures are intended to safeguard people and organisations against misuse or misappropriation of data. But we currently have a poor grasp of how to apply these at scale and across multiple interconnected socio-technical systems. Machine learning and data mining generate new information that may be highly sensitive, even when run on single sources of “open” data. Multiple data sources in aggregation can be used to mount linkage or re-identification attacks, defeating anonymisation attempts. And failures to implement secure systems, meaningful authentication or proper encryption leave data open to traditional cyber attacks, either against systems or against people, through “social engineering” that uses leaked private information.
Data science itself offers techniques that can help: for example, machine learning is being used to detect malware that exfiltrates information. Differential privacy is a statistical method which imposes precise limits on the quantity of information revealed by database queries, provided a budget is imposed on the number and accuracy of the queries.
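
As a rough illustration of the budget idea, the sketch below answers counting queries with Laplace noise and refuses further queries once a total budget is spent. It is a minimal sketch under stated assumptions, not a production mechanism: the class name PrivateCounter, its parameters and the example data are hypothetical, it handles only counting queries (whose answer changes by at most 1 when one record is added or removed), and it relies on simple sequential composition, where the epsilon values of successive queries add up.

```python
import random


class PrivateCounter:
    """Hypothetical helper: noisy counting queries under a total epsilon budget."""

    def __init__(self, records, total_epsilon, rng=None):
        self.records = list(records)
        self.remaining = total_epsilon      # privacy budget still available
        self.rng = rng or random.Random()

    def _laplace(self, scale):
        # The difference of two i.i.d. exponential samples with mean `scale`
        # is distributed as Laplace(0, scale).
        rate = 1.0 / scale
        return self.rng.expovariate(rate) - self.rng.expovariate(rate)

    def count(self, predicate, epsilon):
        """Return a noisy count of records matching `predicate`.

        A smaller epsilon means more noise (a less accurate answer) but
        consumes less of the budget.
        """
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon
        true_count = sum(1 for r in self.records if predicate(r))
        # A counting query has sensitivity 1, so the noise scale is 1 / epsilon.
        return true_count + self._laplace(1.0 / epsilon)


if __name__ == "__main__":
    ages = [23, 35, 41, 29, 62, 57, 33, 48]          # made-up example data
    db = PrivateCounter(ages, total_epsilon=1.0)
    print(db.count(lambda a: a >= 40, epsilon=0.5))  # noisy answer
    print(db.count(lambda a: a < 30, epsilon=0.5))   # budget is now spent
```

The budget couples the number of queries to their accuracy: the same total epsilon can pay for a few fairly accurate answers or for many noisy ones, but not for an unlimited number of precise ones.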
But the ultimate challenges of data security cannot be solved by mathematics and computer science alone; they require a multi-disciplinary viewpoint that also considers legal and regulatory frameworks, societal and governance mechanisms, economic perspectives for understanding multi-stakeholder settings and conflicting incentives, and psychological and human-factors considerations for exploring the meanings behind personal data, identity and (data) privacy. Thus, we want to consider four inter-related socio-technical issues in tandem:

1. Security: How can large ecosystems of interconnected systems be protected to acceptable levels at acceptable operational and economic cost, while maintaining acceptable levels of privacy and trust?

2. Privacy: What are appropriate principles for establishing/managing privacy (i.e., authorised but restricted data sharing) in systems handling big data?

3. Identity: How can identity be established in distributed, dynamic information systems? Can we exploit data-driven methods to enhance or replace more traditional notions of address, name, or UID?

4. Trust: How are trust relationships between systems expressed and ensured, and how do these propagate through data they process?

This workshop aims to bring together researchers, industry and public sector representatives to consider a range of problems posed by these issues.