Assessing privacy in open Web-based scenarios
Personal information is being widely dispersed across the Web. Social networks in particular have become a focal point for collecting data from billions of people. Today, it is impossible for ordinary users to understand the privacy implications this entails, such as the spread of private posts within their network or the possible effects of linking information across multiple sites. Helping users assess their privacy situation in the reality of today’s Internet is a major challenge that involves information extraction, modeling, and prediction in a highly distributed environment, with multimodal content, at massive scale, and under highly incomplete information.
Privacy in mobile computing
Sophisticated mobile computing, sensing, and recording devices, in particular smartphones, have become our daily companions, blurring the distinction between the online and offline worlds. While these devices, and emerging ones like Google Glass, enable transformative new applications and services, they also introduce entirely new threats to users’ privacy. A deluge of smartphone apps ask users to grant access to highly sensitive personal data and privacy-critical functionality, which makes it possible to build up a complete record of the user’s location, online and offline activities, and social encounters, including an audiovisual record. Helping users understand the privacy-relevant behavior of third-party software, protecting users’ privacy when they interact with software and with other mobile device users, and ultimately helping app developers enforce privacy by design together constitute a formidable challenge. This challenge includes, in particular, software analysis and enforcement on third-party software, secure programming principles for privacy by design, and privacy-friendly solutions for a wide range of communications and interactions among mobile device users.
Anonymity in online interactions
Users disseminate personal information not only actively, for example through posts in social networks, but also passively, as a side effect of their online interactions with Web service providers or other communication partners. This source of privacy loss is more subtle, yet no less threatening: a Web service provider typically learns who accesses a service and for what purposes; communication technologies typically do not conceal from an observer who is communicating with whom, and sometimes do not even protect the content of the conversation. Information collectors tacitly observe user interactions, for advertising and other purposes. Preserving anonymity in such interactions without inhibiting their functionality involves challenges such as developing novel cryptographic solutions that resolve the tension between anonymity and functionality at the algorithmic level, as well as complementary solutions at the network level that ensure anonymous yet reliable communication and interaction.
The research questions addressed by the collaborative research center can be broadly classified into those relevant to understanding privacy, those relevant to controlling privacy, and overarching questions relevant to both:
- Due to the pervasiveness of modern devices that process personal information, along with increasingly visual user content, finding users’ privacy-critical traces has become a highly sophisticated task. How to identify and extract the privacy-relevant information from users’ modern digital habitats, at Internet scale? How can we represent the results of such extraction in reproducible data models on which personalized privacy assessments can be grounded?
- Traditional research on expressing and quantifying user privacy pertains to structured data and typically to closed systems, allowing global data sanitization. To cope with users’ modern digital habitats, we instead need to deal with unstructured, highly heterogeneous, and highly incomplete data, without global sanitization. How to analyze aspects of privacy in such settings? How to accurately reflect cross-dependencies between the multitude of personal data sources? How to assess privacy threats in a way that is meaningful to individual users?
- Ultimately, it is desirable to identify privacy-critical user actions a priori, i.e., before they are executed with possibly detrimental consequences. Users should be able to understand the privacy consequences that their next action would have. How to predict privacy threats in modern digital habitats, modeling and analyzing the future exposure of privacy-critical information? How to design what-if analysis techniques, assessing the implications of hypothetical events?
- Designing software securely is a difficult and error-prone task, particularly if strong privacy guarantees are to be achieved. It is widely accepted nowadays that protocol designers and software developers should be offered technical guidance to design systems with built-in security and privacy from the very start, but research still falls short of comprehensively achieving this for privacy-critical applications. How to develop concepts that assist developers in designing privacy-friendly applications? How to ensure that carefully derived application designs are not subsequently broken by implementation choices and flaws?
- Controlling privacy becomes particularly important if the user relies on third-party software or services that are not considered trusted. In such settings, users should be given protective technology that prevents personal information from being captured and misused. How to enforce privacy in the presence of untrusted components as comprehensively as possible? Which properties can be enforced under which circumstances, and where are the limits of technical privacy enforcement?
- Applications offering functionality personalized to individual users typically face an inherent tension between that functionality and the privacy guarantees they can offer. This pertains in particular to core and emerging technologies that exchange information in pervasive scenarios, such as digital advertising, trustworthy anonymous communication, and ubiquitous mobile devices. How to develop privacy-friendly solutions that resolve the tension between functionality and privacy in such technologies? What are the technical limitations and functionality/privacy trade-offs?
- Privacy assessments and their consequences for a user’s privacy, as well as privacy protections and their associated privacy/functionality trade-offs, need to be conveyed to the user in a comprehensible manner. How to present privacy assessments and protection options understandably to lay users?
- Rigorous notions of what user privacy means are required for analyzing privacy threats, formulating user privacy policies, designing privacy-aware software and systems, and enforcing privacy in untrusted components. Which privacy notions, and which associated computer-processable privacy property representations, are suitable? Where and how do the requirements on these notions and representations differ, and to what extent can they be coordinated?
- Understanding and controlling privacy employ fundamentally different approaches, top-down and bottom-up respectively; the latter is mainstream in privacy research, while the former is under-explored. How to transfer and align models across these approaches, such as data models, privacy notions, and information flow? How to integrate techniques, e.g., combining threat assessment/prediction with enforcement for more efficient protection? To what extent can we lift control methods to the open world, e.g., by relying on trusted hardware?
- In designing analytic as well as protective technology, we need to trade off precision against computational cost. On the one hand, we need to be as precise as possible, delivering useful information and not interfering more than absolutely necessary. On the other hand, the situation analysis and the decision to interfere must be made in real time, while the user is taking actions and privacy-critical software is running. How can we obtain the best precision with limited resources, e.g., on a mobile device? What part of the analysis can be prepared offline, and what needs to be done online? What are the sweet spots in different scenarios?