Project Group A: Understanding Privacy
Projects in this group focus on different sources of personal data, and address different technology areas required as building blocks to comprehensively understand user privacy.
We will develop methods for identifying privacy-relevant information in digital user habitats, in particular from natural language utterances in online forums, and from visual data.
We will develop methods for predicting how information may be disseminated, through (unintended or malicious) software leakage, and through information spreading in online social networks.
We will analyze how natural language utterances and visual data may enable an adversary to link a user’s online accounts across sites, and we will build on all these methods to predict how an adversary may compromise a user’s privacy by combining all these sources of information.
We will begin investigating suitable user-interaction paradigms that present the outcomes of particular threat analyses understandably, giving the user better control over her privacy.
Naturally, the projects will concentrate on adapting and advancing state-of-the-art methods to meet the challenges imposed by the highly heterogeneous, unstructured, and dynamic nature of the domain.
We need to extract information from large-scale heterogeneous data sources; advance image and software analysis methods; pioneer the linkage of unstructured user records consisting mostly of natural-language text; and advance methods enabling convenient privacy specification by lay users.
Many of our methods will devise computer-processable models – of user data, of a user’s current exposure to the extent that can be determined, of inferences an adversary could make, of hypothetical near-term future events – and advance associated analysis and simulation techniques for assessing and predicting privacy threats.
Jens Dittrich & Gerhard Weikum
The goal of this project is to devise technology for comprehensively retrieving and assessing a user’s long-term privacy-critical traces, across her entire digital habitat and the Internet at large.
Today, this is not possible even for the data contained within individual platforms (such as Facebook).
Meeting the grand challenge entails several sub-goals: How to find and retrieve user data from the Internet’s heterogeneous search space (social networks, forums, review sites, Deep Web services, public databases, etc.), and how to continuously monitor that data, in a scalable manner?
How to determine a data item’s criticality to user privacy?
How to determine its visibility, as well as its provenance, i.e., how it became visible (e.g., by actions of other users)?
To address these goals, we will harness and enhance entity search, focused crawling, and Deep Web querying techniques, as well as combinations thereof.
We will use statistical methods combined with background knowledge to identify privacy-critical information, such as emotional or embarrassing statements.
As cues to reconstruct data provenance, we will employ copy detection and co-reference analysis in natural language statements.
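One standard realization of such copy detection is w-shingling with Jaccard similarity; the sketch below is a minimal illustration of that idea, not the project's actual method, and all function names are hypothetical.

```python
def shingles(text, w=3):
    """Break a statement into overlapping w-word shingles."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def copy_score(a, b, w=3):
    """Jaccard similarity of shingle sets: a cue that b may copy a."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

A high score between statements on different sites is a provenance cue: the later statement may have been copied, or both may derive from a common source.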
Mario Fritz & Bernt Schiele
Users today typically share and disseminate massive amounts of visual data (images, videos), via Webpages, social networks, and personal communication.
While it is obvious that visual data may contain privacy-relevant information and thus (eventually) constitute a threat, there are no systematic methods for assessing that threat.
The long-term goal of this project is to provide users with comprehensive and accurate tools for doing just that:
Can a user’s activities be extracted from her posted media content?
Can a user be re-identified across separate postings?
To what extent can an entire user diary and social relations be reconstructed?
To design tools capable of answering these questions, we will advance state-of-the-art computer vision techniques to deal with the highly heterogeneous and uncontrolled nature of regular users’ visual data.
We will investigate their combination with the context information from social networks, and examine the effect of common protection techniques such as face blurring.
We will devise probabilistic models for analyzing the inferences that can be made from a public set of images, and predicting the inferences that could be made (what-if) in case additional images were added.
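As a deliberately simplified illustration of such what-if reasoning, the sketch below performs a Bayesian update over candidate locations given per-image evidence; adding a hypothetical image to the set shows how an adversary's confidence would change. The names and numbers are assumptions for illustration only.

```python
def location_posterior(prior, image_cues):
    """Multiply per-image likelihoods into a posterior over candidate
    locations. image_cues: list of {location: likelihood} dicts,
    one per published image; unseen locations get a small floor."""
    post = dict(prior)
    for cues in image_cues:
        for loc in post:
            post[loc] *= cues.get(loc, 1e-6)
    z = sum(post.values())
    return {loc: p / z for loc, p in post.items()}
```

Comparing the posterior before and after appending a hypothetical image gives the what-if assessment: how much an additional upload would sharpen an adversary's inference.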
Lionel Briand & Andreas Zeller
Third-party applications, on mobile phones and in Web services, often share private data liberally, in ways the user may be unaware of; moreover, malware that communicates critical data to an attacker is a growing problem.
This project aims to provide tools that automatically analyze privacy leakage from existing software, putting users in a position to understand how their data is being shared (and, ultimately, put a stop to undesired data sharing).
As a key concept towards this end, we introduce privacy patterns, summarizing how applications may access, process, and propagate sensitive data, in a form amenable to further analyses, and communicable to users.
Privacy patterns identify sources and sinks of sensitive data, obtained by abstracting over multiple executions, and multiple concrete sources and sinks.
Abstracting over multiple related apps will allow us to characterize “normal” behavior and consequently, to detect “abnormal” behavior which users should be alerted to.
To tackle third-party, multi-language, binary, distributed, obfuscated, and even adverse software and components like malware, we will couple static and dynamic analysis with novel test generation techniques that are robust and scalable, yet target the flows and patterns of interest.
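The core abstraction step can be sketched as follows: concrete (source, sink) flow events observed over many executions are collapsed into abstract patterns, and an app's flows are flagged when no pattern from comparable "normal" apps explains them. This is a minimal sketch of the idea, assuming a hypothetical flow representation, not the project's actual analysis.

```python
def abstract_flows(executions):
    """Collapse concrete (source, sink) events observed over many
    executions into a set of abstract privacy patterns."""
    return {flow for run in executions for flow in run}

def abnormal_flows(app_flows, normal_patterns):
    """Flows of an app that no pattern of comparable 'normal' apps
    explains; candidates the user should be alerted to."""
    return set(app_flows) - normal_patterns
```

For example, if comparable apps only ever route location data to the map UI, a flow of contact data to the Internet would surface as abnormal.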
A key feature of online social networking sites like Twitter and Facebook is the possibility to re-post, i.e., forward, information posted by another user.
Transitively, such information spreading can easily bring information to recipients it was not originally intended to, and can even set off large-scale social contagions or cascades, when a post goes “viral”.
Social network users today have no support for understanding their potential exposure through such spreading.
While models of spreading processes have been widely considered in the literature, these (a) focus on information of public interest (chain letters, marketing), as opposed to private interest; and (b) do not take into account the ranking mechanisms organizing users' feeds and thus controlling what posts they are actually exposed to and are likely to re-post.
Our project aims at addressing these deficiencies.
To that end, we will devise probabilistic information-diffusion models interleaving data reproduction with data exposure.
We will design private information tracking methodology to obtain training data for these models, and tailor state-of-the-art algorithms to analyze and predict information spreads.
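As a sketch of how such an interleaved model could look, the independent-cascade-style simulation below separates a feed-exposure probability (standing in for ranking) from a re-post probability; it is an illustrative toy under these two assumed parameters, not the model the project will develop.

```python
import random

def simulate_spread(followers, seeds, p_expose, p_repost, rng=None):
    """One run of a cascade: each follower sees a post with probability
    p_expose (feed ranking) and, if exposed, re-posts it with
    probability p_repost. followers: node -> list of follower nodes."""
    rng = rng or random.Random(0)
    reached, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in followers.get(u, []):
                if v not in reached and rng.random() < p_expose * p_repost:
                    reached.add(v)
                    nxt.append(v)
        frontier = nxt
    return reached
```

Averaging the reached set over many runs yields an estimate of a post's potential exposure beyond its intended audience.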
Michael Backes & Gerhard Weikum
One of the biggest threats in modern digital habitats is identity linkability across disparate platforms (e.g., identifying a particular user in an anonymous health discussion forum).
The long-term goal of this project is to devise a deep semantic understanding of, and tool support for the analysis of, such linkability.
How can we characterize user profile linkability risks?
How can we automatically analyze the risk at stake for a particular user, given the heterogeneous and unstructured nature of typical user platform content?
Given such an understanding, how can we exploit it to best protect the user without affecting functionality?
We will address these challenges by devising a user-centric privacy measure that assesses a user's ability to hide amongst her peers.
We will employ text understanding technology, and connect to visual data analysis methods, to gain a technological grasp on linkability through user content, and we will investigate sources of linkability through network traffic.
We will employ this understanding to inform users about linkability risks, and for guiding protection mechanisms against linking, in particular by automated re-phrasing and targeted (and hence cost-effective) anonymization.
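The "hiding amongst her peers" idea resembles anonymity-set-size measures from the literature; the sketch below illustrates that style of measure over hypothetical attribute profiles and is not the measure the project will devise.

```python
def anonymity_set(user, population, observable):
    """Peers indistinguishable from the user on the observable
    attributes: the larger this set, the better the user hides."""
    return [p for p in population
            if all(p.get(a) == user.get(a) for a in observable)]

def linkability_risk(user, population, observable):
    """Simple risk score: 1 / anonymity-set size.
    1.0 means the observable attributes identify the user uniquely."""
    return 1.0 / len(anonymity_set(user, population, observable))
```

Such a score also suggests where protection pays off: re-phrasing or anonymizing exactly the attributes that shrink the anonymity set is the targeted, cost-effective intervention.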
It is well known that, in current digital user habitats, users experience severe difficulties in (a) understanding and (b) specifying their privacy settings.
Deep privacy threat analysis methods such as those proposed above provide the essential building block for (a), but how can their results be communicated understandably to lay users?
And how can users (b) be enabled to configure and fine-tune their settings understandably? This project's core idea is to tackle both through the use of simulated examples, i.e., hypothetical privacy-critical events: concrete ways (a particular kind of recipient obtaining a particular kind of information) in which the user's privacy may become compromised, where criticality is judged according to an explicit user model.
In-situ interactions based on wearable computing technology allow for convenient feedback from real-world events, fine-tuning the user model and policies.
In the long-term, this general idea could in principle apply to many kinds of privacy-relevant scenarios, threats, and simulation methods.
Within the scope of the first funding period, we will focus on explicit (i.e., user-controlled) location sharing in online social networks, simulating information spreading.