Research work


Challenges

MOSAICrOWN will enable data sharing and collaborative analytics in multi-owner scenarios, providing owners with control over the information shared and released to others, together with efficient and scalable techniques for enforcing data protection and supporting privacy-preserving analytics. In particular, we consider three dimensions that help define the challenges to be tackled.

  • Requirements capturing and representation. We consider the step from high-level specifications to policies regulating access, sharing, usage, and processing of data.
  • Enforcing technologies. We consider the development of mechanisms that provide technical protection to the data by enforcing data protection measures. Along this dimension, we distinguish two complementary approaches: data wrapping, which provides protection by (partially or completely) disabling visibility of data while preserving some functionality, and data sanitisation, which operates on data collections to provide an obfuscated (e.g., not precise) version of the data or non-sensitive aggregates over them.
  • Enforcement phase. We distinguish three phases of the data life-cycle in the data market where data protection measures can be enforced, namely: ingestion, when data is passed from the owners to the data market; storage, when data is stored at, and managed by, the data market provider; and analytics, when data is processed in the data market. The latter two phases concern the protection of data outside the owner’s direct control.

The result of MOSAICrOWN will be a set of modular tools that support an enriched data market scenario and protect data across the whole life-cycle.

Methodology

We tackle the issues and challenges mentioned above with a gradual approach, first addressing policy specification and management, and then developing enabling technologies that provide data wrapping and data sanitisation techniques. A specific work package is devoted to each of these challenges, in which the problem is investigated considering the whole data life-cycle (i.e., every enforcement phase).

Policy specifications – Data governance framework (WP3)
MOSAICrOWN aims to provide a data governance framework for managing data and for specifying policies in multi-owner collaborative scenarios. MOSAICrOWN will first identify all relevant requirements and protection needs. Moving from requirements to actual specifications that are expressive, comprehensive, and understandable for data owners requires capturing the different concepts that need to be expressed and providing a metadata model for referencing data. As data owners need to regulate the use, sharing, and processing of their data, MOSAICrOWN will devise a formal model and a declarative policy language that non-specialists can also use to specify, in a flexible way, the different protection regulations that may need to be imposed on data. The model should be based on solid foundations, so that the effect of policy specifications can be understood and the actual protection guarantees can be reasoned about. The language should support restrictions on the whole data processing life-cycle and should be designed to be compatible with existing technology, and thus deployable in real systems. As data collections of different owners may also need to be combined or processed together to conduct analysis, MOSAICrOWN will also investigate solutions for policy management.
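
As a purely illustrative sketch of what a declarative, default-deny policy over the data life-cycle could look like, the following Python fragment represents rules as plain data and checks a request against the policies of several owners. It is not the MOSAICrOWN policy language: the names Rule, is_permitted, and permitted_by_all, the role names, and the attribute references such as "patients.age" are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    subject: str   # who is asking (hypothetical role name)
    action: str    # e.g. "read", "share", "aggregate"
    resource: str  # metadata reference to a data item (hypothetical naming)
    phase: str     # "ingestion", "storage", or "analytics"

def is_permitted(policy, request):
    """Default-deny: a request is allowed only if a rule matches it exactly."""
    return request in policy

def permitted_by_all(policies, request):
    """For combined data: every owner's policy must allow the request."""
    return bool(policies) and all(is_permitted(p, request) for p in policies)

# Two owners, each with their own set of rules (assumed example data).
owner_a = {
    Rule("analyst", "aggregate", "patients.age", "analytics"),
    Rule("provider", "read", "patients.zip", "storage"),
}
owner_b = {
    Rule("analyst", "aggregate", "patients.age", "analytics"),
}

aggregate_request = Rule("analyst", "aggregate", "patients.age", "analytics")
read_request = Rule("provider", "read", "patients.zip", "storage")

allowed_aggregate = permitted_by_all([owner_a, owner_b], aggregate_request)
allowed_read = permitted_by_all([owner_a, owner_b], read_request)
```

In this sketch the aggregate request is allowed by both owners, while the read request is rejected because only one owner grants it. Exact matching keeps the example short; a realistic language would need hierarchies of subjects and resources, conditions, and conflict resolution.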

Data wrapping (WP4)
MOSAICrOWN aims to define efficient techniques to wrap data with a protection layer (typically removable), guaranteeing access functionality without compromising on protection. Data wrapping techniques need to support different kinds of functionality, as they are used in all the phases of the data life-cycle: in data ingestion by data owners, to move self-protected data to the market while enabling fine-grained data retrieval; in storage by the data market provider, before releasing data to external third parties, to enable processing of the data while satisfying the protection policies; and in data analytics by the data market provider, to combine different data sources and produce a result that satisfies the policies of all the data owners. The design of data wrapping techniques is complicated by the need for efficiency and scalability of computations operating over wrapped data, which are required both by data owners and by parties accessing data in the data market. MOSAICrOWN will also consider economic incentives, which can be given to data owners for the use of their data, and economic benefits that can derive from the use of less expensive cloud infrastructures.
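
To make the idea of hiding data while preserving some functionality concrete, the sketch below uses a keyed hash (HMAC-SHA256 from the Python standard library) to build index tags that allow equality-based retrieval without exposing the plaintext values. This is a generic, well-known construction used here only for illustration, not a technique taken from the project; the function names and the sample records are hypothetical. The tags alone are not removable: an actual wrapper would pair each tag with an encrypted copy of the value, which is omitted here.

```python
import hashlib
import hmac

def index_tag(key: bytes, value: str) -> str:
    """Deterministic keyed tag: equal values give equal tags under the same key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def wrap_records(key: bytes, records, attribute: str):
    """Replace the chosen attribute of each record with its index tag."""
    return [{"tag": index_tag(key, rec[attribute])} for rec in records]

def find(key: bytes, wrapped, value: str):
    """Return positions of wrapped records whose tag matches the searched value."""
    tag = index_tag(key, value)
    return [i for i, w in enumerate(wrapped) if hmac.compare_digest(w["tag"], tag)]

# Assumed example data; the key stays with the data owner.
owner_key = b"example-key-held-by-the-owner"
records = [{"city": "Bergamo"}, {"city": "Milano"}, {"city": "Bergamo"}]

wrapped = wrap_records(owner_key, records, "city")
matches = find(owner_key, wrapped, "Bergamo")   # positions 0 and 2
no_key_matches = find(b"another-key", wrapped, "Bergamo")
```

Only a party holding the key can compute the tag for a search value, so the market provider can store and match tags without reading the data. Deterministic tags do reveal which records share a value, which is one example of the trade-off between functionality and protection discussed above.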

Data sanitisation (WP5)
MOSAICrOWN aims to design efficient and scalable enforcing techniques that work on whole data collections to provide an obfuscated and/or aggregated version, robust against possible re-identification, linkage, and correlation attacks. The distributed and multi-owner nature of the considered scenario makes the design of such techniques a difficult task that raises several challenges. First, the sanitisation techniques must protect data while preserving their utility for the expected computations. Second, as data will be needed for analysis and computations, the work will also investigate efficient sanitisation techniques able to protect a given data collection in full compliance with the policy associated with that collection, while guaranteeing the needed level of privacy and utility. To reach this goal, we plan to design techniques able to preserve specific characteristics of data while ensuring anonymisation. Third, when data analysis involves different data collections, possibly under the control of different data owners, the problem arises of supporting computations over data sanitised in different ways, with different granularity, and subject to different usage and sharing restrictions.
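
As one simple illustration of sanitisation by obfuscation, the sketch below generalises two quasi-identifying attributes (age and ZIP code) and checks the result against k-anonymity, a well-known criterion under which every released combination of values is shared by at least k records. The choice of k-anonymity, the attribute names, the generalisation rules, and the sample records are assumptions made for the example and are not taken from the project description.

```python
from collections import Counter

def generalise_age(age: int, width: int = 10) -> str:
    """Replace an exact age with the range of the given width that contains it."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def generalise_zip(zip_code: str, keep: int = 3) -> str:
    """Keep the first `keep` characters and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)

def sanitise(records, width: int = 10, keep: int = 3):
    """Generalise every (age, zip) pair in the collection."""
    return [(generalise_age(a, width), generalise_zip(z, keep)) for a, z in records]

def is_k_anonymous(rows, k: int) -> bool:
    """True if every distinct row occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())

# Assumed example collection of (age, ZIP code) pairs.
records = [(23, "24121"), (27, "24127"), (35, "24133"), (38, "24139")]

released = sanitise(records)
raw_ok = is_k_anonymous(records, 2)        # exact values are all unique
released_ok = is_k_anonymous(released, 2)  # each generalised row appears twice
```

The released rows still support analyses by age range and area, which illustrates the balance between protection and utility; coarser generalisation raises k at the cost of precision.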