Ph.D. Research Proposal Exam: Zhusheng Wang
Tuesday, August 16, 2022
301 405 3681
ANNOUNCEMENT: Ph.D. Research Proposal Exam
Name: Zhusheng Wang
Professor Sennur Ulukus (Chair)
Professor Alexander Barg
Professor Adrian Papamarcou
Date/time: August 16th, 2022 at 11:00 am
Location: (Zoom link) https://umd.zoom.us/j/93560154065?pwd=NWdvQmM4bmM1c2N0V1VFRFgvdkFpZz09
Title: Towards Secure Computation and Learning from Symmetric Private Information Retrieval
We consider the problems of secure computation and learning from the perspective of information theoretic symmetric private information retrieval (SPIR). Private information retrieval (PIR) is an elementary and classical problem in computer science, aiming to protect the privacy of users. In a typical PIR setting, a user downloads one message out of a set of messages from multiple non-colluding and replicated databases in such a way that no database can know which message the user has downloaded. This constraint is known as the user privacy constraint. In SPIR, the privacy is symmetric as the privacy of users and databases need to be guaranteed simultaneously. That means that, not only the databases cannot know which message the user has downloaded, the user itself cannot learn anything further than the particular message it has downloaded. This constraint is known as the database privacy constraint. Recently, PIR and SPIR problems have attracted much attention in the information theory community. In this proposal, we focus on the SPIR problem as it is a distributed (multi-database) version of $1$-out-of-$K$ oblivious transfer (OT), which has important applications for security and privacy in distributed systems. We study fundamental properties and novel extensions of SPIR, and explore its use in secure computation and learning in distributed settings, to provide information theoretic security and privacy guarantees.
In our first completed work, we investigate the problem of two-party private set intersection (PSI) from the perspective of information theory. PSI is a two-party secure computation problem, where two parties determine their common elements without leaking any further information about their remaining elements to each other. We first consider the multi-message SPIR (MM-SPIR) problem, which is an extension of the single-message SPIR (SM-SPIR) problem, where the user retrieves multiple messages at a time from the databases. We obtain the capacity of the MM-SPIR problem as a stand-alone result. We show that the MM-SPIR capacity is equal to the SM-SPIR capacity. We also unify the achievability schemes of multi-message PIR (MM-PIR) and MM-SPIR in this process by proposing a new capacity-achieving MM-SPIR scheme. Finally, we establish the equivalence between PSI and MM-SPIR with i.i.d.~messages of length $1$ bit through an incidence vector mapping. Consequently, we derive the optimal (minimum) download cost for the PSI problem.
In our second completed work, we consider the problem of multi-party private set intersection (MP-PSI). In MP-SPI, several parties wish to jointly determine the intersection of their respective elements while protecting the information about the remaining elements. The MP-PSI is a non-trivial extension of the two-party PSI as it cannot be implemented via multiple use of two-party PSI. As the main result of this work, we propose a new information theoretic MP-PSI scheme that builds on the connection between the PSI problem and the MM-SPIR problem. Our scheme is a non-trivial generalization of the two-party PSI scheme as it needs an intricate design of the shared common randomness. We also find that our scheme does not incur any penalty due to the more stringent privacy constraints in the MP-PSI problem compared to the two-party PSI problem in terms of the download cost.
In our third completed work, we investigate the problem of SPIR with user-side common randomness where the user is provided with a random subset of the shared database common randomness, which is unknown to the databases. We determine the exact capacity region of the triple $(d, \rho_S, \rho_U)$, where $d$ is the download cost, $\rho_S$ is the amount of shared database (server) common randomness, and $\rho_U$ is the amount of available user-side common randomness. We show that with a suitable amount of $\rho_U$, this new variant of SPIR achieves the capacity of the conventional PIR. As a corollary, single-database SPIR becomes feasible. As a consequence, SPIR with user-side common randomness can be applied to secure computation problems in the first two completed works, in which each party now may possess only a single database.
Next, we propose three works for future research. In our first proposed work, we consider the total (upload plus download) communication cost of two-database SPIR through its relationships to conditional disclosure of secrets (CDS) and conditional disclosure of multiple secrets (CDMS). In CDS, two parties each holding an individual input and sharing a common secret wish to disclose this secret to an external party in an efficient manner if and only if their inputs satisfy a public deterministic function. As a natural extension of CDS, in CDMS, two parties share multiple i.i.d.~common secrets. Inspired by the observed equivalence between a special configuration of CDMS and the two-database SPIR, we design download cost efficient SPIR schemes given a specific upload cost by using bipartite graph representation of CDS and CDMS. We expect to find (or get close to) the optimal communication cost of two-database SPIR following this idea. In our second proposed work, we consider the problem of random SPIR (RSPIR) where the user does not pick a specific message to download, instead is content with any one of the messages stored in the databases. This is the digital version of a blind box, also known as gachapon, which implements the above specified setting with physical objects for entertainment. This is also the blind version of $1$-out-of-$K$ OT, an important cryptographic primitive. We aim to derive the information theoretic capacity of RSPIR for the two-database case. In our third proposed work, we consider the federated submodel learning (FSL) problem where a number of clients jointly update the full model by downloading only the needed submodels and uploading the needed submodels' increment while keeping clients' desired submodel indices and local training data private from the server. This involves a private set union (PSU) problem as an intermediate step. In PSU, multiple parties wish to jointly compute the union of their elements without revealing anything else about the remaining elements. We aim to combine FL and PSU in the same framework, to develop information theoretically private FSL schemes.