Event
Ph.D. Research Proposal Exam: Sajani Pallegoda Vithana
Wednesday, August 3, 2022
10:00 a.m.
Virtual (Zoom)
Emily Irwin
301 405 0680
eirwin@umd.edu
ANNOUNCEMENT: Ph.D. Research Proposal Exam
Name: Sajani Pallegoda Vithana
Committee:
Professor Sennur Ulukus (Chair)
Professor Behtash Babadi
Professor Adrian Papamarcou
Date/time: August 3, 2022 at 10:00 am
Location: (Zoom link) https://umd.zoom.us/j/8761960954
Title: Private Information Read Update Write with Applications to Distributed Learning
Abstract:
We consider the problem of private read update write (PRUW), in which a user downloads, updates and writes back a specific section of a storage system while guaranteeing information-theoretic privacy of the index of the section updated and the values of the updates. PRUW has applications in distributed learning such as privacy preserving federated learning (FL) with sparsification and private federated submodel learning (FSL). In FSL, a machine learning model is divided into multiple submodels based on the types of data used for training, and a given user at a given time instance only downloads and updates the submodel relevant to the user’s local data. FSL requires additional privacy guarantees since the index of the submodel updated by a given user as well as the values of the updates leak information about the user’s local data to the storage system. PRUW is used in FSL, where the required submodel is downloaded in the reading phase without revealing the submodel index to the databases, and the updates are sent back to the databases in the writing phase without revealing the values of the updates or the submodel index. The reading phase of PRUW is identical to the problem of private information retrieval (PIR), and the writing phase is the immediate conceptual extension of PIR, known as private information transmission (PIT).
In our first completed work, we investigate the problem of semantic PIR, where we consider messages (files) with arbitrary semantics such as arbitrary message lengths and popularity profiles. As the main result of this work, we characterize the capacity of semantic PIR, with two achievable schemes and a converse proof. We provide capacity results and achievable schemes for semantic PIR with replicated databases, MDS coded databases and colluded databases. In the first part of our second completed work, we improve one of the existing PRUW schemes on private FSL, by introducing a new writing scheme and a storage mechanism that significantly reduces the writing cost. This scheme achieves the lowest known total communication cost of PRUW thus far. In this work, we introduce the concept of combining multiple parameter updates into a single bit in a specific way such that it can be privately decomposed into the respective individual updates and added to the relevant positions at the databases. In the second part of our second completed work, we consider the problem of PRUW in relation to FSL with 'top r' sparsification. Top r sparsification is a widely used sparsification technique in learning applications, where only the most significant r fraction of parameters is updated in order to reduce the communication cost. However, the positions and values of these sparse updates leak information about the user’s local data. Therefore, in this work, we provide a PRUW scheme that performs top r sparsification in FSL while guaranteeing information-theoretic privacy of the updating submodel index, values of the sparse updates and the positions of the sparse updates. This scheme achieves significantly reduced reading and writing costs compared to what is achieved in PRUW without sparsification. In the third part of our second completed work, we provide a scheme that performs PRUW in FSL with random sparsification. In random sparsification, a randomly chosen subset of parameters is updated, as opposed to updating the entire submodel, in order to reduce the communication cost. We formulate the problem in terms of a rate-distortion characterization, where we derive the minimum achievable communication cost for a given amount of allowed distortion. We propose a PRUW scheme that performs random sparsification in FSL, while satisfying the same privacy guarantees mentioned above for top r sparsification, and show that the rate-distortion characterization is linear.
Next, we propose three problems as future directions. The first proposed work extends the problem of private FSL with top r sparsification to privacy preserving FL. The PRUW scheme for top r sparsification in FSL (second part of second completed work) achieves lower communication costs at the expense of an increased storage cost. Decreasing the storage cost in this scheme results in a certain amount of information leakage. Therefore, in this work, we aim to find the optimum tradeoff between the communication cost, storage cost and information leakage in PRUW with top r sparsification in FL. In our second proposed work, we address the problem of PRUW with storage constrained databases. In this work, we consider two cases, namely, homogeneous databases, where all databases have the same storage capacity and heterogeneous databases, in which different databases have different storage limitations. In this work, we aim to find optimum PRUW schemes and storage mechanisms that result in the minimum communication cost, for a given set of storage constraints. In our third proposed work, we consider a multi-user setting for PRUW in FL with sparsification, in which the underlying communication model consists of multiple access channels (MAC), instead of dedicated communication links to each user-database pair. Our intuition is that the use of MACs can improve the efficiency of communication in the perspective of the databases, since the databases often send the same data to multiple users, and it is only the sum of all user updates of each parameter that needs to be stored in the databases. In this work, we develop PRUW schemes and characterize the minimum achievable communication costs in the perspective of both users and databases for the multi-user setting with MACs, while satisfying information-theoretic privacy of the values and positions of the sparse updates.
