Privacy Preserving Communication

Overview

Summary

With the growing demand for adaptive and intelligent communication networks, content-based publish subscribe (CBPS) has gained significant attention in the research community over the last decade. Publish-subscribe in general is a communication technique whereby publishers “publish” messages and a set of subscribers receive these messages. This is more efficient than a publisher sending multiple point-to-point messages to each subscriber. Publish-subscribe offers a degree of decoupling between the publishers and the subscribers—a network of brokers together route the messages from a publisher to the correct set of subscribers. In CBPS systems, the subscriber defines a filter (a logical expression) on the content of a message, such as, a diabetic patient may be interested in availability of a drug named `Glucotrol’ where the store zip code is either `47906′ or `47907′ and unit price is less than `$3′. Only messages (termed Notifications) matching the filter will be delivered to our hypothetical patient. This scenario is depicted in the figure below. Here the brokers execute sophisticated algorithms for matching messages with constraints on attribute values in the filter (such as sub-string, equality, inequality).

figure1

However, CBPS systems rely heavily on the integrity of brokers. Researchers have shown that achieving message confidentiality, integrity, and auditability in the presence of malicious brokers is a challenging assignment. Consider the following scenario: Our hypothetical patient Jane is infected with HIV. She wants to subscribe for drug availability and preventive care newsletters for her disease from an online health information exchange that uses CBPS for content delivery. Due to the sensitivity of her disease, Jane doesn’t want the brokers to learn about her subscription. Similarly, Dr. Watson, a publisher in the health information exchange, doesn’t want to divulge contents of his messages to the brokers. But normal functioning of CBPS requires that the brokers should inspect both pieces of information (notifications and subscriptions) to route messages. In this project, we solve this paradoxical problem of computing routing decisions based on encrypted notification and subscriptions, denoted hereafter as the Secure Routing Problem in CBPS. A solution to the orthogonal problem, that of computation and distribution of group keys in CBPS may be found elsewhere [Usenix Security 2001, Prakash et al.].

We make the important observation that routing in content-based publish-subscribe networks does not necessarily require inspection of the whole message. Instead, if a trusted publisher extracts the routing information from a message before encrypting it, then the problem reduces to hiding this information from malicious brokers. A level of indirection in naming of filters combined with separation of duty transforms the paradoxical problem to a practical one. In our solution approach, the publisher looks at the commonality of interests among subscribers and encodes the routing information in the form of a routing vector. The routing vector (RV) is added to the header of a message and it allows brokers to compute their receiver lists. We present two versions of our protocol—one where the RV is left unencrypted (termed the RV protocol), and the second where the RV is further encrypted to achieve both confidentiality and anonymity (termed Secure RV or SRV protocol). The collection of these two protocols is termed v-CAPS: Confidentiality and Anonymity preserving routing protocol for content-based Publish-Subscribe networks using routing vectors. Our simple approach eliminates the need for complex cryptographic operations, thereby, making it possible to incorporate the full generality of filters in baseline CBPS systems, with low computational overhead on the brokers. Our experimental results show that RV performs nearly as fast as a baseline CBPS in terms of latency. Achieving perfect anonymity (which we do through the SRV protocol), however, is significantly more costly and practical only for medium-sized networks.

Achieved Technical Goals

We evaluated the performance of our protocols against baseline CBPS (Siena) with respect to end-to-end latency and computation time for notifications in a wide-area deployment. We implemented all the three protocols Baseline, RV, and SRV; and deployed them on PlanetLab, a worldwide computer systems testbed. Our experiments involve sending upto 100,000 subscription messages from the subscribers over a wide-area network, which, to the best of our knowledge, are the largest scale experiments on secure CBPS.

In our experiments, we created a hierarchical broker network with 4 levels (refer figure below). Each of the brokers in the top three levels were considered to have a fanout of 3, whereas, the leaf brokers were randomly connected to the subscribers. This constituted a broker network with 40 nodes. Each broker was hosted on a separate machine at Purdue University, 28 of which belonged to two clusters in our research group and the rest on public desktops in one of Purdue University laboratories. The subscribers were hosted on PlanetLab machines situated at widely varying geographical locations. We ran our experiments with upto 1000 subscribers hosted on 50 PlanetLab machines (i.e. 20 subscriber processes per machine). Each subscriber subscribed with 1-200 filters with a uniform random distribution, generating 100,000 subscriptions in our largest workload. In total, the experiments involved coordination between 1132 processes running on 91 machines over the Internet.

figure1

Our key observations from the experiments are presented below. All comparisons are done based on our largest workload, i.e. with 100,000 subscriptions, and hence are valid for smaller workloads.

  1. End-to-end latencies for baseline and RV are very similar. In baseline, end-to-end latency for all the nodes is within 5 ms of network RTT, whereas, in RV, this is within 10 ms of network RTT. End-to-end latency in SRV, is however significantly higher than network RTT. This happens due to large matching time at the brokers.
  2. Even with the largest workload, the difference in computational time between RV and baseline is only 3 ms. The computational cost for SRV is, however, 2 orders of magnitude higher than RV (600 ms in SRV as opposed to 6ms in RV for a moderately popular notification).
  3. SRV computation time at publisher for the largest workload is nearly 4 ms, which is insignificant compared to cost at brokers for SRV protocol. Therefore, our protocol does not impose much overhead on the publishers.

Based on our workload, message overhead per subscriber (i.e. extra bits added in header normalized by number of subscribers receiving a message) is only 4 Bytes. This shows the efficiency of our protocol in terms of network overhead.

fig2fig3

a) Baseline b) RV

fig4

c) SRV

It may be noted that in our current implementation, the leaf brokers are directly connected to subscribers and, therefore, have the potential to violate the anonymity requirements of the subscribers. To achieve subscriber anonymity from leaf brokers, we are currently implementing an anonymizing network between these entities. The nodes in this anonymizing network will have no notion of the filters but will act as proxies of the subscribers.

Students

  • Amiya Maji,  amaji AT purdue DOT edu
Last modified: March 18, 2015