Dependable Computing Systems Laboratory



Project Title: Privacy for Healthcare Applications

Hospitals are collaborative environments with resource constraints. Applying computer technology in such environments can facilitate resource utilization and sharing, leading to better quality of service. In this project, we investigate the security and privacy aspects of electronic healthcare services, with the goal of protecting sensitive medical records and the infrastructure from malicious organizations or entities. We propose the use of a novel communication scheme, known as Content-Based Publish-Subscribe (CBPS) [1], for sharing medical data among caregivers (hospitals, testing laboratories, pharmacies, insurance agencies, etc.) and interested third parties (researchers, market analysts, government agencies, etc.). We observe that sharing of medical data can be of two types: a) sharing of *precise* data (among the caregivers), and b) sharing of *imprecise* data (data released to third parties for statistical analysis). In the rest of this report, we explain the motivation behind each of these sharing mechanisms, our security objectives, and our proposed solutions.

Background

Content-Based Publish-Subscribe (CBPS) is an asynchronous communication paradigm in which a message is routed based on its content instead of a fixed destination address. Typically, three types of nodes form the backbone of a CBPS network: publishers, the entities that send messages into the network; subscribers, the entities that express their intention to receive messages with certain content; and brokers, the intermediate nodes that route messages from the publishers to the subscribers. There are usually multiple levels of brokers between the publishers and the subscribers. CBPS has been shown to be an effective communication stratum under the following conditions: dynamic pairing between publishers and subscribers, fine-grained expression of interest by subscribers, and environments where a subset of publishers and subscribers are ephemeral.
It has been shown that CBPS is capable of delivering messages with low latency and of scaling to a large number of publishers and subscribers [2]. The messages generated by publishers are termed notifications. A notification consists of a collection of attributes and their values. Similarly, a subscriber expresses her interest with one or more filters, each a logical expression on the attribute values. In the medical context, we consider patients’ medical records or hospital-specific information to be the notifications, and the receivers’ interest in certain medical records to be the filters. For example, a sample notification regarding bed availability in a hospital ward may look as follows:

{<wardName, "cardiology", string>, <wardId, 1234, integer>, <totalBeds, 20, integer>, <bedsAvailable, 12, integer>, <timeStamp, "01/01/2010 09:00am", datetime>}

A subscriber for this notification can be a hospital administrator who wants to list all the wards that have more than 10 beds available, using the filter (bedsAvailable > 10).

In a typical CBPS network, the publishers do not know the subscribers and vice versa. The intermediate brokers store subscribers’ interests in partially ordered sets of filters known as filter posets. The notifications are routed from the publishers to the subscribers by progressive matching of filters at the brokers. The assumption here is that the brokers are completely trusted and can read both the messages and the filters. However, placing complete trust in the brokers may harm the security and privacy of patients’ medical records. For example, the intermediate brokers can be “out of network” health exchanges that are not trusted, or they may even be taken over by hackers. In such situations, it is necessary to encrypt the notifications so that they can be read only by legitimate subscribers.
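To make the notification and filter model above concrete, here is a minimal Python sketch of the plaintext matching that a fully trusted broker would perform. The `Constraint` class, the `matches` function, and the dictionary layout are illustrative assumptions, not part of any existing CBPS implementation; a filter is assumed to be a conjunction of simple attribute constraints.

```python
import operator
from dataclasses import dataclass
from typing import Any

# Comparison operators a filter constraint may use (an assumed, minimal set).
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass(frozen=True)
class Constraint:
    """One term of a filter, e.g. bedsAvailable > 10 (hypothetical helper)."""
    attribute: str
    op: str
    value: Any


def matches(filter_terms, notification):
    """Return True if the notification satisfies every constraint in the filter."""
    for term in filter_terms:
        if term.attribute not in notification:
            return False
        value, _type_name = notification[term.attribute]
        if not OPS[term.op](value, term.value):
            return False
    return True


# The sample notification from the text: attribute -> (value, type name).
notification = {
    "wardName": ("cardiology", "string"),
    "wardId": (1234, "integer"),
    "totalBeds": (20, "integer"),
    "bedsAvailable": (12, "integer"),
    "timeStamp": ("01/01/2010 09:00am", "datetime"),
}

# The administrator's filter from the text: bedsAvailable > 10.
admin_filter = [Constraint("bedsAvailable", ">", 10)]

print(matches(admin_filter, notification))
```

Note that `matches` reads both the filter and the notification in the clear, which is exactly the capability an untrusted broker should not have.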
We also want to hide the filters from the brokers, since filters can expose a patient’s medical condition. For example, brokers could detect that a patient subscribes to HIV-related health advisories and release the patient’s identity publicly. Can the brokers match encrypted filters against encrypted notifications and still perform routing? Answering this question is the objective of our research in the precise data sharing model.

Several researchers have tried to address this problem using cryptographic techniques such as computation on encrypted data [3], commutative encryption [4], or homomorphic encryption [5]. However, each of these approaches has shortcomings: false positives or misrouting [3], limited expressiveness of subscriber interests or filters [3, 4, 5], high execution time [3, 5], and high message overhead [3, 5]. Our solution, titled v-CAPS, achieves these competing goals while avoiding these problems.

In the imprecise data sharing model, we observe that researchers often do not require exact data values to perform data mining operations. The privacy-preserving data mining literature has shown that nearly accurate information can be computed from perturbed data values by compensating for the known perturbation [6, 7, 8]. For example, suppose a researcher wants to perform data mining tasks (building decision trees, computing association rules, etc.) on patients’ blood sugar levels during the course of a treatment. To protect the privacy of patients, we may add random values drawn from a known distribution to the patients’ real blood sugar levels and release this perturbed data to the researcher, who can still extract nearly accurate information from it. Our objective in this research is to customize the CBPS network to enable perturbation at multiple levels, so that different subscribers receive sensitive data with varying levels of accuracy according to their trustworthiness. We formulate each of these problem statements in the following section.
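The perturbation idea above can be illustrated with a small Python sketch. It assumes zero-mean Gaussian noise with a publicly known standard deviation and recovers only the mean and variance of the original values; the full distribution reconstruction of [6] is more involved and is not shown. The function names, the synthetic blood sugar values, and all parameters are hypothetical.

```python
import random
import statistics


def perturb(values, noise_std, rng):
    """Release each value with independent zero-mean Gaussian noise added."""
    return [x + rng.gauss(0.0, noise_std) for x in values]


def estimate_mean_and_variance(perturbed, noise_std):
    """Estimate statistics of the original data from the perturbed data.

    The noise has zero mean, so the sample mean needs no correction.
    The noise is independent of the data, so its known variance can be
    subtracted from the variance of the perturbed values.
    """
    mean_est = statistics.fmean(perturbed)
    var_est = statistics.pvariance(perturbed) - noise_std ** 2
    return mean_est, max(var_est, 0.0)


rng = random.Random(42)

# Synthetic stand-in for real blood sugar readings (assumed values).
true_levels = [rng.gauss(110.0, 15.0) for _ in range(20000)]

NOISE_STD = 30.0  # known to the researcher; individual readings stay hidden
released = perturb(true_levels, NOISE_STD, rng)

mean_est, var_est = estimate_mean_and_variance(released, NOISE_STD)
print("true mean     :", round(statistics.fmean(true_levels), 2))
print("estimated mean:", round(mean_est, 2))
print("true variance     :", round(statistics.pvariance(true_levels), 2))
print("estimated variance:", round(var_est, 2))
```

Each released reading can differ substantially from the true one, while the aggregate estimates stay close to the true statistics as the sample grows.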
Problem Statement

Sharing of Precise Data: Our objective is to develop a routing protocol for CBPS networks that supports the following security requirements:

Notification Confidentiality: No one except the publisher of a notification and its authorized subscribers can view the notification content.

Subscription Confidentiality: No one except the subscriber and the publisher to whom it subscribes can know the content of a filter.

Subscriber Anonymity: No broker except the last broker along the path to a subscriber can know which publisher’s message goes to which subscriber. Moreover, a subscriber does not know which other subscribers receive the notification.

Path Anonymity: During routing, a broker can learn about its immediate predecessors and its immediate successors. It cannot, however, determine which other brokers carry the notification in the broker network.

Sharing of Imprecise Data: Let X be a random variable representing the original value of an attribute, Y a random variable with a known p.d.f., and Z = X + Y the random variable representing the perturbed attribute value. Our objective is to develop a distributed multi-level randomization scheme for CBPS networks that achieves the following goals:

Conformance of Cumulative Randomization: As a data value (x, an instance of X) passes through different brokers, each broker (b_i) adds a suitable random value (y_i) to the value it receives. The sum of these individual perturbations should follow the known perturbation distribution, i.e., it should be an instance y of Y. If y_1, ..., y_k are the perturbations added to a data item x by the brokers along the path from the publisher (P_i) to the subscriber (S_i), then y = y_1 + ... + y_k, where k is the number of brokers along the path. The overall perturbation Y is determined by the privacy level requested by subscriber S_i.

Resistance to Collusion: If a subset of the subscribers and brokers (possibly all of them) collude, they should not be able to infer the original data value with higher probability than can be inferred from the most trusted (least perturbed) data value among them. If a set of colluding subscribers (brokers) receive the perturbed values z_1, ..., z_m for a data value x, then Pr[x | z_1, ..., z_m] = Pr[x | z_best], where z_best is the least perturbed data value in {z_1, ..., z_m}.

Solution Approach

We make the important observation that routing in CBPS networks does not necessarily require inspection of the whole message. Instead, if a trusted publisher extracts the routing information from a message before encrypting it, then the problem reduces to hiding this information from malicious brokers. In our solution approach, the publisher looks at the commonality of interests among subscribers and encodes the routing information in the form of a routing vector (RV). The RV is added to the header of a message and allows brokers to compute their receiver lists. Further, the RV is encrypted (we use the term SRV to denote the encrypted RV) such that a broker cannot discern any information about which other brokers or subscribers will see the message. This simple approach eliminates the need for complex cryptographic operations, making it possible to incorporate the full generality of filters in baseline CBPS systems with minimal computational overhead on the brokers. The concessions that v-CAPS makes are added execution overhead at the publisher and some loss of decoupling between publishers and subscribers. However, the partial loss of decoupling has the added advantage of enabling auditability and enforcement of access control on subscriber interests.

For imprecise data sharing, we observe that the structure of a CBPS network allows us to propagate the privacy levels of subscribers bottom-up (from the subscribers to the publishers).
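As a rough illustration of bottom-up propagation, the Python sketch below works on a small, hypothetical broker tree. It assumes that a privacy level is expressed as the total noise variance a subscriber should see, that each node takes the smallest variance requested anywhere below it, and that zero-mean Gaussian noise is added on each hop. None of these rules is taken from the project itself; they are one simple choice that makes the per-hop variances add up to each subscriber's requested total, using the fact that variances of independent perturbations add.

```python
import math
import random

# Hypothetical topology: "P" is the publisher, "B*" are brokers, "S*" are subscribers.
TREE = {
    "P": ["B1"],
    "B1": ["B2", "B3"],
    "B2": ["S1", "S2"],
    "B3": ["S3"],
}

# Assumed privacy levels: total noise variance each subscriber should receive.
REQUESTED = {"S1": 4.0, "S2": 9.0, "S3": 25.0}


def propagate_up(node, tree, requested, levels):
    """Assign each node the smallest total variance requested below it."""
    if node in requested:
        levels[node] = requested[node]
    else:
        levels[node] = min(
            propagate_up(child, tree, requested, levels) for child in tree[node]
        )
    return levels[node]


def hop_variances(tree, levels, root):
    """Variance of the noise added on each parent -> child hop."""
    hops = {}
    for parent, children in tree.items():
        base = 0.0 if parent == root else levels[parent]  # publisher holds exact data
        for child in children:
            hops[(parent, child)] = levels[child] - base
    return hops


def path_to(node, tree, root):
    """Return the list of nodes from the root down to the given node."""
    parent_of = {c: p for p, children in tree.items() for c in children}
    path = [node]
    while path[-1] != root:
        path.append(parent_of[path[-1]])
    return path[::-1]


def deliver(x, path, hops, rng):
    """Perturb x hop by hop along the path with independent Gaussian noise."""
    for parent, child in zip(path, path[1:]):
        variance = hops[(parent, child)]
        if variance > 0.0:
            x += rng.gauss(0.0, math.sqrt(variance))
    return x


levels = {}
propagate_up("P", TREE, REQUESTED, levels)
hops = hop_variances(TREE, levels, "P")

rng = random.Random(7)
for subscriber in REQUESTED:
    path = path_to(subscriber, TREE, "P")
    total = sum(hops[(a, b)] for a, b in zip(path, path[1:]))
    print(subscriber, "path:", path, "total variance:", total,
          "sample value:", round(deliver(100.0, path, hops, rng), 2))
```

With this rule, a node never holds a more accurate value than its most trusted descendant needs, and less trusted subscribers receive additional noise further down the tree.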
The propagation of privacy levels in the CBPS network essentially assigns a privacy level to each broker for a given attribute value. Since the sum of two independent normally distributed random variables is also normally distributed, each broker can perturb data values according to a normal distribution with a given variance. The total perturbation then has a variance equal to the sum of the individual variances. The exact values of the variances can be computed from the desired privacy levels. Since each broker computes a new perturbed value from the data it receives from its higher-level broker, collusion can be only as effective as the accuracy of the data at the topmost broker. We are currently exploring solutions to achieve uniform perturbation in such a distributed multi-level randomization scheme and to provide stronger guarantees of protection against collusion.

Current Students: Amiya Maji.

References

[1] Eugster, P. T., Felber, P. A., Guerraoui, R., and Kermarrec, A. 2003. The many faces of publish/subscribe. ACM Comput. Surv. 35, 2 (Jun. 2003), pp. 114-131.
[2] Carzaniga, A., Rosenblum, D. S., and Wolf, A. L. 2001. Design and evaluation of a wide-area event notification service. ACM Trans. Comput. Syst. 19, 3 (Aug. 2001), pp. 332-383.
[3] Raiciu, C., and Rosenblum, D. S. Enabling confidentiality in content-based publish/subscribe infrastructures. In Securecomm and Workshops (2006), pp. 1-11, Aug. 28-Sep. 1, 2006, Baltimore, MD, USA.
[4] Shikfa, A., Onen, M., and Molva, R. Privacy-preserving content-based publish/subscribe networks. In 24th IFIP International Information Security Conference (2009), pp. 270-282, May 18-20, 2009, Pafos, Cyprus.
[5] Mohamed Nabeel, Ning Shang, E. B. Privacy-preserving filtering and covering in content-based publish subscribe systems. Tech. rep. 2009-15, Purdue University, June 2009.
[6] Agrawal, R. and Srikant, R. 2000. Privacy-preserving data mining. SIGMOD Rec. 29, 2 (Jun. 2000), pp. 439-450.
[7] Agrawal, D. and Aggarwal, C. C. 2001. On the design and quantification of privacy preserving data mining algorithms. In Proc. of the 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS '01), pp. 247-255, Santa Barbara, CA, USA.
[8] Xiao, X., Tao, Y., and Chen, M. 2009. Optimal random perturbation at multiple privacy levels. Proc. VLDB Endow. 2, 1 (Aug. 2009), pp. 814-825.
[9] Song, D. X., Wagner, D., and Perrig, A. Practical techniques for searches on encrypted data. In Proc. of the IEEE Symposium on Security and Privacy (2000), pp. 44-55.
[10] Carzaniga, A. Siena download. http://www.inf.usi.ch/carzaniga/siena/software/index.html

465 Northwestern Avenue, West Lafayette, IN 47907 | dcsl@ecn.purdue.edu | +1 765 494 3510