To secure today’s computer systems, it is critical to have different intrusion detection sensors embedded in them. The complexity of today's distributed computer systems makes it difficult to determine the appropriate configuration of these detectors, i.e., which ones to use, where to place them, and how to set up their rule or event definitions to achieve the best detection for the distributed system. In the choice of the number of detectors, more is not always better. There are several reasons why an extreme design choice of a detector at every possible network point, host, and application may not be ideal - the economic cost of acquiring, configuring, and maintaining the detectors; keeping the number of alert streams under attack as well as benign conditions bounded; limit the performance penalty due to the sensors; a system owner may have specific security goals, e.g., detecting a security goal may be very important and requires high sensitivity, while another may need to be done with less tolerance for false positives.
The problem that we address in this work is, given the security goals in a system and a model for the way multi-stage attacks (MSAs) can spread in the system, how can we automatically and, based on scientific principles, select the right configuration of detectors. “Right” is determined by an application-specific requirement on the true positive (TP) - true negative (TN) rate of detection in the system. We explore the space of the configuration of the individual detectors, their placement on the different hosts or network points, and their number in an efficient manner. We use Bayesian inference on a Bayesian network created out of the attack graphs to solve the problem of determining the likelihood that an attack goal has been achieved, given a certain set of detector alerts.
Further, various sources of dynamism are to be expected in large-scale protected systems deployed in enterprise settings. The underlying protected system itself changes with time, with the addition or deletion of hosts, ports, software applications, or changes in connectivity between hosts. A static solution is likely to miss new attacks possible in the changed configuration of the protected system as well as throw off false alarms for attack steps that are just not possible under the changed configuration. The nature of attacks may also change with time or the anticipated frequencies of attack paths may turn out to be not completely accurate based on attack traces observed at runtime. Therefore, our proposed solution needs to update its “belief” in an efficient manner. Finally, when the compromise of a critical asset appears imminent, fast reconfiguration of existing sensors (such as, turning on some rules) may be needed to increase the certainty about the security state of the critical asset. Our contribution in this work is to show how the choice and placement of sensors can be updated through incremental processing when the above kinds of dynamism occur.
We propose a distributed intrusion detection framework that includes two components: (1) a probabilistic reasoning engine and (2) a network of detection sensors to detect various stages of MSAs. The second component comprises oï¬?-the-shelf sensors, augmented with a standard wrapper that allows the sensor to send alerts to the reasoning engine and receive commands back from the reasoning engine. The architecture is able to alert intrusion events related to potential MSAs and determine if any critical asset has been compromised, or is under imminent likelihood of being compromised based on current evidence of the spread of the attack. It also allows for reconfiguration of sensors according to changes to the protected system that is being monitored by the DIADS. Through this architecture, the DIADS can reduce the number of false positives that it would report if it were independently considering each step of the MSA. The reasoning engine represents different possible MSAs as a single Bayesian network, which is updated according to events reported by the detection sensors and the changing network configuration. The probabilistic engine can also request more information from sensors when necessary. The reasoning engine can estimate the security state of the critical assets given partial information about MSAs and from imperfect or noisy sensors. A block diagram of the proposed architecture is shown below.
Our testbeds for this work come from two sources - an NSF-funded center called NEEScomm and Northrop Grumman’s internal network (through a joint Northrop Grumman-CERIAS initiative). The NEES network features 14 geographically-distributed, shared-use laboratories that support several types of experimental earthquake-related work. Purdue provides the IT infrastructure for these laboratories through the NEEScomm center. The infrastructure, based on the HUBzero technology, provides telepresence capabilities, a publicly accessible curated data repository, web-based collaborative tools, access to leading edge compute resources, and opens source computational tools. Physically, this infrastructure comprises about 20 physical servers, running in total about 100 virtual machines. Below is the Bayesian network representation of the NEES network, used in our experiments. Each colored box represents a host, labeled according to its role (for example: web server, database server, developer’s desktop). Inside each box, each node represents a security vulnerability found in the host and labeled by its CVE identifier number.