3 Main Focus Areas
Distributed systems and applications are increasingly pervasive, providing the core infrastructure for the largest commercial and scientific applications. The complexity and scale of these applications grow continuously as they span more software components, parallel tasks, and computing nodes. For example, large-scale applications running in today’s data centers and supercomputers span thousands of computing nodes with multiple cores per node. As complexity and scale increase, it becomes correspondingly harder to detect errors, performance anomalies, and unexpected behavior in these applications.
There has been significant work on understanding vulnerabilities in large-scale distributed cyber-physical systems (CPS) and on applying technological patches that address individual vulnerabilities or classes of vulnerabilities. Technologists' efforts to address these vulnerabilities are often frustrated by a lack of understanding of how a perturbation affects the overall system. Such an understanding would reveal which elements of the system need to be strengthened to limit the effects of the perturbation. Because many CPS infrastructures are large legacy systems operating under budgetary constraints, a complete re-architecting and wholesale strengthening of the system is often not possible; rather, rational decisions have to be made to strengthen parts of the system, including the connection points where multiple entities interact. To aid such decision making, models of the CPS must be built that capture not only the technological elements (the computing and physical elements, which have received the most prior work), but also the economic factors (who the stakeholders are and what their economic drivers are) and the policy factors (what controls each stakeholder can implement and how stakeholders can collaborate) that will guide the operational controls. These models, when instantiated with parameters from the real system, should enable rational, distributed decision making among the multiple stakeholders about which assets to protect, to what extent, and with what level of cooperation. Then, at runtime, based on inputs from sensors, the system should be able to determine whether a perturbation is underway and, if so, what the optimal response action is. The existing body of work does not provide the pipeline of actions needed to secure a wide variety of CPS domains against non-deterministic perturbations.
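The budget-constrained choice of which parts of a system to strengthen can be made concrete with a toy example. The sketch below is only an illustration under stated assumptions: the asset names, costs, and risk-reduction values are hypothetical, and the greedy benefit-per-cost rule is a simple stand-in heuristic, not the modeling approach pursued in this research. It also ignores the economic and policy interactions among stakeholders that the full models are meant to capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    cost: float            # cost to strengthen this asset (hypothetical units, must be > 0)
    risk_reduction: float  # expected loss avoided if strengthened (hypothetical)


def choose_assets(assets, budget):
    """Greedy heuristic: take assets in order of risk reduction per unit cost,
    skipping any asset that no longer fits in the remaining budget."""
    ranked = sorted(assets, key=lambda a: a.risk_reduction / a.cost, reverse=True)
    chosen, remaining = [], budget
    for asset in ranked:
        if asset.cost <= remaining:
            chosen.append(asset.name)
            remaining -= asset.cost
    return chosen, remaining


# Hypothetical assets for illustration only.
assets = [
    Asset("substation_gateway", cost=4.0, risk_reduction=10.0),
    Asset("control_server", cost=3.0, risk_reduction=9.0),
    Asset("field_sensor_link", cost=2.0, risk_reduction=3.0),
]
chosen, left = choose_assets(assets, budget=6.0)
print(chosen, left)
```

A greedy rule like this is easy to reason about but is not guaranteed to be optimal; it serves here only to show the kind of question (which assets, to what extent, under what budget) that an instantiated model would need to answer.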
This thrust seeks to establish the foundations for realizing resilient, complex systems through collective innovation, which is characterized by the self-organization of individuals into decentralized, non-hierarchical communities. The research objective will be achieved by using principles from biological evolution and network dynamics to understand the bottom-up evolution of systems and the self-organization of communities. The research approach combines mathematical theories and computational approaches, including the theory of network evolution, social network analysis, and agent-based modeling. Knowledge about the evolutionary dynamics of systems and communities gained from this research is being used to develop cyberinfrastructure for collective innovation.
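As a small illustration of what bottom-up network evolution can look like computationally, the sketch below grows a network by preferential attachment, a standard textbook growth model in which each newcomer links to an existing node with probability proportional to that node's degree. The choice of this model, the function names, and the parameters are assumptions made for illustration; they are not taken from the research described here.

```python
import random


def grow_network(n_nodes, seed=0):
    """Grow a network one node at a time (assumes n_nodes >= 2). Each newcomer
    links to one existing node chosen with probability proportional to degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Each node appears in this list once per edge endpoint, so a uniform draw
    # from it is a degree-proportional draw over nodes.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges


def degrees(edges):
    """Count how many edges touch each node."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


edges = grow_network(200)
deg = degrees(edges)
print("max degree:", max(deg.values()))
print("mean degree:", sum(deg.values()) / len(deg))
```

No node in this model follows a global plan, yet a few highly connected hubs typically emerge from the local attachment rule, which is the kind of self-organized structure that network-evolution and agent-based analyses aim to explain.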