Foundations of Comparative Analytics for Uncertainty in Graphs

Project Investigators:
Lise Getoor, University of Maryland
Alex Pang, University of California, Santa Cruz
Lisa Singh, Georgetown University

Project Overview
In today's linked world, graphs and networks abound. There are communication networks, social networks, financial transaction networks, gene regulatory networks, disease transmission networks, ecological food networks, sensor networks and more. Observational data describing these networks can often be obtained; unfortunately, this graph data is usually noisy and uncertain. In this research, we proposed a formalism that allows us to capture and reason about the inherent uncertainty and imprecision in an underlying graph. We began by proposing probabilistic soft logic (PSL), a simple yet powerful language for describing problems that require probabilistic reasoning about similarity in networked data. We also introduced the notion of visual comparative analysis of PSL models derived using different evidence and assumptions, and illustrated its utility for the analysis of graphs and networks. Dealing with noise and uncertainty in complex domains and conducting comparative analytics are core capabilities required for the FODAVA mission. Finally, we integrated our representation, comparative analysis and visualization methods into an open source toolkit that supports the representation, comparison and visualization of PSL models. In addition to building the toolkit, we worked with researchers in a variety of interdisciplinary domains to validate the utility of our approach.

Broader Impacts
The broader impacts of our work will be to provide foundations for reasoning about similarity and uncertainty in graph and network data, together with the visual analytic tools to understand and compare different models and assumptions. Our proposed methods will enable researchers across different fields to 1) use a sound, general purpose probabilistic language for describing the entities, relationships and features of their network data; 2) compare multiple models created based on different evidence; and 3) better understand the comparisons using new visual paradigms. This new way of describing and comparing uncertainty in data may have a profound impact on the ability of people across disciplines to adequately reason about noise and uncertainty. In addition to making the software developed open-source, the PIs plan to develop tutorials and train students in the use of the tools. Finally, in an effort to make these methods and tools available to a broader scientific community, we plan to provide tutorials and training to observational scientists who have large amounts of uncertain and partial data. Specifically, we will work with a group of biologists at Georgetown University who observe a community of dolphins in Shark Bay, Australia. By doing this, we hope to propel the use of these methods outside of computer science and mathematics.

This work is supported by the National Science Foundation. This is a collaborative research effort bringing together the expertise of Lise Getoor, University of Maryland College Park (0937094), Alex Pang, University of California-Santa Cruz (0937073) and Lisa Singh, Georgetown University (0937070). All opinions, findings, conclusions and recommendations in any material resulting from this research are those of the researchers and do not necessarily reflect the views of the National Science Foundation.