People
Publications
Software and Datasets


Supported by
National Science Foundation
Award Number: III1320542
PIs: George Kollios and Evimaria Terzi
Network data appear in diverse applications such as social networks, biological networks, and mobile adhoc networks. In many cases, uncertainty or imprecise information becomes a critical impediment to understanding and effectively utilizing the information contained in such graphs. Uncertainty can be due either to the datacollection process or due to the machinelearning techniques utilized to preprocess the data. Such uncertainty is usually encoded in the edges; e.g., a node in a social network influences another node with some probability. In proteinprotein interaction networks, the edges, representing interactions among proteins, are results of noisy and errorprone measurements.
How can we measure the closeness of two proteins in a proteinprotein interaction network? How can we identify communities of users in an uncertain network? Which nodes are the most important nodes? How can we find the most important edges in an uncertain network and how can we decrease their uncertainty? How do we handle network evolution over time?
Most of the above questions have been extensively studied in the context of deterministic graphs. A straightforward approach to answering these questions on probabilistic graphs is to heuristically cast the probability of every edge into a weight and apply existing methodologies on this weighted graph. This approach is problematic; not only there is no meaningful way to perform such a casting, but also there is no principled way to additionally encode normal weights on the edges. Motivated by the possible world semantics of probabilistic databases and probabilistic graphs, we treat every probabilistic graph G as a generative model for deterministic graphs. Each such deterministic graph is a possible world of G and is associated with a probability to be generated. The flexibility of possibleworld semantics comes with high computational cost; after all, the expectation of any measure is taken over all (exponentially many) possible worlds.
The goal of this project is to adopt the possibleworld semantics to probabilistic graphs and within this framework develop algorithms for wellestablished datamining problems. Motivated by reallife applications, we focus on specific dataanalysis tasks. However, our goal is to make our framework general enough so that it can be used for a wider set of tasks and applications.
Intellectual merit: In the presence of uncertainty, even basic notions, e.g., the shortest path between two nodes in the graph, lose their intuitive interpretation. Existing solutions to graphanalysis tasks prove insufficient to handle uncertain graphs. In this project, we lay the foundations of a principled framework that will facilitate the development of models and algorithms for dataanalysis tasks in uncertain graphs. In order to achieve the above goals we will extend traditional dataanalysis methods and address several challenging computational and scalability problems.
Broader impact: We expect our research to have impact on several application domains including online social networks, mobile adhoc networks and biological networks. For example, it can help internetbased and socialmedia related companies to analyze their data and improve their targeting advertisement policies and practices. This will create opportunities for making these companies viable and help the economy of internetbased business. In medicine, biology and biochemistry, networks play a very important role and many of these networks can be better modeled as uncertain networks. Our project can help analyze these networks and lead to new biological insights and development of new drugs and treatments. We plan to disseminate our results as follows: (1) We will develop publicly available prototypes. (2) We will include the results of our work in our classes lectures. We will also engage in our research graduate and undergraduate students, including women and minorities. (3) We will communicate our results to scientists of computer science and other fields and our industry collaborators through publications and demos.
Key Words: uncertain graphs; biological networks; social networks; uncertainty; data mining
News:
Selected Recent Papers!
 Haohan Zhu, Xianrui Meng, and George Kollios.
NED: An InterGraph Node Metric Based On Edit Distance. arXiv 2016.
[PDF]
 Giovanni Comarela, Mark Crovella, and Evimaria Terzi.
Detecting UnusuallyRouted ASes: Methods and Applications ACM IMC 2016.
 Xianrui Meng, Seny Kamara, Kobbi Nissim, and George Kollios.
GRECS: Graph Encryption for Approximate Shortest Distance Queries. ACM CCS 2015.
[PDF]
 Natali Ruchansky, Mark Crovella, and Evimaria Terzi.
Matrix Completion with Queries. ACM SIGKDD 2015.
[PDF]
 Charalampos Mavroforakis, Richard GarciaLebron, Ioannis Koutis, and Evimaria Terzi.
Spanning Edge Centrality: Largescale Computation and Applications. WWW 2015.
[PDF]
Code Release!
A Python Library for efficient computation of spanning edge centralities:
[here]
Any opinions, findings, and conclusions or recommendations expressed
here are those of the author(s) and do not necessarily reflect the
views of the National Science Foundation.
