Possible Topics for Projects

Spatial and Temporal Databases:

Implement and compare efficient indexing methods for temporal databases. In this project you will implement a number of different indexing methods for temporal databases and provide an experimental evaluation. You may consider either a single-node solution or a cluster-based solution on top of Hadoop or Spark.
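As a possible starting point, the sketch below shows one very simple temporal index that more elaborate methods could be compared against. It assumes each record version carries integer timestamps and is valid over a half-open interval [start, end). It also assumes the workload is time-slice queries ("which versions were valid at time t?"). The class name BucketTimeIndex and its parameters are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


class BucketTimeIndex:
    """Hypothetical fixed-width time-bucket index for time-slice queries.

    Each record version is valid over the half-open interval [start, end)
    with integer timestamps. A version is stored in every bucket its
    interval overlaps, so a time-slice query at time t inspects only the
    single bucket that contains t instead of scanning every version.
    """

    def __init__(self, bucket_width):
        self.width = bucket_width
        # bucket number -> list of (key, start, end) versions overlapping it
        self.buckets = defaultdict(list)

    def insert(self, key, start, end):
        first = start // self.width
        last = (end - 1) // self.width  # end is exclusive
        for b in range(first, last + 1):
            self.buckets[b].append((key, start, end))

    def time_slice(self, t):
        """Return the keys of all versions valid at time t, sorted."""
        candidates = self.buckets.get(t // self.width, [])
        return sorted(key for key, start, end in candidates if start <= t < end)


idx = BucketTimeIndex(bucket_width=10)
idx.insert("a", 0, 25)
idx.insert("b", 12, 15)
idx.insert("c", 30, 40)
print(idx.time_slice(13))  # ['a', 'b']
print(idx.time_slice(25))  # []
```

Long-lived versions are copied into many buckets, so this design trades space for query time. That trade-off is exactly what an experimental comparison against other temporal index structures would measure.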

Compare methods for efficient computation of shortest-distance queries on road networks. Here you will implement methods that index large road networks to answer shortest-distance queries, and then either compare different methods or study how distances change over time (for example, when the distance represents the travel time from a start point to an end point).
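Index-based shortest-distance methods are usually evaluated against plain Dijkstra search, which needs no preprocessing but may explore a large part of the network per query. The sketch below is such a baseline. It assumes a directed road graph stored as a dictionary mapping each node to a list of (neighbor, weight) pairs with non-negative weights. The function name and data layout are illustrative assumptions.

```python
import heapq


def shortest_distance(graph, source, target):
    """Dijkstra's algorithm.

    graph maps node -> list of (neighbor, non-negative weight).
    Returns the shortest distance from source to target, or inf.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d  # first time target is popped, its distance is final
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry superseded by a shorter path
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


roads = {"A": [("B", 4), ("C", 1)], "C": [("B", 2), ("D", 7)], "B": [("D", 1)]}
print(shortest_distance(roads, "A", "D"))  # 4
```

For the time-varying variant, one option would be to replace each fixed weight with a function of the departure time at that node. Such a change goes beyond this sketch.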

Efficient sub-sequence matching in time series using a cluster. In this project you will study how to use Hadoop or Spark for efficient sub-sequence matching in databases with very large amounts of time series data.
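The sketch below shows the basic operation that a cluster solution would parallelize: brute-force sub-sequence search under z-normalized Euclidean distance. It assumes z-normalization as the similarity model, which is common in this area but is a choice made here, not taken from the topic description. In a Hadoop or Spark job, one possible (assumed) decomposition is to split a long series into chunks that overlap by len(query) - 1 points, so no window is lost at a boundary, and run this search on each chunk.

```python
import math


def z_normalize(seq):
    """Shift to zero mean and scale to unit standard deviation."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n  # constant window: treat as all zeros
    return [(x - mean) / std for x in seq]


def best_match(series, query):
    """Return (offset, distance) of the window of `series` closest to
    `query` under z-normalized Euclidean distance (brute force)."""
    m = len(query)
    q = z_normalize(query)
    best = (None, float("inf"))
    for i in range(len(series) - m + 1):
        w = z_normalize(series[i:i + m])
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, q)))
        if d < best[1]:
            best = (i, d)
    return best


offset, dist = best_match([5, 3, 1, 2, 3, 2, 1, 0, 0], [10, 20, 30])
print(offset)  # 2
```

The query [10, 20, 30] matches the window [1, 2, 3] at offset 2 because both have the same shape after normalization. Indexing methods in the literature aim to avoid computing this distance for every window.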

Probabilistic Graph Databases:

Implement and compare node distances in probabilistic graph databases. In this project, you will study some recently proposed distance functions on probabilistic graphs, implement them, and experimentally compare them on a number of real datasets.
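Several distance notions for probabilistic graphs are defined over "possible worlds," where each edge is present independently with its probability. In such a model, the distance between two nodes becomes a random variable. The minimal Monte Carlo sketch below estimates three quantities of this kind: reliability (the probability that t is reachable from s), the median distance, and the mean distance over worlds where s and t are connected. It assumes undirected edges given as hypothetical (u, v, p) triples and unweighted hop distances. The exact definitions in the papers the project studies may differ.

```python
import math
import random
import statistics
from collections import deque


def sample_world(edges, rng):
    """Sample one possible world: keep each undirected edge (u, v, p)
    independently with probability p."""
    adj = {}
    for u, v, p in edges:
        if rng.random() < p:
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    return adj


def hop_distance(adj, s, t):
    """Breadth-first search hop count from s to t (math.inf if unreachable)."""
    if s == t:
        return 0
    seen = {s}
    queue = deque([(s, 0)])
    while queue:
        u, d = queue.popleft()
        for v in adj.get(u, []):
            if v == t:
                return d + 1
            if v not in seen:
                seen.add(v)
                queue.append((v, d + 1))
    return math.inf


def distance_estimates(edges, s, t, samples=1000, seed=0):
    """Monte Carlo estimates of several node-distance notions between s and t."""
    rng = random.Random(seed)
    dists = [hop_distance(sample_world(edges, rng), s, t) for _ in range(samples)]
    finite = [d for d in dists if d != math.inf]
    return {
        # fraction of sampled worlds in which t is reachable from s
        "reliability": len(finite) / samples,
        # median of the sampled distances (math.inf counts as a value)
        "median": statistics.median_low(dists),
        # mean distance over the worlds in which s and t are connected
        "expected_reliable": sum(finite) / len(finite) if finite else math.inf,
    }


edges = [("a", "b", 0.9), ("b", "c", 0.8), ("a", "c", 0.3)]
print(distance_estimates(edges, "a", "c"))
```

Each estimate needs many BFS runs per node pair. That cost is what makes an experimental comparison across real datasets informative.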

Implement efficient indexing techniques for answering k-nearest-neighbor queries in probabilistic graphs. In this project, you will develop indexing techniques for answering k-nearest-neighbor queries in probabilistic graphs. The index can be centralized or distributed.

Compare graph management systems for various queries. In this project you will use existing graph management systems and implement efficient algorithms on top of them to query and analyze large datasets.

Evaluate temporal graph database systems. In this project you will implement and evaluate temporal queries on evolving graphs.
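To make "temporal queries on evolving graphs" concrete, the sketch below implements two simple query types over an undirected graph whose edges carry validity intervals: a snapshot query (the graph as of time t) and a time-window neighborhood query. The edge format (u, v, start, end) with half-open intervals and both function names are assumptions for illustration. A real system would answer these with indexes rather than full scans.

```python
def snapshot(temporal_edges, t):
    """Adjacency sets of the undirected graph as it existed at time t.

    Each edge is (u, v, start, end) and is alive over [start, end).
    """
    adj = {}
    for u, v, start, end in temporal_edges:
        if start <= t < end:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def neighbors_during(temporal_edges, node, t1, t2):
    """Nodes linked to `node` at any moment in the window [t1, t2)."""
    result = set()
    for u, v, start, end in temporal_edges:
        if start < t2 and t1 < end:  # edge interval overlaps the window
            if u == node:
                result.add(v)
            elif v == node:
                result.add(u)
    return result


edges = [("a", "b", 0, 10), ("a", "c", 5, 15), ("b", "c", 12, 20)]
print(snapshot(edges, 12))  # a-c and b-c are alive at t = 12
print(neighbors_during(edges, "a", 10, 12))  # {'c'}
```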

Probabilistic Graph Summarization. In this project you will study whether, and how, existing graph summarization techniques can be applied to probabilistic graphs.

Database Security and Privacy:

Implementation and Comparison of Order-Preserving Encryption (OPE) Schemes for Large Databases. The main idea of this project is to study two recent OPE schemes and provide efficient implementations of them. Then, using databases of different sizes, compare the performance of the two approaches.
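The property that makes OPE useful for databases is that ciphertexts compare in the same order as plaintexts. A server can therefore evaluate range predicates without decrypting. The deliberately insecure toy below only demonstrates that property. It is not one of the schemes the project would study. It precomputes a secret, strictly increasing mapping over a small integer domain. It uses Python's non-cryptographic random.Random, and its class and function names are hypothetical.

```python
import random


class ToyOPE:
    """Insecure toy order-preserving encryption over the integer domain
    [0, domain_size). A secret, strictly increasing mapping is built by
    summing random positive gaps, so x < y implies encrypt(x) < encrypt(y).
    random.Random is NOT a cryptographic generator; illustration only."""

    def __init__(self, domain_size, key, max_gap=1000):
        rng = random.Random(key)
        self.table = []
        c = 0
        for _ in range(domain_size):
            c += rng.randint(1, max_gap)  # gap >= 1 keeps the map strictly increasing
            self.table.append(c)
        self.inverse = {c: x for x, c in enumerate(self.table)}

    def encrypt(self, x):
        return self.table[x]

    def decrypt(self, c):
        return self.inverse[c]


def encrypted_range_query(ciphertexts, lo_c, hi_c):
    """Server side: filters by comparing ciphertexts only."""
    return [c for c in ciphertexts if lo_c <= c <= hi_c]


ope = ToyOPE(domain_size=150, key=42)
column = [ope.encrypt(age) for age in [25, 40, 33, 60]]  # stored encrypted
hits = encrypted_range_query(column, ope.encrypt(30), ope.encrypt(45))
print(sorted(ope.decrypt(c) for c in hits))  # [33, 40]
```

The full lookup table grows linearly with the plaintext domain. That is one reason practical OPE schemes are more involved, and their costs on large databases are what this project would measure.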

Implementation and Comparison of Searchable Encryption Schemes for Information Retrieval. Here we assume that documents, and the terms inside them, have been encrypted. The question is how to build an efficient indexing scheme that allows searching over the encrypted terms to find the encrypted documents relevant to a query.
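One of the simplest designs, sketched below under stated assumptions, is an encrypted inverted index. The client replaces each term with a keyed token (here HMAC-SHA256 from Python's standard library) and uploads an index from tokens to opaque document identifiers. To search, the client sends the token of the query term, and the server looks it up without ever seeing the plaintext term. Encrypting the document bodies themselves is assumed to happen separately and is not shown. This basic scheme also leaks which queries repeat and which documents they return. The function names are hypothetical.

```python
import hashlib
import hmac


def term_token(key: bytes, term: str) -> bytes:
    """Client side: deterministic keyed token for a (lower-cased) term."""
    return hmac.new(key, term.lower().encode("utf-8"), hashlib.sha256).digest()


def build_index(key, docs):
    """Client side: docs maps an opaque document id -> plaintext text.
    Returns the server-side index: token -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term_token(key, term), set()).add(doc_id)
    return index


def search(index, token):
    """Server side: exact-match lookup on an opaque token."""
    return index.get(token, set())


key = b"client-secret-key"
docs = {"d1": "encrypted database search", "d2": "graph database", "d3": "time series"}
index = build_index(key, docs)
print(sorted(search(index, term_token(key, "database"))))  # ['d1', 'd2']
```

Comparing schemes would mean measuring index size, search time, and leakage for more advanced constructions against a baseline like this one.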

Potpourri:

BoostMap for Biological Data. BoostMap is a recently proposed embedding technique for efficient similarity search under expensive and possibly non-metric distance functions. In this project we would like to apply BoostMap to gene sequences or proteins for efficient retrieval and/or classification.
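BoostMap combines many simple one-dimensional embeddings, learning their weights with boosting. The sketch below does not implement BoostMap itself. It shows only the simplest such building block, the reference-object embedding F(x) = d(x, r), used in a filter-and-refine search. Objects are embedded as vectors of distances to a few reference sequences. Candidates are ranked cheaply by L1 distance between vectors, and only a short list is refined with the expensive exact distance. Assumptions: edit distance stands in for a biological distance (alignment scores such as Smith-Waterman would be more realistic), the reference sequences are hand-picked, all embedding weights are equal, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (the 'expensive' distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def embed(x, references):
    """Reference-object embedding: one coordinate per reference sequence."""
    return [edit_distance(x, r) for r in references]


def filter_and_refine(query, database, references, candidates=2):
    """Rank by cheap L1 distance in embedding space, then refine exactly."""
    q = embed(query, references)
    vectors = {s: embed(s, references) for s in database}  # precomputed offline in practice
    ranked = sorted(database, key=lambda s: sum(abs(a - b) for a, b in zip(q, vectors[s])))
    shortlist = ranked[:candidates]
    return min(shortlist, key=lambda s: edit_distance(query, s))


database = ["ACGTACGT", "ACGTTCGT", "TTTTGGGG", "GGGGCCCC"]
references = ["AAAAAAAA", "GGGGGGGG"]
print(filter_and_refine("ACGTACGA", database, references))  # ACGTACGT
```

Only `candidates` exact distances are computed per query, plus one per reference. The search is approximate because the true nearest neighbor could be missed by the filter step, and improving that trade-off is what BoostMap's learned embedding is meant to do.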

Entity Resolution techniques. Entity Resolution (or De-duplication) is the process of finding records in a database that correspond to the same person or entity. For example, several health records may refer to the same patient, or several customer records may refer to the same customer, because errors during data entry or data integration created duplicate records. The goal of this project is to evaluate different methods for computing the similarity between records without computing all pairwise distances. A second possible project is to evaluate different similarity functions and methods for combining them into a single score. Finally, a third possible project is to evaluate and compare existing systems for entity resolution and data cleaning.
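A standard way to avoid computing all pairwise distances is blocking. Records are grouped by a cheap key, and only pairs within the same block are compared. The sketch below blocks on a hypothetical zip-code field and compares names by Jaccard similarity of their word sets. The record layout, the blocking key, and the 0.5 threshold are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def tokens(record):
    """Word set of the (hypothetical) 'name' field."""
    return set(record["name"].lower().split())


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def blocked_matches(records, block_key, threshold=0.5):
    """Compare only records that share a blocking key.
    Returns (matching id pairs, number of comparisons performed)."""
    blocks = defaultdict(list)
    for r in records:
        blocks[block_key(r)].append(r)
    matches, comparisons = [], 0
    for group in blocks.values():
        for r1, r2 in combinations(group, 2):
            comparisons += 1
            if jaccard(tokens(r1), tokens(r2)) >= threshold:
                matches.append((r1["id"], r2["id"]))
    return matches, comparisons


records = [
    {"id": 1, "name": "John A Smith", "zip": "60607"},
    {"id": 2, "name": "Smith John", "zip": "60607"},
    {"id": 3, "name": "Mary Jones", "zip": "60607"},
    {"id": 4, "name": "Jon Smith", "zip": "02115"},
    {"id": 5, "name": "Mary K Jones", "zip": "02115"},
]
matches, comparisons = blocked_matches(records, block_key=lambda r: r["zip"])
print(matches, comparisons)  # [(1, 2)] 4
```

Blocking needs 4 comparisons here instead of all 10 pairs. However, records 3 and 5 ("Mary Jones" and "Mary K Jones") are never compared because their zip codes differ, although their Jaccard similarity is 2/3. That recall cost is the kind of trade-off an evaluation of such methods would quantify.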