III-COR-Small: Collaborative Research: Subsequence Matching for Content-based Access in Very Large Multimedia Databases





Supported by the National Science Foundation

Award Number: IIS-0812309

Award Number: IIS-0812601

PIs: George Kollios (BU), Vassilis Athitsos (UTA), Gautam Das (UTA)

This is a collaborative project with the VLM group at the University of Texas at Arlington, led by Prof. Vassilis Athitsos. The project investigates methods for efficient subsequence matching in large time-series databases under the popular Dynamic Time Warping (DTW) distance measure. The project is designing embeddings that partially convert the subsequence matching problem into the much more manageable problem of similarity search in a vector space. This conversion makes it possible to leverage the full arsenal of vector indexing and metric indexing methods to speed up subsequence matching. The proposed methods will be applicable in a wide variety of time-series domains, including stock market modeling, seismic activity analysis, and sensor-based health monitoring. To showcase the commercial, social, and educational impact of the research, the project will produce three demonstration systems: a query-by-humming system, a handwritten document search-by-keyword system, and a sign spotting system. The results of the research are being integrated into these systems to achieve efficient retrieval over large amounts of data. The creation and dissemination of large, real-world datasets for these three systems will be an additional contribution of the project.
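To make the underlying problem concrete, the Python sketch below shows the exhaustive baseline that embedding-based methods aim to avoid: subsequence DTW, which finds the contiguous stretch of a long series that best matches a short query, letting the match start and end anywhere. It assumes 1-D numeric sequences and absolute difference as the local cost; the function name `subsequence_dtw` is hypothetical, and this is neither the project's implementation nor its embedding method.

```python
import math
from typing import List, Sequence, Tuple


def subsequence_dtw(query: Sequence[float], series: Sequence[float]) -> Tuple[float, int, int]:
    """Best DTW match of `query` against any contiguous run of `series`.

    Returns (cost, start, end) where series[start:end + 1] is the matched
    subsequence. Runs in O(len(query) * len(series)) time and memory.
    """
    m, n = len(query), len(series)
    if m == 0 or n == 0:
        raise ValueError("query and series must be non-empty")

    # cost[i][j]: cheapest warping path aligning query[:i + 1] with a
    # subsequence of `series` that ends at position j.
    # start[i][j]: series position where that cheapest path begins.
    cost: List[List[float]] = [[math.inf] * n for _ in range(m)]
    start: List[List[int]] = [[0] * n for _ in range(m)]

    # First query element: a match may begin at any series position (free start).
    # Starting fresh is never costlier than extending, since local costs are >= 0.
    for j in range(n):
        cost[0][j] = abs(query[0] - series[j])
        start[0][j] = j

    for i in range(1, m):
        for j in range(n):
            local = abs(query[i] - series[j])
            # Allowed DTW steps: vertical, diagonal, horizontal.
            steps = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                steps.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
                steps.append((cost[i][j - 1], start[i][j - 1]))
            prev_cost, prev_start = min(steps, key=lambda s: s[0])
            cost[i][j] = local + prev_cost
            start[i][j] = prev_start

    # Free end: pick the end position with the lowest total cost.
    end = min(range(n), key=lambda j: cost[m - 1][j])
    return float(cost[m - 1][end]), start[m - 1][end], end


if __name__ == "__main__":
    print(subsequence_dtw([1, 2, 3], [0, 0, 1, 2, 3, 0, 0]))  # (0.0, 2, 4)
```

Every query costs time proportional to the query length times the full database length, which is exactly the scan that mapping sequences into a vector space and using vector or metric indexes is meant to cut down.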


  • We have released new demo software for our Query by Humming system: Hum-a-Song!
  • A new paper on subsequence matching appeared in PVLDB 2012: H. Zhu, G. Kollios, and V. Athitsos. A Generic Framework for Efficient and Effective Subsequence Retrieval. In the 38th VLDB (PVLDB), Istanbul, Turkey, August 2012.
  • A new paper on clustering probabilistic graphs has been accepted in IEEE TKDE.
  • A new, highly optimized version of the DBH code has been released; it supports custom distance functions.
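Assuming DBH here refers to distance-based hashing, the Python sketch below illustrates why such an index can accept custom distance functions. Each hash bit projects an object onto the "line" through two randomly chosen pivot objects x1 and x2 using only distances, F(x) = (d(x, x1)^2 + d(x1, x2)^2 - d(x, x2)^2) / (2 d(x1, x2)), and outputs 0 or 1 depending on whether F(x) falls in an interval [t1, t2] that holds about half of a random sample. The class `DBHIndex`, its parameters `k`, `l`, `sample_size`, and the interval-selection rule are hypothetical illustrations, not the released DBH code or its API.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Generic, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def line_projection(x: T, x1: T, x2: T, dist: Callable[[T, T], float]) -> float:
    """Project x onto the 'line' through pivots x1, x2 using only distances."""
    d12 = dist(x1, x2)
    return (dist(x, x1) ** 2 + d12 ** 2 - dist(x, x2) ** 2) / (2 * d12)


class DBHIndex(Generic[T]):
    """Toy distance-based hashing index: k bits per key, l hash tables."""

    def __init__(self, data: Sequence[T], dist: Callable[[T, T], float],
                 k: int = 4, l: int = 3, sample_size: int = 50, seed: int = 0):
        if len(data) < 2:
            raise ValueError("need at least two objects")
        self.data = list(data)
        self.dist = dist
        self.rng = random.Random(seed)
        self.tables = []
        for _ in range(l):
            funcs = [self._make_bit(sample_size) for _ in range(k)]
            buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
            for idx, x in enumerate(self.data):
                buckets[self._key(funcs, x)].append(idx)
            self.tables.append((funcs, buckets))

    def _make_bit(self, sample_size: int):
        # Pick two pivots at nonzero distance (we divide by d(x1, x2)).
        for _ in range(100):
            x1, x2 = self.rng.sample(self.data, 2)
            if self.dist(x1, x2) > 0:
                break
        else:
            raise ValueError("could not find two distinct pivots")
        # Choose [t1, t2] so that about half of a sample projects inside it.
        sample = self.rng.sample(self.data, min(sample_size, len(self.data)))
        proj = sorted(line_projection(s, x1, x2, self.dist) for s in sample)
        half = len(proj) // 2
        offset = self.rng.randrange(len(proj) - half + 1)
        return x1, x2, proj[offset], proj[offset + half - 1]

    def _key(self, funcs, x) -> Tuple[int, ...]:
        # One bit per hash function: 0 inside the interval, 1 outside.
        return tuple(0 if t1 <= line_projection(x, x1, x2, self.dist) <= t2 else 1
                     for x1, x2, t1, t2 in funcs)

    def query(self, q: T, top: int = 1) -> List[T]:
        # Filter: union of the query's buckets across all tables.
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, q), []))
        # Refine: rank candidates by the true (possibly expensive) distance.
        ranked = sorted(candidates, key=lambda i: self.dist(q, self.data[i]))
        return [self.data[i] for i in ranked[:top]]


if __name__ == "__main__":
    index = DBHIndex(list(range(100)), lambda a, b: abs(a - b))
    print(index.query(42))  # [42]
```

Concatenating `k` bits makes each bucket more selective, while building `l` tables raises the chance that a true neighbor shares at least one bucket with the query; candidates are then re-ranked with the exact distance, so any user-supplied `dist` (for example, a DTW cost) can be plugged in without changing the index.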
Any opinions, findings, and conclusions or recommendations expressed here are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.