Guest Talk: Shlomo Geva

Lecture
Date: 16 December 2016
Start: 10:00
Room: 2.015

Content-Based Image Retrieval and Scalable Clustering

The talk will present recent work from the Data Science Department at
the Queensland University of Technology. It will cover two topics:
Content-Based Image Retrieval and Scalable Clustering.

CONTENT-BASED IMAGE RETRIEVAL

Content-based image retrieval (CBIR) has attracted much attention due to
the exponential growth of digital image collections that have become
available in recent years. In search engines, relevance feedback (RF) is
a query expansion technique based on relevance judgments about the top
results initially returned for a given query. RF can be obtained
directly from end users, inferred indirectly from user interactions with
a result list, or even assumed (known as pseudo relevance feedback). The
feedback is then used to formulate a new query that refocuses the search
on more relevant results.
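To make the query-reformulation step concrete, here is a minimal sketch of Rocchio's classic relevance-feedback update. It moves a query vector towards the centroid of results judged relevant and away from the centroid of results judged non-relevant. Rocchio is a standard textbook technique used here only for illustration, and is not necessarily the method used in this work. The feature vectors, the weights `alpha`, `beta` and `gamma` (common textbook defaults), and the helper names `centroid`, `rocchio` and `rank` are all assumptions.

```python
import math


def centroid(vectors, dim):
    """Mean of a list of vectors; the zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: q' = alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant)."""
    dim = len(query)
    rel = centroid(relevant, dim)
    non = centroid(nonrelevant, dim)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]


def rank(query, collection):
    """Indices of collection items sorted by Euclidean distance to the query (nearest first)."""
    return sorted(range(len(collection)), key=lambda i: math.dist(query, collection[i]))


if __name__ == "__main__":
    collection = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
    q = [1.0, 0.1]
    print(rank(q, collection))  # [0, 1, 2]
    # The user marks item 1 as relevant and item 0 as non-relevant.
    q2 = rocchio(q, relevant=[collection[1]], nonrelevant=[collection[0]])
    print(rank(q2, collection))  # [1, 0, 2]
```

After one round of feedback, the reformulated query `[1.375, 0.625]` ranks the item the user marked as relevant first. In text retrieval, negative term weights are often clipped to zero. The sketch leaves them unclipped because image feature vectors may legitimately contain negative components.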

This work presents a methodology for using signature-based image
retrieval with a user in the loop to improve retrieval performance. We
show how explicit RF can be used effectively with signature-based image
retrieval to improve both retrieval quality and efficiency. Unlike text
search, where users can reformulate their queries, there is no convenient
or effective way to reformulate an image query; relevance feedback gives
end users a mechanism to refine their image queries. Empirical
evaluations on standard benchmarks demonstrate that the proposed
approach improves CBIR performance in terms of recall, precision, speed
and scalability.
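The abstract does not say how the image signatures are built or how feedback is folded into them. The sketch below is a hedged illustration only, and makes three assumptions. First, signatures are binary and come from the signs of random projections of an image feature vector. Second, images are ranked by Hamming distance to the query signature. Third, explicit feedback is applied as a per-bit majority vote over the query and the images the user marked as relevant. The function names `make_projections`, `signature`, `hamming`, `search` and `feedback_signature`, and the 64-bit signature length, are hypothetical.

```python
import random


def make_projections(dim, bits, seed=42):
    """Random Gaussian hyperplanes, one per signature bit (hypothetical scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def signature(vec, projections):
    """Set bit b when vec lies on the non-negative side of hyperplane b."""
    sig = 0
    for b, w in enumerate(projections):
        if sum(x * y for x, y in zip(vec, w)) >= 0:
            sig |= 1 << b
    return sig


def hamming(a, b):
    """Number of differing bits between two integer-packed signatures."""
    return bin(a ^ b).count("1")


def search(query_sig, db_sigs, k):
    """Indices of the k database signatures closest to the query in Hamming distance."""
    return sorted(range(len(db_sigs)), key=lambda i: hamming(query_sig, db_sigs[i]))[:k]


def feedback_signature(query_sig, relevant_sigs, bits):
    """Per-bit majority vote over the query and the user-marked relevant signatures."""
    voters = [query_sig] + list(relevant_sigs)
    new_sig = 0
    for b in range(bits):
        ones = sum((s >> b) & 1 for s in voters)
        if 2 * ones > len(voters):  # strict majority; ties resolve to 0
            new_sig |= 1 << b
    return new_sig


if __name__ == "__main__":
    rng = random.Random(0)
    features = [[rng.random() - 0.5 for _ in range(16)] for _ in range(100)]
    proj = make_projections(dim=16, bits=64)
    db = [signature(f, proj) for f in features]
    top = search(db[7], db, k=5)  # image 7 used as the query image
    # The user marks the 2nd and 3rd results as relevant; search again.
    refined = feedback_signature(db[7], [db[top[1]], db[top[2]]], bits=64)
    print(search(refined, db, k=5))
```

Comparing packed signatures with XOR and a bit count is what makes signature search cheap relative to comparing full feature vectors. This is one plausible reason a signature representation helps with the speed and scalability goals mentioned above.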

CLUSTERING

Clustering is a technique that can help make large datasets more
manageable by grouping together similar objects. However, most
clustering approaches are too computationally expensive for datasets
that are very large or very complex. Here, we present the Parallel
K-Tree, a hierarchical, multi-node and multi-core approach to clustering
extremely large datasets. We show that the K-Tree is more efficient
than both traditional and other parallelized approaches. Finally, we
discuss how we applied the K-Tree, using three commodity desktop
servers, to 22 years of Landsat 5 satellite data. The dataset is eight
terabytes in size, contains over 540 billion 6-D image pixels, and was
clustered into 8 billion clusters. This is a two-orders-of-magnitude
increase in size over any alternative approach reported.
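The abstract does not spell out how the K-Tree works. The sketch below follows a simplified reading of it as a height-balanced tree, similar to a B+-tree:

- leaves hold data vectors;
- internal nodes hold the means of their subtrees;
- each new vector descends to the nearest mean;
- any node that exceeds `order` items is split in two with 2-means.

It is a single-process, unoptimized illustration. The parallel, multi-node and multi-core machinery of the Parallel K-Tree is not shown. The names `KTree`, `Node`, `order`, `two_means` and `clusters` are hypothetical.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def two_means(points, iters=10, seed=0):
    """Split points into two non-empty groups with k-means (k=2); returns index lists."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, 2)]
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for i, p in enumerate(points):
            nearest = 0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
            groups[nearest].append(i)
        if not groups[0] or not groups[1]:
            # Degenerate case (e.g. duplicate points): split in half instead.
            half = len(points) // 2
            groups = (list(range(half)), list(range(half, len(points))))
        centers = [mean([points[i] for i in g]) for g in groups]
    return groups


class Node:
    def __init__(self, leaf, items):
        self.leaf = leaf    # True: items are data vectors; False: items are child Nodes
        self.items = items
        self.refresh()

    def refresh(self):
        """Recompute the subtree size and the mean of all vectors below this node."""
        if self.leaf:
            self.size = len(self.items)
            self.center = mean(self.items)
        else:
            self.size = sum(ch.size for ch in self.items)
            dim = len(self.items[0].center)
            self.center = [sum(ch.center[d] * ch.size for ch in self.items) / self.size
                           for d in range(dim)]


class KTree:
    def __init__(self, order):
        self.order = order  # maximum items per node before it is split
        self.root = None

    def insert(self, vector):
        v = list(vector)
        if self.root is None:
            self.root = Node(True, [v])
            return
        split = self._insert(self.root, v)
        if split:  # the root overflowed: grow the tree by one level
            self.root = Node(False, list(split))

    def _insert(self, node, v):
        """Insert v below node; return two replacement nodes if node split, else None."""
        if node.leaf:
            node.items.append(v)
        else:
            # Descend into the child whose mean is nearest to v.
            best = min(node.items, key=lambda ch: dist2(v, ch.center))
            split = self._insert(best, v)
            if split:
                i = node.items.index(best)
                node.items[i:i + 1] = list(split)
        if len(node.items) > self.order:
            # Overflow: split the node's vectors (or its children's means) with 2-means.
            pts = node.items if node.leaf else [ch.center for ch in node.items]
            g0, g1 = two_means(pts)
            return (Node(node.leaf, [node.items[i] for i in g0]),
                    Node(node.leaf, [node.items[i] for i in g1]))
        node.refresh()
        return None

    def clusters(self):
        """Return the leaves; each leaf is one finest-level cluster."""
        stack, leaves = [self.root], []
        while stack:
            node = stack.pop()
            if node.leaf:
                leaves.append(node)
            else:
                stack.extend(node.items)
        return leaves
```

Usage: create `KTree(order=8)`, call `insert` on each vector, then read `clusters()`. Each leaf's `center` acts as a cluster representative. Because insertion only ever descends one root-to-leaf path, the cost per vector grows with the tree height rather than with the number of clusters. That property makes tree-based clustering attractive at this scale. For the Landsat run, 540 billion pixels in 8 billion clusters works out to about 67.5 pixels per cluster on average.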