OR08 Publications

Content-based image retrieval integrated into Fedora

Burgi, P. Y. and Monbaron, P. (2008) Content-based image retrieval integrated into Fedora. In: Third International Conference on Open Repositories 2008, 1-4 April 2008, Southampton, United Kingdom.



We designed a content-based image retrieval system built on the MPEG-7 standard and the Fedora architecture. Initially, the bank of images and its associated metadata was handled by ad hoc application software developed at the University of Geneva. The aim was to migrate the whole image bank to Fedora while changing the existing user interfaces as little as possible. To perform the migration, XSL transformations were applied to the metadata to generate MPEG-7 and FOXML files. Besides textual metadata, the MPEG-7 document also contains content descriptors extracted using image processing methods. In our case, we used the Caliph&Emir software, which extracts 4 of the 15 descriptors defined in the MPEG-7 standard (3 related to colour and 1 to edge information). The resulting XML files were then ingested into Fedora, with Generic Search (gSearch) indexing the metadata into a Lucene-based index: the textual part is indexed, while the image descriptors are only stored. These descriptors are not suited to Lucene indexing, but we found it convenient to have the search engine store them for use in the image matching step. Besides the MPEG-7 and DC datastreams, the ingest process yielded low- and high-resolution image datastreams and a thumbnail datastream.

The second part of the project was to build a user interface for image retrieval. A search typically begins with the user performing a textual search on the metadata (based on gSearch), which results in a list of images matching the textual query. The user can then search for additional images, this time based on content similarity. Content similarity involves distance measurements on the MPEG-7 descriptors, represented as data vectors built from the textual data stored in Lucene. The outcome of this matching process is a set of image URLs, which are included in the result web page so that the user's web agent can retrieve them from Fedora.

The whole retrieval system has been tested on a corpus of about 900 images representing various scenes and people from the campus. Given that the image corpus is quite heterogeneous, our preliminary results are rather satisfactory in that related visual scenes receive similar rankings.
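
The content-similarity step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each descriptor is stored as a whitespace-separated string of numbers and that a plain L1 (city-block) distance is used; the abstract specifies neither the serialization nor the metric. The function names and the "demo:N" identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_descriptor(text: str) -> List[float]:
    """Build a numeric vector from a stored descriptor string.

    Assumption: values are whitespace-separated numbers.
    """
    return [float(token) for token in text.split()]


def l1_distance(a: List[float], b: List[float]) -> float:
    """City-block distance between two vectors of equal length.

    Assumption: L1 stands in for whatever metric the real system applies.
    """
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_similarity(query_id: str,
                       stored: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank all other images by distance to the query image, closest first."""
    query = parse_descriptor(stored[query_id])
    scored = [
        (image_id, l1_distance(query, parse_descriptor(text)))
        for image_id, text in stored.items()
        if image_id != query_id
    ]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical stored descriptor strings keyed by image identifier.
    stored = {
        "demo:1": "1 2 3",
        "demo:2": "1 2 4",
        "demo:3": "9 9 9",
    }
    print(rank_by_similarity("demo:1", stored))
    # prints [('demo:2', 1.0), ('demo:3', 21.0)]
```

In a setup like the one described, the ranked identifiers would then be turned into image URLs for the result page. In practice each MPEG-7 descriptor type may call for its own distance measure, so the single metric here is only a placeholder.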

Item Type: Conference or Workshop Item (Paper)
Creators: Pierre-Yves Burgi, Patrick Monbaron
Subjects: User Groups > Fedora User Group > Semantic Technologies
ID Code: 113
Deposited By: Leslie Carr
Deposited On: 09 Apr 2008 07:15
Last Modified: 26 Oct 2011 16:08


