Focus

Multimedia/Multilingual Access to Cultural Heritage

The "Multimedia/Multilingual Information Retrieval" group of the Networked Multimedia Information Systems (NMIS) laboratory of ISTI-CNR has developed a domain-specific multimedia/multilingual search engine targeted at the Cultural Heritage (CH) sector. The search engine, known as MultiMatch, is the result of a European research project of the same name, coordinated by ISTI-CNR (see http://www.multimatch.org/).
Cultural heritage content is everywhere on the web: in traditional environments such as libraries, museums, galleries and audiovisual archives, but also in popular magazines and newspapers, in multiple languages and multiple media. Unfortunately, it is not always easy to find. Existing search engines are either general purpose, such as Google or Yahoo, or specialised for a single collection or archive. Domain-specific Internet searches will thus normally retrieve only a minimal part of the information actually available. Users lack suitable tools to identify, interpret and aggregate information from various sources according to their particular needs, whether educational, touristic or commercial. Furthermore, advanced functionality, such as multilingual support enabling queries across language boundaries or content-based multimedia search tools, is rarely offered.
The MultiMatch search engine is based on the multimedia content management system MILOS (http://milos.isti.cnr.it/) developed by the NMIS laboratory. MultiMatch represents a significant evolution in focused search engines, permitting the retrieval of complex digital objects that combine web-crawled CH information with proprietary digital cultural objects (provided by museums, picture archives, digital libraries, etc.). This is achieved through techniques that exploit the metadata and other descriptive elements associated with the digital objects. The content retrieved (made up of textual material in diverse languages, images, video and audio) is analysed via automatic classifiers on the basis of the semantic information that can be derived. The digital objects and associated descriptions are then indexed so that they can be searched by the users.
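The idea of indexing digital objects through their associated metadata can be illustrated with a minimal sketch. The field names, records and inverted-index approach below are invented for illustration and do not reflect the actual MILOS or MultiMatch internals:

```python
from collections import defaultdict

def build_index(objects):
    """Map each lowercased metadata term to the ids of the objects
    whose descriptive fields contain it (a simple inverted index)."""
    index = defaultdict(set)
    for obj in objects:
        # Field names here are illustrative metadata elements only.
        for field in ("title", "creator", "description"):
            for term in obj.get(field, "").lower().split():
                index[term].add(obj["id"])
    return index

# Toy records standing in for harvested digital-object descriptions.
objects = [
    {"id": 1, "title": "The Night Watch", "creator": "Rembrandt"},
    {"id": 2, "title": "Girl with a Pearl Earring", "creator": "Vermeer"},
]
index = build_index(objects)
print(sorted(index["rembrandt"]))  # ids of objects matching the term
```

A real system would add language-aware tokenisation, stemming and ranking on top of such an index; the sketch only shows how metadata terms lead back to the objects that carry them.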
The CH objects can be searched in the users' preferred language, independently of the language of the target collections. The current version of the system supports access and retrieval for four languages: Dutch, English, Italian and Spanish. It is now being extended to cater also for German and Polish, and other languages can be added. Users can choose to have their queries translated automatically, or can access the system in interactive mode in order to select the best term or terms from those proposed by the multilingual translation component, thus overcoming word-sense ambiguity and often obtaining more accurate results.
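The difference between automatic and interactive query translation can be sketched as follows. The toy English-to-Italian lexicon and the function names are assumptions made purely for illustration, not MultiMatch's actual translation component:

```python
# Illustrative bilingual lexicon: an ambiguous term has several
# candidate translations, and only the user knows the intended sense.
LEXICON = {
    "painting": ["dipinto", "quadro", "pittura"],
    "church": ["chiesa"],
}

def candidate_translations(query):
    """Return, for each query term, its candidate target-language terms
    (terms missing from the lexicon pass through unchanged)."""
    return {term: LEXICON.get(term, [term]) for term in query.lower().split()}

def translate(query, choices=None):
    """Automatic mode takes the first candidate for each term;
    interactive mode uses the user's selections where provided."""
    choices = choices or {}
    parts = []
    for term, cands in candidate_translations(query).items():
        parts.append(choices.get(term, cands[0]))
    return " ".join(parts)

print(translate("painting church"))                          # automatic
print(translate("painting church", {"painting": "quadro"}))  # interactive
```

Automatic mode yields "dipinto chiesa", while a user who meant "quadro" can select it interactively and obtain "quadro chiesa" instead.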
Textual searches are based not only on the written documents but also on automatically produced transcriptions of the speech included in the audio and video files. Users can refine their initial searches either through additional textual queries or via searches on other media. For example, starting from a given image, all images recognized as similar to it can be retrieved. The system also offers various ways to visualize the results obtained, subdivided by language and by media type, and ranked according to relevance with respect to the query. Future developments will allow the user to view various types of relations existing between the digital objects retrieved, such as relationships between the works of two different artists, or the temporal and spatial relationships between different objects.
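Image-similarity retrieval of the kind described above can be sketched in a few lines. The feature vectors below are toy colour histograms invented for illustration; production systems extract far richer visual descriptors, but the ranking step works the same way:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_similar(query_vec, collection):
    """Return image ids sorted by decreasing similarity to the query."""
    scored = [(cosine(query_vec, vec), img_id)
              for img_id, vec in collection.items()]
    return [img_id for score, img_id in sorted(scored, reverse=True)]

# Hypothetical images, each reduced to a 3-bin colour histogram.
collection = {
    "fresco": [0.9, 0.1, 0.0],
    "seascape": [0.1, 0.2, 0.9],
    "portrait": [0.5, 0.4, 0.3],
}
print(rank_similar([0.85, 0.15, 0.05], collection))  # most similar first
```

Given the query histogram above, "fresco" ranks first and "seascape" last, mirroring the "find images similar to this one" refinement described in the text.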
The MultiMatch architecture is scalable both in the number of users that can access the system simultaneously and in the volume of objects that can be managed. This will permit the system to be employed in a range of settings, from small localized archives to large dynamic collections derived from the huge volumes of CH objects present on the Internet.