Deep learning models for text-to-image retrieval

Researchers at the Networked Multimedia Information System (NeMIS) laboratory of the Istituto di Scienza e Tecnologie dell'Informazione "A. Faedo" in Pisa have developed a deep learning-based image search model that takes as input a short textual description of the image the user is looking for, e.g. "a group of skiers on a sunny day".
They implemented the search itself as a similarity search in a visual feature space: the model learns to translate a textual query into an abstract visual representation produced by a deep neural network, loosely comparable to the "mental picture" one forms when reading a piece of text.
Searching in the visual feature space has a practical advantage: when the text-to-abstract-image translation model evolves, the (typically huge) image collection on which the search is performed does not need to be reprocessed. By contrast, traditional image-to-text approaches, which convert images to sets of keywords, require reindexing the whole collection whenever the model changes.
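The similarity-search step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the image collection's visual features have been precomputed once, and uses a random vector as a stand-in for the output of the learned text-to-visual-feature encoder; the `retrieve` function and the feature dimensionality are illustrative.

```python
import numpy as np

def retrieve(query_vec, image_feats, k=5):
    """Return indices of the k images whose precomputed visual features
    are most similar (by cosine similarity) to the query's predicted
    visual representation."""
    q = query_vec / np.linalg.norm(query_vec)
    F = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    sims = F @ q                      # cosine similarity to every image
    return np.argsort(-sims)[:k]     # indices of the k best matches

# Toy collection: 1000 images with 128-dim precomputed visual features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 128))

# Stand-in for the text-to-visual encoder applied to a textual query.
query = rng.normal(size=128)

top = retrieve(query, feats, k=5)
```

Because only the query side changes when the encoder is retrained, the `feats` matrix can be built once and reused across model versions.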

Paper: Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions
Fabio Carrara, Andrea Esuli, Tiziano Fagni, Fabrizio Falchi, Alejandro Moreo Fernández
More info: