@prefix prodottidellaricerca: .
@prefix istituto: .
@prefix prodotto: .
istituto:CDS044 prodottidellaricerca:prodotto prodotto:ID83653 .
@prefix pubblicazioni: .
@prefix unitaDiPersonaleInterno: .
unitaDiPersonaleInterno:MATRICOLA1625 pubblicazioni:autoreCNRDi prodotto:ID83653 .
@prefix unitaDiPersonaleEsterno: .
unitaDiPersonaleEsterno:ID833 pubblicazioni:autoreCNRDi prodotto:ID83653 .
@prefix modulo: .
modulo:ID2100 prodottidellaricerca:prodotto prodotto:ID83653 .
@prefix rdf: .
prodotto:ID83653 rdf:type prodotto:TIPO1301 .
@prefix retescientifica: .
prodotto:ID83653 rdf:type retescientifica:ProdottoDellaRicerca .
@prefix rdfs: .
prodotto:ID83653 rdfs:label "Dynamic User-defined Similarity Searching in Semi-structured Text Retrieval (Contributo in atti di convegno)"@en .
@prefix xsd: .
prodotto:ID83653 pubblicazioni:anno "2008-01-01T00:00:00+01:00"^^xsd:gYear ;
    pubblicazioni:doi "10.4108/ICST.INFOSCALE2008.3488"^^xsd:string .
@prefix skos: .
prodotto:ID83653 skos:altLabel "
[1] Geraci F., [1] Pellegrini M. (2008)
Dynamic User-defined Similarity Searching in Semi-structured Text Retrieval
in The Third International ICST Conference on Scalable Information Systems (Infoscale 2008), Vico Equense, Napoli
"^^rdf:HTML ;
    pubblicazioni:autori "[1] Geraci F., [1] Pellegrini M."^^xsd:string ;
    pubblicazioni:paginaInizio "10"^^xsd:string ;
    pubblicazioni:paginaFine "29"^^xsd:string ;
    pubblicazioni:pagineTotali "20"^^xsd:string ;
    pubblicazioni:descrizioneSinteticaDelProdotto "Modern text retrieval systems often provide a similarity search utility that allows the user to efficiently find a fixed number h of documents in the data set that are the most similar to a given query (here a query is either a simple sequence of keywords or a full document). We consider the case of a textual database made of semi-structured documents. For example, in a corpus of bibliographic records any record may be structured into three fields: title, authors and abstract, where each field is an unstructured free text. Each field, in turn, may be modelled with a specific vector space. The problem is more complex when we also allow users to associate at query time to each vector space a weight influencing its contribution to the overall dynamic aggregated and weighted similarity. We investigate the use of metric k-center clustering to prune the search space at query time. The embedding of the weights in the data structure is investigated with the purpose of allowing user query customization without any data replication. The validity of our approach is demonstrated experimentally by showing significant quality/time performance improvements over two state-of-the-art methods. We also speed up the pre-processing time by a factor of at least thirty with respect to a method based on k-means clustering."^^xsd:string ;
    pubblicazioni:affiliazioni "[1] CNR-IIT, Pisa, Italy"^^xsd:string ;
    pubblicazioni:titolo "Dynamic User-defined Similarity Searching in Semi-structured Text Retrieval"^^xsd:string ;
    prodottidellaricerca:abstract "Modern text retrieval systems often provide a similarity search utility that allows the user to efficiently find a fixed number h of documents in the data set that are the most similar to a given query (here a query is either a simple sequence of keywords or a full document). We consider the case of a textual database made of semi-structured documents. For example, in a corpus of bibliographic records any record may be structured into three fields: title, authors and abstract, where each field is an unstructured free text. Each field, in turn, may be modelled with a specific vector space. The problem is more complex when we also allow users to associate at query time to each vector space a weight influencing its contribution to the overall dynamic aggregated and weighted similarity. We investigate the use of metric k-center clustering to prune the search space at query time. The embedding of the weights in the data structure is investigated with the purpose of allowing user query customization without any data replication. The validity of our approach is demonstrated experimentally by showing significant quality/time performance improvements over two state-of-the-art methods. We also speed up the pre-processing time by a factor of at least thirty with respect to a method based on k-means clustering."@en ;
    prodottidellaricerca:prodottoDi istituto:CDS044 , modulo:ID2100 ;
    pubblicazioni:autoreCNR unitaDiPersonaleEsterno:ID833 , unitaDiPersonaleInterno:MATRICOLA1625 .
@prefix parolechiave: .
prodotto:ID83653 parolechiave:insiemeDiParoleChiave .
parolechiave:insiemeDiParoleChiaveDi prodotto:ID83653 .