Consiglio Nazionale delle Ricerche

Product type: Journal article
Title: Analysis and recognition of highly degraded printed characters
Publication year: 2004
Format
  • Electronic
  • Print
Author(s): Tonazzini A., Vezzosi S., Bedini L.
Author affiliations: ISTI-CNR
CNR authors and affiliations
  • LUIGI BEDINI
  • ANNA TONAZZINI
Language(s)
  • English
Abstract: This paper proposes an integrated system for the processing and analysis of highly degraded printed documents for the purpose of recognizing text characters. As a case study, ancient printed texts are considered. The system is comprised of various blocks operating sequentially. Starting with a single page of the document, the background noise is reduced by wavelet-based decomposition and filtering, the text lines are detected, extracted, and segmented by a simple and fast adaptive thresholding into blobs corresponding to characters, and the various blobs are analyzed by a feedforward multilayer neural network trained with a back-propagation algorithm. For each character, the probability associated with the recognition is then used as a discriminating parameter that determines the automatic activation of a feedback process, leading the system back to a block for refining segmentation. This block acts only on the small portions of the text where the recognition cannot be relied on and makes use of blind deconvolution and MRF-based segmentation techniques whose high complexity is greatly reduced when applied to a few subimages of small size. The experimental results highlight that the proposed system performs a very precise segmentation of the characters and then a highly effective recognition of even strongly degraded texts.
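Illustrative sketch (not part of the catalog record or of the paper): the feedback step described in the abstract can be read as a confidence gate, where only characters recognized with low probability are sent back for refined segmentation. The Python below is a minimal sketch of that control flow under stated assumptions. The names recognize_with_feedback, toy_classify and toy_refine, the 0.9 threshold, and the single refinement pass are all hypothetical. The toy classifier and the toy refinement stand in for the neural network and for the blind deconvolution and MRF-based segmentation, which are not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple

# A recognition result: (label, probability assigned by the classifier).
Result = Tuple[str, float]


def recognize_with_feedback(
    blobs: Sequence[str],
    classify: Callable[[str], Result],
    refine: Callable[[str], List[str]],
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> List[Result]:
    """Classify each blob; re-segment only the blobs recognized with low probability."""
    results: List[Result] = []
    for blob in blobs:
        label, prob = classify(blob)
        if prob >= threshold:
            # Recognition is trusted: keep it and skip the costly refinement.
            results.append((label, prob))
            continue
        # Feedback: refine segmentation of this small portion only,
        # then classify the resulting sub-blobs once more.
        for sub_blob in refine(blob):
            results.append(classify(sub_blob))
    return results


def toy_classify(blob: str) -> Result:
    """Hypothetical stand-in for the neural network: single characters are 'easy'."""
    if len(blob) == 1:
        return blob, 0.95
    return "?", 0.30


def toy_refine(blob: str) -> List[str]:
    """Hypothetical stand-in for refined segmentation: split a merged blob."""
    return list(blob)


if __name__ == "__main__":
    # "rn" plays the role of two touching characters segmented as one blob.
    print(recognize_with_feedback(["a", "rn", "b"], toy_classify, toy_refine))
```

In this sketch the expensive refinement runs only on the one merged blob, which mirrors the abstract's point that the costly techniques become affordable when applied to a few small subimages.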
Abstract language: English
Other abstract: -
Other abstract language: -
Pages from: 236
Pages to: 247
Total pages: -
Journal: International journal on document analysis and recognition (Internet)
Active since 1998
Publisher: Springer. - Heidelberg
Country of publication: Germany
Language: English
ISSN: 1433-2825
Key title: International journal on document analysis and recognition (Internet)
Abbreviated title: Int. j. doc. anal. recognit. (Internet)
Alternative titles:
  • IJDAR. International journal on document analysis and recognition (Internet)
  • Document analysis and recognition (Internet)
Journal volume: 6
Journal issue: -
DOI: -
Peer reviewed: Yes (international)
Publication status: Published version
Indexing (in controlled databases)
  • Google Scholar (Code: http://scholar.google.it/scholar?hl=it&q=Analysis+and+recognition+of+highly+degraded+printed+characters&btnG=&lr=)
  • PUMA (Code: cnr.isti/2003-A0-24)
Keywords: Degraded texts, image restoration, Wavelet denoising, Neural Networks
Link (URL, URI): -
Parallel title: -
License: -
Embargo expiry: -
Acceptance date: -
Notes/Other information: -
CNR structures
  • ISTI - Istituto di scienza e tecnologie dell'informazione "Alessandro Faedo"
CNR modules/activities/subprojects: -
European projects: -
Attachments
Published article (private document)
Document type: application/pdf

Historical data
Historical data cannot be modified; they were inherited from other systems (e.g. Gestione Istituti, PUMA, ...) and have historical value only.
Subject area: Computer Science & Engineering
CIVR evaluation area: Science and technologies for an information and communication society
Journal: International Journal on Document Analysis and Rec
Note: 17 November 2003 - Online publication (A0-24)
Brief description of the product: This paper proposes an integrated system for the processing and analysis of highly degraded printed documents, with the aim of recognizing the text characters. As a case study, ancient printed texts are considered. The system consists of various blocks operating sequentially. Starting from a single page of the document, the background noise is reduced by wavelet-based decomposition and filtering, the text lines are detected, extracted, and segmented into blobs corresponding to characters by a simple and fast adaptive thresholding, and the various blobs are analyzed by a feed-forward multilayer neural network trained with a back-propagation algorithm. For each character, the probability associated with the recognition is then used as a discriminating parameter that determines the automatic activation of a feedback process, leading the system back to a block for refining segmentation. This block acts only on the small portions of the text where the recognition cannot be relied on, and makes use of blind deconvolution and MRF-based segmentation techniques, whose high complexity is greatly reduced when applied to a few sub-images of small size. The experimental results highlight that the proposed system performs a very precise segmentation of the characters and then a highly effective recognition of even strongly degraded texts.