Paper

Fingerprints for Imposed Layers in Document Images Based on Huffman Code and Logical Layout Analysis


Authors:
Surabhi Narayan; Sahana D Gowda
Abstract
A document is characterized by its layout and component structure. The layout arises from the placement of the content components, and the structure arises from their geometrical shape. Content components in a filled-in document image consist of a foreground layer carrying general information and an imposed layer carrying vital information. The foreground layer consists of printed text, logos, tables and lines that are identical for documents of the same class; the imposed layer consists of handwritten text, signatures and seals that are unique to every document image. Indexing filled-in document images on general and vital information together is complex and can generate identical indexes, because the large amount of general information suppresses the much smaller amount of vital information in the imposed layer. In this paper, a novel technique is proposed to generate a unique code by formulating a logical layout of the imposed layer, which is extracted from the filled-in document image using registration. The extracted imposed layer components are represented by centroids based on their spatial occupancy, and the imposed layer is hierarchically decomposed into 16 equal quadrants. The Huffman tree generation algorithm is applied based on the number of centroids in each quadrant, and the resulting codes are assimilated with the quadrant indices to generate a unique code for the logical layout of the document image. To verify the applicability of this method, extensive experiments were conducted on imposed layers extracted from application forms, student records, bank cheques and declaration forms.
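The pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the imposed layer has already been extracted and reduced to a list of (x, y) centroids, it treats the 16 equal quadrants as a flat 4 x 4 grid, and it assumes that "assimilating" quadrant indices with Huffman codes means concatenating a 4-bit quadrant index with that quadrant's Huffman code. The function names `quadrant_counts`, `huffman_codes` and `layout_fingerprint` are hypothetical.

```python
import heapq
from itertools import count


def quadrant_counts(centroids, width, height, grid=4):
    """Count centroids in each cell of a grid x grid partition (row-major)."""
    counts = [0] * (grid * grid)
    for x, y in centroids:
        # Clamp so that points on the right/bottom border fall in the last cell.
        col = min(int(x * grid / width), grid - 1)
        row = min(int(y * grid / height), grid - 1)
        counts[row * grid + col] += 1
    return counts


def huffman_codes(weights):
    """Build Huffman codes for a dict mapping symbol -> positive weight."""
    tiebreak = count()  # unique counter keeps heap ordering deterministic
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in sorted(weights.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single symbol still needs a non-empty code.
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)  # lightest subtree gets prefix "0"
        w1, _, c1 = heapq.heappop(heap)  # next lightest gets prefix "1"
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]


def layout_fingerprint(centroids, width, height):
    """Assumed assimilation rule: 4-bit quadrant index + its Huffman code."""
    counts = quadrant_counts(centroids, width, height)
    occupied = {i: c for i, c in enumerate(counts) if c > 0}
    if not occupied:
        return ""
    codes = huffman_codes(occupied)
    return "".join(f"{i:04b}{codes[i]}" for i in sorted(occupied))


# Hypothetical centroids on a 400 x 400 imposed layer.
example = [(10, 10), (20, 30), (50, 50), (150, 250), (160, 260), (350, 350)]
fingerprint = layout_fingerprint(example, 400, 400)
```

In this sketch, quadrants holding more centroids receive shorter Huffman codes, and empty quadrants are skipped, so two imposed layers that differ in where or how densely they are filled in tend to produce different bit strings. How the paper orders quadrants and handles ties is not stated in the abstract, so those choices here are assumptions.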
Keywords
Imposed Layer; Centroids; Huffman Codes; Quad Decomposition; Logical Layout
StartPage
14
EndPage
26