Off-line handwritten text recognition with cross-outs and corrections using deep learning

Thesis posted on 2024-11-25, 19:14, authored by Hiqmat Nisa
The automatic transcription of handwritten document images into computer-readable text has many applications, from automated document processing to indexing and document understanding. With advances in deep learning, many state-of-the-art techniques achieve good performance on organised, clean datasets. However, real use cases remain challenging because the documents can be messy and little labelled data is available. In this thesis, we first investigate whether it is possible to train a robust model with only synthetically generated cross-out data. For this purpose, we present a methodology to produce strike-through text from an available dataset. We observe that the model performed well on synthetic cross-out recognition but failed to handle the real challenges associated with manually generated cross-out text. Second, we investigate two approaches for cross-out text annotation: (1) provide no annotation, thereby reducing the annotation burden, and (2) mark the cross-out text with a special symbol (we use the symbol #). We found that training a model on cross-out data with or without annotation makes no significant difference in accuracy. This indicates that a model trained with minimal annotations can perform as well as a model trained with extra annotations for cross-out text. Third, we present to the research community a new offline English handwritten dataset with multiple unexplored challenges. These challenges encompass diverse forms of cross-outs, corrections (including instances where new text is inserted after crossing out existing content, and where one letter is changed into another), background noise and unbalanced text length. Fourth, we present a process for collecting and annotating messy handwritten documents when no ground truth is available.
Fifth, we evaluate three different architectures on the messy handwritten dataset: Convolutional Recurrent Neural Networks (CRNN) with Connectionist Temporal Classification (CTC), Sequence-to-Sequence (Seq2Seq) with an attention mechanism, and Transformer. We observe that all three models perform well on handwritten lines that are free from messiness but perform poorly on handwritten lines containing messy elements, establishing that the new dataset provides significant new challenges for the handwritten text recognition field. Finally, considering the difficulties of annotating handwritten documents, we investigate using an existing clean dataset in combination with the messy handwritten dataset. The question is how best to train a robust model when limited messy data are available: transfer learning with messy data, or training from scratch on a combination of messy and existing data. We conclude from the experimental work that the best way to obtain a robust model is to initialise with weights from a pre-trained model and use the same number of samples from both datasets during training.
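
The final recommendation, drawing the same number of samples from the clean and messy datasets during training, can be illustrated with a small sampling sketch. This is not the thesis's procedure: the abstract does not say whether the balance is applied per batch or over the whole training set, so the per-batch balancing and the reuse of the smaller dataset shown here are assumptions, and the function name `balanced_batches` is hypothetical. Initialising from pre-trained weights is not shown.

```python
import math
import random


def balanced_batches(clean, messy, batch_size, seed=0):
    """Yield batches with equal numbers of clean and messy samples.

    Assumption (not from the thesis): balance is enforced per batch, and
    the smaller dataset is reshuffled and reused until the larger one has
    been seen once.
    """
    if batch_size % 2 != 0:
        raise ValueError("batch_size must be even")
    if not clean or not messy:
        raise ValueError("both datasets must be non-empty")

    rng = random.Random(seed)
    half = batch_size // 2

    def cycle_shuffled(items):
        # Endless stream that reshuffles each time the list is exhausted.
        while True:
            order = list(items)
            rng.shuffle(order)
            yield from order

    clean_stream = cycle_shuffled(clean)
    messy_stream = cycle_shuffled(messy)

    # Enough steps to cover the larger dataset once.
    steps = math.ceil(max(len(clean), len(messy)) / half)
    for _ in range(steps):
        batch = [next(clean_stream) for _ in range(half)]
        batch += [next(messy_stream) for _ in range(half)]
        rng.shuffle(batch)  # mix the two sources within the batch
        yield batch


if __name__ == "__main__":
    clean = [("clean", i) for i in range(10)]
    messy = [("messy", i) for i in range(4)]
    for batch in balanced_batches(clean, messy, batch_size=4):
        print(batch)
```

In a real training loop the tuples would be replaced by line images paired with their transcriptions. Whether to oversample the smaller dataset or subsample the larger one is a separate design choice that the abstract leaves open.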

History

Degree Type

Doctorate by Research

Imprint Date

2023-01-01

School name

School of Computing Technologies, RMIT University

Former Identifier

9922270805701341

Open access

  • Yes
