Title | Enhanced Constrained Run-Length Algorithm for Complex Layout Document Processing |
---|---|
Volume:Issue | 4:3 |
Author | Hung-Ming Sun |
Pages | 297-309 |
Keywords | constrained run-length algorithm, page segmentation, document processing, Scopus |
Publication date | 2006-12 |
The Constrained Run-Length Algorithm (CRLA) is a well-known technique for page segmentation. The algorithm partitions documents with Manhattan layouts very efficiently but is not suited to pages with complex layouts, e.g. irregular graphics embedded in a text paragraph. Its main drawback is that it uses only local information during the smearing stage, which may lead to erroneous linkage of text and graphics. This paper addresses the problem by adding global information to the CRLA process. The enhanced CRLA can be applied successfully to non-Manhattan page layouts and can also extract text surrounded by a box. Neither case can be handled by the original CRLA.
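
For readers unfamiliar with the smearing stage mentioned above, the following is a minimal sketch of the original CRLA (run-length smearing), not of the enhanced algorithm, whose use of global information the abstract does not detail. It assumes a binary image stored as nested lists with 1 for black and 0 for white, and it fills only white runs bounded by black pixels on both sides. The function names and the thresholds `c_h` and `c_v` are illustrative assumptions, and the final smoothing pass used in some formulations is omitted.

```python
from typing import List, Optional

Bitmap = List[List[int]]  # assumed encoding: 1 = black (ink), 0 = white


def smear_row(row: List[int], threshold: int) -> List[int]:
    """Turn white runs of length <= threshold black when they lie between two black pixels."""
    out = list(row)
    last_black: Optional[int] = None  # index of the most recent black pixel
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_black is not None:
                gap = i - last_black - 1  # length of the white run just passed
                if 0 < gap <= threshold:
                    for j in range(last_black + 1, i):
                        out[j] = 1
            last_black = i
    return out


def smear_horizontal(img: Bitmap, threshold: int) -> Bitmap:
    """Apply the row smearing to every row."""
    return [smear_row(row, threshold) for row in img]


def transpose(img: Bitmap) -> Bitmap:
    """Swap rows and columns so that columns can be smeared as rows."""
    return [list(col) for col in zip(*img)]


def smear_vertical(img: Bitmap, threshold: int) -> Bitmap:
    """Apply the same smearing along columns."""
    return transpose(smear_horizontal(transpose(img), threshold))


def crla(img: Bitmap, c_h: int, c_v: int) -> Bitmap:
    """Combine horizontal and vertical smearing with a pixel-wise AND."""
    horizontal = smear_horizontal(img, c_h)
    vertical = smear_vertical(img, c_v)
    return [[a & b for a, b in zip(row_h, row_v)]
            for row_h, row_v in zip(horizontal, vertical)]
```

Because each decision depends only on the length of a single white run, a call such as `smear_row([1, 1, 0, 0, 1], 2)` links the pixels regardless of whether the rightmost one belongs to text or to an adjacent graphic. This is the purely local behaviour that the abstract identifies as the cause of erroneous text-graphics linkage.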