Document Layout Analysis for Newspapers

Status: open
Supervisors: Markus Diem, Florian Kleber, Stefan Fiel

Document layout analysis examines the layout structure of document images and segments a page into homogeneous image regions. Within the READ project, a framework for layout analysis is currently being developed. The layout analysis detects text regions (text lines, text blocks, etc.).
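To make the idea of segmenting a page into regions concrete, the following is a minimal sketch and not the method used in the READ framework. It assumes the page has already been reduced to a small binary mask (1 = foreground, 0 = background) and groups 4-connected foreground pixels into bounding boxes. The function name find_regions and the toy page are hypothetical and serve only as illustration.

```python
from collections import deque


def find_regions(mask):
    """Return bounding boxes (top, left, bottom, right) of the
    4-connected foreground regions in a binary mask (list of lists)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting at an unvisited foreground pixel.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy "page" (assumption): two separate foreground blocks.
page = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
regions = find_regions(page)
print(regions)
```

On a real newspaper scan, each resulting box would still have to be classified, for example as text, image, or chart. That classification step is the focus of the thesis and is not covered by this sketch.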

Example newspaper with image and chart regions marked in grey

The main goal of the master's thesis is to adapt the layout analysis so that it detects regions containing graphs, charts, and images, mainly in newspapers. The document images are created from PDFs. An example image is shown on this page.

Objectives

This master's thesis will implement a layout analysis methodology with a special focus on the detection and classification of image and graph regions.

Financing

If the thesis is successful, funding by APA-IT is possible.

Requirements