Historical newspapers pose particular challenges for layout analysis methods: noise removal is not always possible, non-standard fonts are common, and print quality is usually low. We therefore organize the Competition on German-Brazilian Newspaper Layout Analysis (ICDAR 2019 – GBNLA), using digitized versions of German-language newspapers produced in Brazil in the late 18th and early 19th centuries. The German-Brazilian Newspapers Dataset (GBN v1.0) released for this competition contains 152 grayscale page images with a resolution of 600 dpi. The purpose of this competition is to segment and label “text”, “image”, “graphic” and “separator” regions on each page based only on the grayscale information, i.e., no binarized version of the input is provided.
|February 11, 2019||
|April 30, 2019||
|May 01, 2019||
|May 15, 2019||
|June 15, 2019||
How to participate
- Register your interest by sending an e-mail to email@example.com with the title “Registration” (before April 30, 2019). The e-mail must contain the name of the participant(s), the educational institution or research group they belong to, and the challenges in which they will participate: challenge 1 only, challenge 2 only, or both challenges 1 and 2.
- Download the sample training GBN v1.0 Dataset.
- Submit (by e-mail, before May 15, 2019) a brief description of the chosen approach in PDF format and a command-line executable of the method. The executable should receive a PNG file (an image of a newspaper page) as input and return as output an XML PAGE file with the same name as the image.
Pletschacher, S., Antonacopoulos, A.: The PAGE (Page Analysis and Ground-truth Elements) format framework. In: 20th International Conference on Pattern Recognition (ICPR), pp. 257-260. IEEE (2010).
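The submission interface described above (one PNG in, one XML PAGE file out) can be sketched as follows. This is only an illustrative skeleton under several assumptions: the PAGE namespace version, the `.xml` output extension, and the `segment` placeholder (which simply returns one text region covering the whole page) are hypothetical and not prescribed by the organizers. The required `Metadata` element is omitted, so the output is not schema-validated.

```python
import struct
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: the 2013-07-15 PAGE namespace; the competition may expect another version.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"


def png_size(path):
    """Read width and height from the IHDR chunk at the start of a PNG file."""
    with open(path, "rb") as fh:
        header = fh.read(24)
    if header[:8] != b"\x89PNG\r\n\x1a\n" or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def segment(png_path, width, height):
    """Hypothetical placeholder for a real method: one text region covering the page."""
    corners = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    return [("TextRegion", corners)]


def build_page_xml(image_name, width, height, regions):
    """Build a minimal PAGE-style tree with one element per predicted region."""
    ET.register_namespace("", PAGE_NS)
    root = ET.Element(f"{{{PAGE_NS}}}PcGts")
    page = ET.SubElement(root, f"{{{PAGE_NS}}}Page", {
        "imageFilename": image_name,
        "imageWidth": str(width),
        "imageHeight": str(height),
    })
    for index, (tag, polygon) in enumerate(regions):
        region = ET.SubElement(page, f"{{{PAGE_NS}}}{tag}", {"id": f"r{index}"})
        points = " ".join(f"{x},{y}" for x, y in polygon)
        ET.SubElement(region, f"{{{PAGE_NS}}}Coords", {"points": points})
    return ET.ElementTree(root)


def run(png_path):
    """Process one page image and write the XML next to it, under the same base name."""
    png_path = Path(png_path)
    width, height = png_size(png_path)
    tree = build_page_xml(png_path.name, width, height, segment(png_path, width, height))
    out_path = png_path.with_suffix(".xml")  # assumption: same name, .xml extension
    tree.write(str(out_path), encoding="utf-8", xml_declaration=True)
    return out_path
```

To turn the sketch into a command-line tool, `run(sys.argv[1])` could be called under an `if __name__ == "__main__":` guard; a real submission would replace `segment` with the actual layout analysis method and emit `ImageRegion`, `GraphicRegion` and `SeparatorRegion` elements as well.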
Challenges of the Competition
- Challenge 1: ability to correctly segment and classify regions from newspapers represented in the training dataset. The testing dataset for this challenge is composed of 28 pages from the following newspapers: “Der Gemeindebote”, “Der Jugendfreund”, “Der Pioner”, “Der Sandwirt”, “Ev.-Luth. Kirchenblatt”, and “Kolonie Zeitung”.
- Challenge 2: ability to generalize, i.e., to correctly segment and classify regions of newspapers that were not present in the training dataset. The testing dataset for this challenge is composed of 22 pages from the following newspapers: “Gemeindeblatt” and “Heimatbote”.
The performance evaluation of the participating methods will be based only on the classification of the pixels of interest inside a region, i.e., only the foreground pixels listed in the TXT ground truth file will be considered. Predictions for pixels that are not listed in the TXT ground truth will be disregarded. It is important to emphasize that, although the coordinates of the pixels of interest are provided for the training samples, they are not provided (no TXT file) for the test samples. Therefore, the evaluation will consider only the region mappings in the XML PAGE file provided by the competitors.
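To illustrate how region mappings relate to pixels of interest, the sketch below looks up the predicted label of each listed pixel with an even-odd ray-casting test against the region polygons. It is not the organizers' evaluation tool: the in-memory data layout, the function names, and the rule that the first matching region wins are all assumptions made for this example, and pixels lying exactly on a polygon edge are not treated specially.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]
Region = Tuple[str, List[Point]]  # (label, polygon vertices); assumed layout


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def predicted_label(x: int, y: int, regions: List[Region]) -> Optional[str]:
    """Label of the first region containing the pixel, or None if no region does."""
    for label, polygon in regions:
        if point_in_polygon(x, y, polygon):
            return label
    return None


def label_pixels_of_interest(pixels: List[Point], regions: List[Region]) -> List[Optional[str]]:
    """Only the listed pixels are looked up; every other pixel is ignored."""
    return [predicted_label(x, y, regions) for x, y in pixels]
```

For example, with a "text" square spanning (0,0) to (10,10) and an "image" square spanning (20,0) to (30,10), the pixels (5,5), (25,5) and (15,5) map to "text", "image" and no region, respectively.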
The following metrics will be calculated: Precision, Recall, Accuracy, and F-measure. True Positives (TP) are correctly classified pixels, False Positives (FP) are incorrectly classified pixels, and False Negatives (FN) are pixels of interest that are not inside any classified region.
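A minimal sketch of these definitions, assuming each pixel of interest is available as a pair of its ground-truth label and its predicted label (`None` when the pixel falls outside every predicted region). The function name and input layout are hypothetical, and the standard formulas for precision, recall and F-measure are used. Accuracy is left out because the text does not state how it is derived from these counts.

```python
from typing import Dict, Iterable, Optional, Tuple


def evaluate(pairs: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, float]:
    """Count TP/FP/FN over pixels of interest and derive the metrics."""
    tp = fp = fn = 0
    for true_label, predicted in pairs:
        if predicted is None:
            fn += 1  # pixel of interest not inside any classified region
        elif predicted == true_label:
            tp += 1  # correctly classified pixel
        else:
            fp += 1  # incorrectly classified pixel
    # Guard against empty denominators by reporting 0.0 (a convention chosen here).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn,
        "precision": precision, "recall": recall, "f_measure": f_measure,
    }
```

With 6 correctly classified pixels, 2 misclassified pixels and 2 pixels outside any region, precision, recall and F-measure all evaluate to 0.75.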