ICDAR2019 – Competition on GBNLA

Historical newspapers pose particular challenges for layout analysis methods: noise removal is not always possible, non-standard fonts are common, and print quality is usually low. We therefore organize the Competition on German-Brazilian Newspaper Layout Analysis (ICDAR 2019 – GBNLA) using digitized versions of German-language newspapers produced in Brazil in the late 18th and early 19th centuries. The German-Brazilian Newspapers Dataset (GBN v1.0) released for this competition contains 152 grayscale page images with a resolution of 600 dpi. The purpose of this competition is to segment and label “text”, “image”, “graphic” and “separator” regions on each page based only on the grayscale information, i.e., no binarized version of the input is provided.

Competition Calendar

Date Description
 February 11, 2019
April 30, 2019
  • Registration for the competition closes.
May 01, 2019
  • Participants list is announced.
  • Publication of the complete training dataset.
May 15, 2019
  • Submission deadline for the method descriptions and command-line executable files.
June 15, 2019
  • Sending the competition results to the participants.
  • Publication of the testing dataset.
  • Submission of the camera-ready report.

How to participate

  1. Register your interest by sending an e-mail to icdar2019-gbnla@inf.ufpr.br with the title “Registration” (before April 30, 2019). The e-mail must contain the name of the participant(s), the educational institution or research group they belong to, and the challenges they will participate in: challenge 1 only, challenge 2 only, or both challenges 1 and 2.
  2. Download the sample training GBN v1.0 Dataset.
  3. Submit (by e-mail) a brief description of the chosen approach in PDF format and a command-line executable file of the method (before May 15, 2019). The executable file should receive as input a PNG file (an image of a newspaper page) and return as output an XML PAGE[1] file with the same name as the image.

[1] Pletschacher, S., Antonacopoulos, A.: The PAGE (Page Analysis and Ground-truth Elements) format framework. In: 2010 20th International Conference on Pattern Recognition (ICPR), pp. 257-260. IEEE (2010).
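
The interface required in step 3 (one PNG page in, one PAGE XML file with the same base name out) could be sketched as below. This is only an illustration under several assumptions that the announcement does not state: the PAGE namespace version (2013-07-15), the mapping of the four labels to the PAGE element names TextRegion, ImageRegion, GraphicRegion and SeparatorRegion, and the output naming (page.png becomes page.xml). The function segment_page is a hypothetical placeholder for the actual method, and the output has not been validated against the PAGE schema, which may require further elements such as Metadata.

```python
import struct
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed PAGE namespace version; the announcement does not say which one is expected.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"

# Assumed mapping from the competition labels to PAGE region element names.
REGION_TAGS = {
    "text": "TextRegion",
    "image": "ImageRegion",
    "graphic": "GraphicRegion",
    "separator": "SeparatorRegion",
}


def png_size(path):
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError(f"{path} is not a PNG file")
    return struct.unpack(">II", header[16:24])


def segment_page(image_path):
    """Hypothetical placeholder: labels the whole page as one text region."""
    width, height = png_size(image_path)
    return [("text", [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)])]


def write_page_xml(image_path):
    """Run the placeholder method and write <image name>.xml next to the image."""
    width, height = png_size(image_path)
    ET.register_namespace("", PAGE_NS)
    root = ET.Element(f"{{{PAGE_NS}}}PcGts")
    page = ET.SubElement(
        root, f"{{{PAGE_NS}}}Page",
        imageFilename=image_path.name, imageWidth=str(width), imageHeight=str(height),
    )
    for i, (label, polygon) in enumerate(segment_page(image_path)):
        region = ET.SubElement(page, f"{{{PAGE_NS}}}{REGION_TAGS[label]}", id=f"r{i}")
        points = " ".join(f"{x},{y}" for x, y in polygon)
        ET.SubElement(region, f"{{{PAGE_NS}}}Coords", points=points)
    out_path = image_path.with_suffix(".xml")
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
    return out_path


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print(write_page_xml(Path(sys.argv[1])))
    else:
        print("usage: segment.py PAGE.png", file=sys.stderr)
```

Saved under a hypothetical name such as segment.py, the script would be called as `python segment.py page.png` and would write page.xml to the same directory.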

Challenges of the Competition

  1. Ability to correctly segment and classify regions from newspapers represented in the training dataset. The testing dataset is composed of 28 pages from the following newspapers: “Der Gemeindebote”, “Der Jugendfreund”, “Der Pioner”, “Der Sandwirt”, “Ev.-Luth. Kirchenblatt”, and “Kolonie Zeitung”.
  2. Ability to generalize, i.e., to correctly segment and classify regions of newspapers that were not present in the training dataset. The testing dataset for this challenge is composed of 22 pages from the following newspapers: “Gemeindeblatt” and “Heimatbote”.

Evaluation

The performance evaluation of the participating methods will consider only the classification of the pixels of interest inside a region, i.e., the foreground pixels listed in the TXT ground truth file. Predictions for pixels that are not listed in the TXT ground truth will be disregarded. Although the coordinates of the pixels of interest are provided for the training samples, they are not provided (no TXT file) for the test samples. Therefore, the evaluation will consider only the region mappings in the XML PAGE file provided by the competitors.
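
To make the pixel-based evaluation concrete, the sketch below assigns a predicted label to each pixel of interest by testing which submitted region polygon contains it. It is not the organizers' evaluation code. The in-memory representation (a list of (x, y) pixels and a list of (label, polygon) regions) is an assumption, since the layout of the TXT file is not described here, and the first matching region wins when regions overlap. A pixel that falls in no region gets the label None.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_pixels(pixels, regions):
    """Return the predicted label of each pixel, or None if no region contains it."""
    labels = []
    for x, y in pixels:
        label = None
        for region_label, polygon in regions:
            if point_in_polygon(x, y, polygon):
                label = region_label
                break
        labels.append(label)
    return labels


# Two rectangular regions and three pixels of interest (illustrative values).
regions = [
    ("text", [(0, 0), (10, 0), (10, 10), (0, 10)]),
    ("image", [(20, 0), (30, 0), (30, 10), (20, 10)]),
]
pixels = [(5, 5), (25, 5), (15, 5)]
print(label_pixels(pixels, regions))  # ['text', 'image', None]
```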

The following metrics will be calculated: Precision, Recall, Accuracy, and F-measure. True Positives (TP) are correctly classified pixels, False Positives (FP) are incorrectly classified pixels, and False Negatives (FN) are pixels of interest that are not inside any classified region.
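
A minimal sketch of how these counts translate into scores, using the standard definitions Precision = TP / (TP + FP), Recall = TP / (TP + FN) and F-measure = 2 · Precision · Recall / (Precision + Recall). The input format (two parallel lists of labels, with None marking a pixel that lies in no predicted region) is an assumption made for illustration. Accuracy is left out because the text above does not define how it is computed from TP, FP and FN.

```python
def count_outcomes(truth, predicted):
    """Count TP, FP and FN over the pixels of interest.

    truth: ground-truth label of each pixel of interest.
    predicted: predicted label of the same pixels, None if in no region.
    """
    tp = fp = fn = 0
    for true_label, pred_label in zip(truth, predicted):
        if pred_label is None:
            fn += 1  # pixel of interest not inside any classified region
        elif pred_label == true_label:
            tp += 1  # correctly classified pixel
        else:
            fp += 1  # incorrectly classified pixel
    return tp, fp, fn


def precision_recall_f(tp, fp, fn):
    """Standard precision, recall and F-measure; 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Four pixels of interest: two correct, one wrong, one outside every region.
truth = ["text", "text", "image", "separator"]
predicted = ["text", "image", None, "separator"]
tp, fp, fn = count_outcomes(truth, predicted)
print(tp, fp, fn)  # 2 1 1
print(precision_recall_f(tp, fp, fn))
```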

 


Organizers

Alessandra Belézia Araujo

Master’s student in the Post-graduate Program in Computer Science of UFPR and IT technician at Sistemas de Bibliotecas da UFPR.

Prof. Dr. Paulo Astor Soethe

Professor at the Polish, German and Classical Literature Department of the UFPR and Coordinator of the dokumente.br Project.

Prof. Dr. Daniel Weingaertner

Professor at the Informatics Department of the UFPR and Coordinator of the dokumente.br Project.