{"id":77,"date":"2017-06-19T19:23:15","date_gmt":"2017-06-19T22:23:15","guid":{"rendered":"http:\/\/web.inf.ufpr.br\/vri2\/?page_id=77"},"modified":"2018-12-19T09:15:13","modified_gmt":"2018-12-19T11:15:13","slug":"touching-digits","status":"publish","type":"page","link":"https:\/\/web.inf.ufpr.br\/vri\/databases\/touching-digits\/","title":{"rendered":"Touching Digits"},"content":{"rendered":"<div id=\"_mcePaste\">This database was first introduced by Oliveira et al. in [1] (version 1.0). It was generated from 2,000 isolated digits extracted from the hsf_0 series of NIST SD19. The main goal of this database was to provide a common catalog for evaluating segmentation algorithms. It is important to mention that the 2,000 images used to create it were correctly recognized by our classifier (a multi-layer perceptron that uses a 132-dimensional feature vector based on concavities and contour information). This matters when assessing segmentation, since recognition errors on the touching pairs can then be attributed to the segmentation step rather than to the classifier. The algorithm that builds the synthetic database is very simple and is based on two rules:<\/div>\n<div id=\"_mcePaste\"><\/div>\n<div id=\"_mcePaste\">\n<ul>\n<li>It connects only digits produced by the same writer. The information about the writer is provided in NIST SD19. 
Fifty different writers were considered.<\/li>\n<li>The reference axis along which the digits slide is the center line.<\/li>\n<\/ul>\n<\/div>\n<div id=\"_mcePaste\"><\/div>\n<div>The aim of these rules is to avoid unreasonable connections (e.g., very small digits connected to very large ones) and to make the synthetic data more realistic.<\/div>\n<div><\/div>\n<div>According to the literature [2], touching digits can be classified into five different categories, as depicted in the figure below.<\/div>\n<div><\/div>\n<div><img fetchpriority=\"high\" decoding=\"async\" class=\"alignnone size-full wp-image-78\" src=\"http:\/\/web.inf.ufpr.br\/vri2\/wp-content\/uploads\/sites\/7\/2017\/06\/Screen-Shot-2017-06-19-at-19.22.00.png\" alt=\"\" width=\"494\" height=\"269\" srcset=\"https:\/\/web.inf.ufpr.br\/vri\/wp-content\/uploads\/sites\/7\/2017\/06\/Screen-Shot-2017-06-19-at-19.22.00.png 494w, https:\/\/web.inf.ufpr.br\/vri\/wp-content\/uploads\/sites\/7\/2017\/06\/Screen-Shot-2017-06-19-at-19.22.00-300x163.png 300w, https:\/\/web.inf.ufpr.br\/vri\/wp-content\/uploads\/sites\/7\/2017\/06\/Screen-Shot-2017-06-19-at-19.22.00-360x196.png 360w\" sizes=\"(max-width: 494px) 100vw, 494px\" \/><\/div>\n<div><\/div>\n<div>The second version of the database (V2.0), used in [3], contains 79,466 samples distributed into the 100 classes of touching pairs, which correspond to the possible combinations of two digits. Some of the classes involving the digit 1 still contain fewer samples than other classes. Owing to the American style of handwriting, the digit 1 is very often with the other digit in the pair. 
The table below shows the distribution of the database by type of connection.<\/div>\n<div><\/div>\n<div><img decoding=\"async\" class=\"alignnone size-full wp-image-79\" src=\"http:\/\/web.inf.ufpr.br\/vri2\/wp-content\/uploads\/sites\/7\/2017\/06\/Screen-Shot-2017-06-19-at-19.22.35.png\" alt=\"\" width=\"190\" height=\"104\" \/><\/div>\n<div><\/div>\n<h3>Synthetic Data for the Segmentation-free Approach<\/h3>\n<div>To assess a segmentation-free approach based on Deep Convolutional Neural Networks [4], we created synthetic data containing touching strings of 2, 3, and 4 digits. The strings are built by concatenating isolated digits of NIST SD19 through the algorithm described in [3]. To avoid building a biased dataset, we used the author information available in NIST SD19 so that the training, validation, and testing sets each draw their digits from a separate group of authors. The table below shows the purpose (training, validation, or testing) and the amount of data created.<\/div>\n<div><\/div>\n<div><\/div>\n<div>\n<table class=\"plain\">\n<tbody>\n<tr>\n<td>Length\/Classes<\/td>\n<td>Samples<\/td>\n<td>Authors from NIST<\/td>\n<td>Purpose<\/td>\n<\/tr>\n<tr>\n<td>2-digit (100 classes)<\/td>\n<td>161,563<br \/>\n53,907<br \/>\n55,091<\/td>\n<td>1000-1599<br \/>\n1600-1799<br \/>\n1800-1999<\/td>\n<td>Training<br \/>\nValidation<br \/>\nTesting<\/td>\n<\/tr>\n<tr>\n<td>3-digit (1000 classes)<\/td>\n<td>1,448,680<br \/>\n484,346<br \/>\n491,749<\/td>\n<td>1000-1599<br \/>\n1600-1799<br \/>\n1800-1999<\/td>\n<td>Training<br \/>\nValidation<br \/>\nTesting<\/td>\n<\/tr>\n<tr>\n<td>4-digit *<\/td>\n<td>100,000<br \/>\n20,000<br \/>\n20,000<\/td>\n<td>1000-1599<br \/>\n1600-1799<br \/>\n1800-1999<\/td>\n<td>Training<br \/>\nValidation<br \/>\nTesting<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>*The goal of 4-digit strings was to have data to train a classifier to predict the size of the numerical 
string (1, 2, 3, or 4 digits).<\/p>\n<\/div>\n<p>The architectures of the Convolutional Neural Network classifiers used in [4] are also available (based on the Caffe framework):<\/p>\n<ul>\n<li><a href=\"http:\/\/www.inf.ufpr.br\/lesoliveira\/download\/CNNC1.prototxt\">Single Digit<\/a><\/li>\n<li><a href=\"http:\/\/www.inf.ufpr.br\/lesoliveira\/download\/CNNC2.prototxt\">2-digit<\/a><\/li>\n<li><a href=\"http:\/\/www.inf.ufpr.br\/lesoliveira\/download\/CNNC3.prototxt\">3-digit<\/a><\/li>\n<li><a href=\"http:\/\/www.inf.ufpr.br\/lesoliveira\/download\/CNNL.prototxt\">Length Estimation<\/a><\/li>\n<\/ul>\n<div>\n<h2>How to obtain access to the images<\/h2>\n<p>Both the Touching Digits (TP) database and the synthetic data for the segmentation-free approach may be used for non-commercial research, provided you acknowledge the source of the images by citing the following papers in publications about your research:<\/p>\n<ul>\n<li>F. C. Ribas, L. S. Oliveira, A. S. Britto Jr, and R. Sabourin. <span class=\"link-external\"><a class=\"external-link\" href=\"http:\/\/www.inf.ufpr.br\/lesoliveira\/download\/IJDAR2012.pdf\">Handwritten Digit Segmentation: A Comparative Study<\/a><\/span>, International Journal of Document Analysis and Recognition, 16(2):127-137, 2013.<\/li>\n<li>A. Hochuli, L. S. Oliveira, A. S. Britto Jr, and R. Sabourin. <a href=\"http:\/\/www.inf.ufpr.br\/lesoliveira\/download\/PR2018Hochuli.pdf\">Handwritten Digits Segmentation: Is it still necessary?<\/a> Pattern Recognition, 78:1-11, 2018.<\/li>\n<\/ul>\n<p>Click <a href=\"http:\/\/www.inf.ufpr.br\/vri\/databases\/DigitString\/\">here<\/a> to download the datasets.<\/p>\n<h2>References<\/h2>\n<p>[1] L. S. Oliveira, A. S. Britto Jr, and R. Sabourin. A synthetic database to assess segmentation algorithms. In 8th International Conference on Document Analysis and Recognition, pages 207\u2013211, 2005.<\/p>\n<p>[2] Y. K. Chen and J. F. Wang. 
Segmentation of single- or multiple-touching handwritten numeral string using background and foreground analysis. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(11):1304\u20131317, 2000.<\/p>\n<p>[3] F. C. Ribas, L. S. Oliveira, A. S. Britto Jr, and R. Sabourin. Handwritten Digit Segmentation: A Comparative Study, International Journal of Document Analysis and Recognition, 16(2):127-137, 2013.<\/p>\n<p>[4] A. Hochuli, L. S. Oliveira, A. S. Britto Jr, and R. Sabourin. <a href=\"http:\/\/www.inf.ufpr.br\/lesoliveira\/download\/PR2018Hochuli.pdf\">Handwritten Digits Segmentation: Is it still necessary?<\/a> Pattern Recognition, 78:1-11, 2018.<\/p>\n<p>[5] A. Hochuli, L. S. Oliveira, A. S. Britto Jr, and R. Sabourin. <a href=\"http:\/\/www.inf.ufpr.br\/lesoliveira\/download\/IJCNN2018Hochuli.pdf\">Segmentation-Free Approaches for Handwritten Numerical String Recognition<\/a>, Int. Joint Conference on Neural Networks (IJCNN), 2018.<\/p>\n<hr \/>\n<p><a href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\"><img decoding=\"async\" class=\"size-full wp-image-41 alignleft\" src=\"http:\/\/web.inf.ufpr.br\/vri2\/wp-content\/uploads\/sites\/7\/2017\/06\/Screen-Shot-2017-06-19-at-18.37.55.png\" alt=\"\" width=\"91\" height=\"35\" \/><\/a>This database is licensed under a <span class=\"link-external\"><a href=\"http:\/\/creativecommons.org\/licenses\/by\/4.0\/\" rel=\"license\">Creative Commons Attribution 4.0 International License<\/a><\/span>.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>This database was first introduced by Oliveira et al. in [1] (version 1.0). It was generated based on 2,000 isolated digits extracted from the hsf_0 series of NIST SD19. 
The main goal of this database was to provide a common <a href=\"https:\/\/web.inf.ufpr.br\/vri\/databases\/touching-digits\/\" class=\"read-more\">Read More &#8230;<\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"parent":16,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-77","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/pages\/77","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/comments?post=77"}],"version-history":[{"count":12,"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/pages\/77\/revisions"}],"predecessor-version":[{"id":1887,"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/pages\/77\/revisions\/1887"}],"up":[{"embeddable":true,"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/pages\/16"}],"wp:attachment":[{"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/media?parent=77"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}