{"id":47,"date":"2017-06-19T18:49:01","date_gmt":"2017-06-19T21:49:01","guid":{"rendered":"http:\/\/web.inf.ufpr.br\/vri2\/?page_id=47"},"modified":"2017-06-19T18:50:17","modified_gmt":"2017-06-19T21:50:17","slug":"authorship-attribution-database","status":"publish","type":"page","link":"https:\/\/web.inf.ufpr.br\/vri\/databases\/authorship-attribution-database\/","title":{"rendered":"Authorship Attribution Database"},"content":{"rendered":"<div id=\"viewlet-below-content-title\">The Authorship Attribution Database (AAD) contains short articles from 100 different authors whose texts were uniformly distributed over 10 different subjects:<\/div>\n<div id=\"content-core\">\n<div id=\"parent-fieldname-text-34fce6e6e0064c0a8ef0914cff3865f0\" class=\"\">\n<div class=\"page\" title=\"Page 3\">\n<div class=\"layoutArea\">\n<div class=\"column\">\n<ul>\n<li>Miscellaneous,<\/li>\n<li>Law,<\/li>\n<li>Economics,<\/li>\n<li>Sports,<\/li>\n<li>Gastronomy,<\/li>\n<li>Literature,<\/li>\n<li>Politics,<\/li>\n<li>Health,<\/li>\n<li>Technology,<\/li>\n<li>Tourism.<\/li>\n<\/ul>\n<p>The sources were 15 Brazilian newspapers located all over the country. We chose 30 short articles from each author, for a total of 3000 documents. The articles usually deal with controversial subjects and express the authors' personal opinions. On average, the articles have 600 tokens (words) and 350 hapax legomena (words occurring only once). One aspect worth noting is that articles of this kind can go through a revision process, which can remove some personal characteristics of the texts. 
Besides, authorship attribution using short articles poses an extra challenge, since the number of features that can be extracted is directly related to the size of the text.<\/p>\n<h2>How to obtain access to the data<\/h2>\n<p>The AAD database may be used for non-commercial research provided you acknowledge the source of the data by citing the following paper in publications about your research:<\/p>\n<div class=\"page\" title=\"Page 5\">\n<div class=\"layoutArea\">\n<div class=\"column\">\n<ul>\n<li>P.J. Varela, E. Justino, L.S. Oliveira, Selecting Syntactic Attributes For Authorship Attribution, IEEE International Joint Conference on Neural Networks, 2011, 161\u2013172. (<span class=\"link-external\"><a class=\"external-link\" href=\"http:\/\/www.inf.ufpr.br\/lesoliveira\/download\/ijcnn2011.pdf\">pdf<\/a><\/span>)<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<\/div>\n<p>Click <span class=\"link-external\"><a class=\"external-link\" href=\"http:\/\/www.inf.ufpr.br\/lesoliveira\/download\/AAD100.rar\">here<\/a><\/span> to download the database (6.2 MB).<\/p>\n<h3>Related papers<\/h3>\n<p>Our latest results on this database can be found in the following reference:<\/p>\n<ul>\n<li>Oliveira Jr., W., Oliveira, L. S., Justino, E., Comparing Compression Models for Authorship Attribution, Forensic Science International, 228(1-3):100-104, 2013. 
<span class=\"link-external\"><a class=\"wiki external\" href=\"http:\/\/www.inf.ufpr.br\/lesoliveira\/download\/FSI2013.pdf\" target=\"_blank\" rel=\"external noopener\">pdf<\/a><\/span>.<\/li>\n<\/ul>\n<hr \/>\n<p><a href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\"><img decoding=\"async\" class=\"size-full wp-image-41 alignleft\" src=\"http:\/\/web.inf.ufpr.br\/vri2\/wp-content\/uploads\/sites\/7\/2017\/06\/Screen-Shot-2017-06-19-at-18.37.55.png\" alt=\"\" width=\"91\" height=\"35\" \/><\/a>This database is licensed under a <span class=\"link-external\"><a href=\"http:\/\/creativecommons.org\/licenses\/by\/4.0\/\" rel=\"license\">Creative Commons Attribution 4.0 International License<\/a><\/span>.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>The Authorship Attribution Database (AAD) contains\u00a0 short articles from 100 different authors whose texts were uniformly distributed over 10 different subjects Miscellaneous, Law, Economics, Sports, Gastronomy, Literature, Politics, Health, Technology Tourism. 
The sources were 15 Brazilian newspapers located all over <a href=\"https:\/\/web.inf.ufpr.br\/vri\/databases\/authorship-attribution-database\/\" class=\"read-more\">Read More &#8230;<\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"parent":16,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-47","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/pages\/47","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/comments?post=47"}],"version-history":[{"count":4,"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/pages\/47\/revisions"}],"predecessor-version":[{"id":1886,"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/pages\/47\/revisions\/1886"}],"up":[{"embeddable":true,"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/pages\/16"}],"wp:attachment":[{"href":"https:\/\/web.inf.ufpr.br\/vri\/wp-json\/wp\/v2\/media?parent=47"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}