{"id":864,"date":"2019-05-17T14:49:57","date_gmt":"2019-05-17T17:49:57","guid":{"rendered":"http:\/\/web.inf.ufpr.br\/didonet\/?p=864"},"modified":"2019-05-21T14:55:56","modified_gmt":"2019-05-21T17:55:56","slug":"how-to-normalise-imprecise-temporal-expressions","status":"publish","type":"post","link":"https:\/\/web.inf.ufpr.br\/didonet\/2019\/05\/17\/how-to-normalise-imprecise-temporal-expressions\/","title":{"rendered":"How to normalise imprecise temporal expressions"},"content":{"rendered":"\n<p>The extraction of information from text is very important for language understanding. There are different kinds of expressions that can be extracted; temporal expressions (timexes) are one kind of these expressions. A temporal expression, as the name says, contains some information about time present in a text. Some examples of timexes are: 1 day, 3 years, in 1 week, etc. These expressions have an explicit associated value, which can be used to perform calculations. However, it is possible to find imprecise expressions, such as <strong>a few days, several weeks, some months, <\/strong>etc., in which a quantified value is not available. These are called <strong>imprecise timexes.<\/strong> The question is how to quantify these expressions.<\/p>\n\n\n\n<p>Our research group (C3SL) has developed a solution to extract and normalise this kind of expression, which is the result of the <a href=\"https:\/\/acervodigital.ufpr.br\/handle\/1884\/43255\">PhD of Hegler Tissot<\/a> (a collaboration between UFPR\/C3SL and the GATE group at The University of Sheffield).<\/p>\n\n\n\n<p>We have extracted data from several open sources to check the occurrences of precise and imprecise timexes. Imprecise timexes account for 7 to 35 percent of all timexes, depending on the kind of text. Medical texts had the highest proportion. 
The work presented a classification of such expressions: <strong>&nbsp;present reference <\/strong>(e.g., now, recently)<strong>, modified value <\/strong>(e.g., less than a month approximately)<strong>, imprecise value <\/strong>(e.g., some days, several weeks)<strong>, range of values <\/strong>(e.g., every 2 months, between some days)<strong>, partial period <\/strong>(e.g., middle of January)<strong> <\/strong>or<strong> generic expression <\/strong>(e.g., this time, at the same time)<strong>.<\/strong><\/p>\n\n\n\n<p>This work presented a set of normalisation models for each kind of expression; the models are a generalisation of probability distributions in the form of trapezoidal fuzzy membership functions (MSF). The two figures below illustrate the membership functions of imprecise timexes comprising &#8220;few&#8221;, &#8220;some&#8221;, &#8220;many&#8221;, and others, in English and Portuguese.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"509\" src=\"http:\/\/web.inf.ufpr.br\/didonet\/wp-content\/uploads\/sites\/12\/2019\/05\/RESULT-En-IV-1024x509.png\" alt=\"\" class=\"wp-image-865\" srcset=\"https:\/\/web.inf.ufpr.br\/didonet\/wp-content\/uploads\/sites\/12\/2019\/05\/RESULT-En-IV-1024x509.png 1024w, https:\/\/web.inf.ufpr.br\/didonet\/wp-content\/uploads\/sites\/12\/2019\/05\/RESULT-En-IV-300x149.png 300w, https:\/\/web.inf.ufpr.br\/didonet\/wp-content\/uploads\/sites\/12\/2019\/05\/RESULT-En-IV-768x382.png 768w, https:\/\/web.inf.ufpr.br\/didonet\/wp-content\/uploads\/sites\/12\/2019\/05\/RESULT-En-IV.png 1271w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>MSFs in English<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" width=\"1024\" height=\"509\" src=\"http:\/\/web.inf.ufpr.br\/didonet\/wp-content\/uploads\/sites\/12\/2019\/05\/RESULT-Pt-IV-1024x509.png\" alt=\"\" class=\"wp-image-866\" 
srcset=\"https:\/\/web.inf.ufpr.br\/didonet\/wp-content\/uploads\/sites\/12\/2019\/05\/RESULT-Pt-IV-1024x509.png 1024w, https:\/\/web.inf.ufpr.br\/didonet\/wp-content\/uploads\/sites\/12\/2019\/05\/RESULT-Pt-IV-300x149.png 300w, https:\/\/web.inf.ufpr.br\/didonet\/wp-content\/uploads\/sites\/12\/2019\/05\/RESULT-Pt-IV-768x382.png 768w, https:\/\/web.inf.ufpr.br\/didonet\/wp-content\/uploads\/sites\/12\/2019\/05\/RESULT-Pt-IV.png 1271w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>MSFs in Portuguese<\/figcaption><\/figure>\n\n\n\n<p>We validated these results against a test data set using an F1 score and an adapted score, which we call F1_3D.<\/p>\n\n\n\n<p>The complete and detailed description of this work has been published this year in the Knowledge and Information Systems journal; it can be <a href=\"https:\/\/link.springer.com\/article\/10.1007\/s10115-019-01338-1\">downloaded here (Open Access)<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The extraction of information from text is very important for language understanding. There are different kinds of expressions that can be extracted; temporal expressions (timexes) are one kind of these expressions. A temporal expression, as the name says, contains some information about time present in a text. 
Some examples of timexes are: 1 day, 3&hellip;<\/p>\n","protected":false},"author":21,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5,30,1],"tags":[27,29,28],"class_list":["post-864","post","type-post","status-publish","format-standard","hentry","category-c3sl","category-imprecise-timexes","category-sem-categoria","tag-extraction","tag-f1_3d","tag-timexes"],"_links":{"self":[{"href":"https:\/\/web.inf.ufpr.br\/didonet\/wp-json\/wp\/v2\/posts\/864","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/web.inf.ufpr.br\/didonet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/web.inf.ufpr.br\/didonet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/web.inf.ufpr.br\/didonet\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/web.inf.ufpr.br\/didonet\/wp-json\/wp\/v2\/comments?post=864"}],"version-history":[{"count":20,"href":"https:\/\/web.inf.ufpr.br\/didonet\/wp-json\/wp\/v2\/posts\/864\/revisions"}],"predecessor-version":[{"id":938,"href":"https:\/\/web.inf.ufpr.br\/didonet\/wp-json\/wp\/v2\/posts\/864\/revisions\/938"}],"wp:attachment":[{"href":"https:\/\/web.inf.ufpr.br\/didonet\/wp-json\/wp\/v2\/media?parent=864"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/web.inf.ufpr.br\/didonet\/wp-json\/wp\/v2\/categories?post=864"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/web.inf.ufpr.br\/didonet\/wp-json\/wp\/v2\/tags?post=864"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}