Named Entity Recognition For Content Published In Government Gazettes Based On A Weak Supervision Approach
This paper proposes using weak supervision as the initial step in building a decision-making system that relies on a machine learning stage for Named Entity Recognition (NER). Weak supervision methods are useful when labeled training data is scarce, expensive to obtain, or impractical to annotate manually. We describe the development and evaluation of a corpus of texts from the Brazilian government's official gazettes, focusing specifically on public bidding and contract excerpts. The Government Gazette is an instrument of horizontal accountability that increases the transparency of public acts performed by the government. Reading every publication daily to extract and annotate the necessary information would demand an impractical amount of individual effort. Weak supervision can reduce the manual annotation required to train NER and classification models. The proposed work is intended to support a decision-making system that requires a specific set of entities extracted from these gazettes. In some cases, applying weak supervision directly to the corpus yielded satisfactory results when compared with models trained on costly hand-annotated data.
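To make the weak supervision idea concrete, here is a minimal sketch, not the authors' pipeline: the abstract does not say which heuristics, entity types, or aggregation method were used. It assumes hypothetical regex "labeling functions" for a few entity types one might want from bidding and contract excerpts (labels `CNPJ`, `MONEY`, `DATE`, `ORG` are illustrative choices), combines their votes per token by simple majority vote, and emits BIO tags that could serve as noisy training data for an NER model. The sample sentence is invented.

```python
import re
from collections import Counter
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start char, end char, entity label)


def regex_lf(pattern: str, label: str) -> Callable[[str], List[Span]]:
    """Build a hypothetical labeling function that tags every regex match with `label`."""
    compiled = re.compile(pattern)

    def lf(text: str) -> List[Span]:
        return [(m.start(), m.end(), label) for m in compiled.finditer(text)]

    return lf


# Illustrative heuristics only; text a function does not match counts as an abstention.
LABELING_FUNCTIONS = [
    # Brazilian company registry number (CNPJ), e.g. 12.345.678/0001-90
    regex_lf(r"\b\d{2}\.\d{3}\.\d{3}/\d{4}-\d{2}\b", "CNPJ"),
    # Amounts in reais, e.g. R$ 1.234,56
    regex_lf(r"R\$\s?\d{1,3}(?:\.\d{3})*,\d{2}", "MONEY"),
    # Dates written as dd/mm/yyyy
    regex_lf(r"\b\d{2}/\d{2}/\d{4}\b", "DATE"),
    # Contextual cue: the text after "Contratada: " up to the next comma is a company
    regex_lf(r"(?<=Contratada: )[^,]+", "ORG"),
]


def tokenize(text: str) -> List[Tuple[int, int, str]]:
    """Split into tokens, keeping 'R$' whole and detaching trailing punctuation."""
    pattern = r"R\$|[\w./,-]*\w|\S"
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]


def weak_label(text: str) -> List[Tuple[str, str]]:
    """Aggregate labeling-function votes per token by majority vote into BIO tags."""
    lf_outputs = [lf(text) for lf in LABELING_FUNCTIONS]
    tagged = []
    prev_label = None
    for start, end, token in tokenize(text):
        votes = Counter()
        for spans in lf_outputs:
            # Each function votes at most once per label for a token it overlaps.
            votes.update({label for s, e, label in spans if s < end and start < e})
        ranked = votes.most_common()
        # Abstain ("O") when no function fires or the top two labels tie.
        if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
            label = None
        else:
            label = ranked[0][0]
        if label is None:
            tagged.append((token, "O"))
        elif label == prev_label:
            tagged.append((token, "I-" + label))
        else:
            tagged.append((token, "B-" + label))
        prev_label = label
    return tagged


if __name__ == "__main__":
    sample = ("Contratada: Empresa Exemplo Ltda, CNPJ 12.345.678/0001-90, "
              "valor de R$ 1.234,56, assinado em 15/03/2023.")
    for token, tag in weak_label(sample):
        print(f"{token}\t{tag}")
```

On the sample sentence, the company name, CNPJ, amount, and date tokens receive `B-`/`I-` tags and everything else is tagged `O`. Majority vote is the simplest aggregator; weak supervision frameworks typically replace it with a learned model that estimates each labeling function's accuracy. This sketch also merges two adjacent entities of the same type into one, a simplification a real pipeline would need to handle.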