EventEpi-A natural language processing framework for event-based surveillance.
Abstract. According to the World Health Organization (WHO), around 60% of all outbreaks are detected using informal sources. In many public health institutes, including the WHO and the Robert Koch Institute (RKI), dedicated groups of public health agents sift through numerous articles and newsletters to detect relevant events. This media screening is one important part of event-based surveillance (EBS). Reading the articles, discussing their relevance, and putting key information into a database is a time-consuming process. To support EBS, but also to gain insights into what makes an article and the event it describes relevant, we developed a natural language processing framework for automated information extraction and relevance scoring. First, we scraped relevant sources for EBS as done at the RKI (WHO Disease Outbreak News and ProMED) and automatically extracted the articles' key data: disease, country, date, and confirmed-case count. For this, we performed named entity recognition in two steps. EpiTator, an open-source epidemiological annotation tool, first suggested many candidates for each of these fields. We then extracted the key country and disease from these candidates using a heuristic, with good results, and trained a naive Bayes classifier to find the key date and confirmed-case count, using the RKI's EBS database as labels; this classifier performed modestly. Then, for relevance scoring, we defined two classes to which any article might belong: an article is relevant if it is in the EBS database and irrelevant otherwise. We compared the performance of different classifiers, using bag-of-words, document embeddings, and word embeddings. The best classifier, a logistic regression, achieved a sensitivity of 0.82 and an index balanced accuracy of 0.61. Finally, we integrated these functionalities into a web application called EventEpi, where relevant sources are automatically analyzed and put into a database. The user can also provide any URL or text, which is then analyzed in the same way and added to the database.
Each of these steps could be improved, in particular with larger labeled datasets and fine-tuning of the learning algorithms. The overall framework, however, already works well and can be used in production, promising improvements in EBS. The source code and data are publicly available under open licenses.
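The two metrics reported in the abstract, sensitivity and index balanced accuracy (IBA), can be illustrated with a minimal sketch. The abstract does not state how IBA was computed, so the code below assumes the commonly used form IBA = (1 + α · (sensitivity − specificity)) · sensitivity · specificity with α = 0.1. The function names and the toy labels are hypothetical and are not taken from the EventEpi source code or data.

```python
# Minimal sketch of the evaluation metrics named in the abstract.
# Assumptions (not from the source): the IBA formula and alpha = 0.1,
# all function names, and the toy labels below.

def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for binary labels: 1 = relevant, 0 = irrelevant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """Share of truly relevant articles that were flagged as relevant."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of truly irrelevant articles that were flagged as irrelevant."""
    return tn / (tn + fp)


def index_balanced_accuracy(sens, spec, alpha=0.1):
    """Assumed form: product of both rates, weighted by their difference."""
    return (1 + alpha * (sens - spec)) * sens * spec


# Toy data: 5 relevant and 10 irrelevant articles (illustrative only).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

tp, fn, tn, fp = confusion_counts(y_true, y_pred)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
iba = index_balanced_accuracy(sens, spec)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} IBA={iba:.4f}")
```

Under this assumed definition, IBA rewards classifiers that keep both rates high at once, which matters when relevant articles are far rarer than irrelevant ones, as is typical in media screening.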
Citation. PLoS Comput Biol. 2020 Nov 20;16(11):e1008277. doi: 10.1371/journal.pcbi.1008277.
Affiliation. HZI, Helmholtz-Zentrum für Infektionsforschung GmbH, Inhoffenstr. 7, 38124 Braunschweig, Germany.
Journal. PLoS Computational Biology
The following license files are associated with this item:
- Creative Commons
- Digital surveillance in Latin American diseases outbreaks: information extraction from a novel Spanish corpus.
- Authors: Dellanzo A, Cotik V, Lozano Barriga DY, Mollapaza Apaza JJ, Palomino D, Schiaffino F, Yanque Aliaga A, Ochoa-Luna J
- Issue date: 2022 Dec 23
- Global Variations in Event-Based Surveillance for Disease Outbreak Detection: Time Series Analysis.
- Authors: Ganser I, Thiébaut R, Buckeridge DL
- Issue date: 2022 Oct 31
- A novel framework for biomedical entity sense induction.
- Authors: Lossio-Ventura JA, Bian J, Jonquet C, Roche M, Teisseire M
- Issue date: 2018 Aug
- A methodology to enhance spatial understanding of disease outbreak events reported in news articles.
- Authors: Chanlekha H, Collier N
- Issue date: 2010 Apr
- A comparison of word embeddings for the biomedical natural language processing.
- Authors: Wang Y, Liu S, Afzal N, Rastegar-Mojarad M, Wang L, Shen F, Kingsbury P, Liu H
- Issue date: 2018 Nov