Domain and Task
Related Work
3.1. Text mining and NLP research overview
3.2. Text mining and NLP in industry use
4.6. XML parsing, data joining, and risk indices development
Experiment and Demonstration
Discussion
6.1. The ‘industry’ focus of the project
6.2. Data heterogeneity, multilingual and multi-task nature
Our literature review shows that, despite significant research in text mining and NLP, the field is strongly dominated by supervised methods that are built on well-curated data and do not transfer well to practical scenarios. This is partially reflected in the number of industrial text mining/NLP studies that incorporate rule-based methods and domain lexicons, except in a few areas (e.g., the legal domain) where high-quality curated resources are abundant. Most industrial studies also address single, sometimes simplified, tasks and do not report a full end-to-end process; in particular, they give few details on how their methods deal with data heterogeneity and inconsistency. Further, no prior work has focused on the healthcare domain. Our work will address these gaps.
Authors:
(1) Ziqi Zhang*, Information School, the University of Sheffield, Regent Court, Sheffield, UK, S1 4DP ([email protected]);
(2) Tomas Jasaitis, Vamstar Ltd., London ([email protected]);
(3) Richard Freeman, Vamstar Ltd., London ([email protected]);
(4) Rowida Alfrjani, Information School, the University of Sheffield, Regent Court, Sheffield, UK, S1 4DP ([email protected]);
(5) Adam Funk, Information School, the University of Sheffield, Regent Court, Sheffield, UK, S1 4DP ([email protected]).
This paper is