People often ask us what roles Natural Language Processing (NLP) can play in health settings. It is relatively easy to rattle off half a dozen case studies. However, these examples don’t meet the needs of enquirers who really want to understand the scope of the technology so they can weigh it up against their own processing tasks.

This blog sets out a more generic description of Clinical NLP (CNLP) functionalities, so that it is easier to understand where a particular application need might sit in the small pantheon of different CNLP technologies. We complete the blog with our wheel of activities, which shows a more detailed mapping of use case studies to technology functions.

Please let us know if you think we have omitted a distinct category, as opposed to just another case study. If you wish to drill deeper into the technology, the essay at the following link lays out a very clear comparison of Deep Understanding and Deep Learning, applied to a case study of coding pathology reports to ICD-O-3: Deep Understanding – Where does it come from?

To begin our discussion, let’s define the scope of our topic. Broadly speaking, Clinical Natural Language Processing (CNLP) is the automatic processing of clinical texts to:

  1. Identify documents of interest because of particular content, or
  2. Extract target content for a particular purpose, or
  3. Code documents or specific content to specific classification systems.

CNLP can be performed using deterministic methods, such as templates that search for specific strings, or non-deterministic methods, such as machine learners that identify more complex patterns in the language variations authors create. The latter work by identifying and, where necessary, resolving word ambiguity, sentence structure, semantic variations and dependencies, and document structure.
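
To make the deterministic end of that spectrum concrete, here is a minimal Python sketch of template-based extraction using regular expressions. The template names and patterns (`tumour_size_mm`, `er_status`) are our own illustrative assumptions, not a description of any production system, and real clinical text would need far more variants than this.

```python
import re

# Hypothetical templates: each one targets a single, specific string form.
TEMPLATES = {
    "tumour_size_mm": re.compile(r"\b(\d+(?:\.\d+)?)\s*mm\b", re.IGNORECASE),
    "er_status": re.compile(r"\bER\s*(positive|negative)\b", re.IGNORECASE),
}

def extract_with_templates(text):
    """Return every template match found in the text, keyed by template name."""
    found = {}
    for name, pattern in TEMPLATES.items():
        matches = [m.group(1).lower() for m in pattern.finditer(text)]
        if matches:
            found[name] = matches
    return found

print(extract_with_templates("Invasive carcinoma, 12 mm, ER Positive."))
```

A template like this is fast and predictable, but it misses any wording it was not written for (for example "oestrogen receptor: pos"), which is where the non-deterministic methods come in.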

Any CNLP technology must be judged on the accuracy with which it computes the desired objectives specified by the user. The higher the accuracy required, the more difficult and time-consuming it is to build the solution.

The major roles that CNLP can play within a health organisation are:

  1. Primary Use: Supporting clinical staff to code their clinical interactions to a specified coding scheme. The technology that supports this exercise must keep the user’s workload to a minimum, otherwise it will not be used and/or the quality of their input will be poor and tend to be unusable. One form of assistance can be checking note writing for completeness.

  2. Primary Use: Extract clinical entities for reuse in clinical care documents, for example, care summaries, decision support systems and care protocol validation. A typical example is to extract core content from pathology reports to present to Multidisciplinary Team Meetings (MDMs, aka Tumor Boards) as part of the care summary report. We call this transfer process data flow.

  3. Incidental Use: Support of early warning or incidental findings for pre-emptive intervention. An example is the CNLP analysis of radiology reports for findings unrelated to the primary purpose of the imaging assessment, such as vertebral fractures that are a signal of osteoporosis, and thus a trigger for early care. Another example is to check if patients with a particular morbidity or potential morbidity (e.g. from smoking) are being assessed according to established care protocols.

  4. Secondary Use: Improving epidemiological identification. CNLP can be used for coding records to any number of coding systems depending on the need. While SNOMED CT and ICD-10 have the highest profile, there are many other classification systems used for targeted epidemiology, such as ICD-O-3 for cancer surveillance, ICPC for General Practice reporting, MedDRA for vaccine and pharmaceutical adverse events, and LOINC for pathology reporting. CNLP can use these systems and its extraction power to infer codes, and so support more detailed drill-down analysis that yields richer results.

  5. Secondary Use: Case Identification. When a large volume of documents needs to be searched, two different types of CNLP technology could be applicable.

    1. Single Purpose CNLP – Separating document types: In large processing problems such as cancer registry data collection, registries face the problem of distinguishing hundreds of thousands of cancer pathology reports from non-cancer reports. This can be mitigated by CNLP machine learners that automatically classify documents.

    2. Universal CNLP – Finding target documents: In situations where a population of many different clinical specialists needs to retrieve cases across a variety of document types, such as in a hospital with many different systems holding documents, a CNLP search engine that allows for tailored searches is most appropriate.
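
As a rough illustration of the document separation idea in item 5, the sketch below trains a tiny multinomial Naive Bayes classifier in plain Python. Naive Bayes is simply a convenient stand-in here, not a statement of which learner any real registry system uses, and the four training reports are invented toy examples; a real system would need many thousands of labelled reports and proper evaluation.

```python
import math
import re
from collections import Counter, defaultdict

# Invented toy training data: (report text, label).
TRAINING = [
    ("invasive ductal carcinoma grade 2", "cancer"),
    ("adenocarcinoma with lymph node metastasis", "cancer"),
    ("benign fibroadenoma no malignancy seen", "non-cancer"),
    ("normal colonic mucosa no dysplasia", "non-cancer"),
]

def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def train(labelled_docs):
    """Count words per label, documents per label, and the overall vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in labelled_docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAINING)
print(classify(model, "metastatic adenocarcinoma of the colon"))
```

The design point is that the classifier learns which words signal each class from labelled examples, rather than relying on a hand-written list of strings.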

The technology components that go to make up Clinical Natural Language Processing provide a diverse set of answers to medical text application needs. The diagram below and the accompanying table give a deeper dive into what can be enabled with CNLP.

| Service Type | Application | Description/Example |
| --- | --- | --- |
| Report Classifier | Document Separation | Separating documents based on defined classes (e.g. separating cancer from non-cancer reports). |
| Report Classifier | Work Streaming | Filter documents and route to correct staff or user-groups for processing (e.g. filter breast cancer reports from other tumour streams). |
| Report Classifier | Case Identification | Identifying a report that is needed for a particular task, e.g. identifying a reportable cancer case to send it to the cancer registry. |
| Inferencing | Cancer Staging | Convert tumour descriptions to stage. |
| Inferencing | Data Integration | Identify risks and automatically raise alarms. |
| Content Extraction | Risk Analysis | Automate extraction of content required to compute risk analysis profiles for patients. |
| Content Extraction | Convert Unstructured text to Structured data | Supply categorised data for BIG Data analytics, and quality audit databases. |
| Content Extraction | Epidemiology | Deliver and codify data for storage in population-based databases. |
| Content Extraction | Patient Safety Notifications | Daily extraction from pathology, imaging and other lab reports of diagnoses requiring clinical attention. |
| Concept Search | Cohort Identification | Find records with certain characteristics (e.g. All patients with prostate cancer). |
| Concept Search | Case Studies | Identify patients who are current problematic cases so as to make comparisons. |
| Concept Search | Case Review | Find specific cases without having to know their demographics (e.g. Clinician retrieving a certain case with a known medical history but no recollection of the patient’s identity). |
| Concept Search | Content Retrieval | Ad-hoc general-purpose search for text-based records with given content. |
| Concept Search | Multi-Disciplinary Team Meetings (MDTM) | Search for radiology, pathology, and other text notes in preparation for Oncology Multi-Disciplinary Team Meetings. |
| Report Completion Validation | Medical Record Coding | Ensure that all content that forms a complete record is included and flag missing content (e.g. flagging that “Plan” has not been included in a particular discharge summary). |
| Report Consistency Validation | Billing Errors & Omissions | Automatic computation of billing codes lowers errors and increases chargeable items. |
| Report Consistency Validation | Unexpected Results | Ensure that content is consistent across a report (e.g. alerting a radiology report that has a diagnosis that is unexpected in the context of the requested investigation). |
| Hot Key Coding & Classification | Medical Records Coding | Add-on application that automatically suggests clinical codes for any on-screen content (e.g. suggesting codes in a pop up for a Discharge Summary in an EMR system). |
| Hot Key Coding & Classification | GP Coding | Code specific types of content (e.g. codifying reason for attendance at a General Practice or Emergency Department). |
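
The Report Completion Validation example above (flagging a discharge summary with no “Plan”) can be sketched in a few lines of Python. This is only a simplified illustration: the list of required headings and the assumption that each section starts a line as `Heading:` are ours, and real discharge summaries vary far more in layout.

```python
import re

# Hypothetical headings that a complete discharge summary is assumed to need.
REQUIRED_SECTIONS = ["Diagnosis", "Medications", "Plan"]

def missing_sections(report_text, required=REQUIRED_SECTIONS):
    """Return the required headings that never start a line as 'Heading:'."""
    missing = []
    for heading in required:
        pattern = re.compile(
            rf"^\s*{re.escape(heading)}\s*:", re.IGNORECASE | re.MULTILINE
        )
        if not pattern.search(report_text):
            missing.append(heading)
    return missing

summary = (
    "Diagnosis: Community-acquired pneumonia\n"
    "Medications: Amoxicillin 500 mg\n"
)
print(missing_sections(summary))
```

Any heading returned by `missing_sections` would be flagged back to the author before the record is accepted.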