The Victorian Cancer Registry (VCR) held records about cancer patients only at the time of their diagnosis and at their death. This leaves a very incomplete record of the progression of the disease: for example, there is no systematically collected information about the recurrence of cancer, how quickly it recurs for different tumours, or how it spreads when it does recur. Whilst initial diagnosis typically comes through pathology reports or surgical intervention, there are no such sources for recurrence and its progression. This information is typically contained in radiology imaging reports, which the Registry does not collect. Furthermore, collecting these reports using the current methods of receiving faxes, letters and email messages from the various services is not practical, given today’s stringent budget restrictions and the substantial labour force needed to process hundreds of thousands of reports. The VCR turned to HLA to formulate a fully automatic solution to this data collection problem.
The basic solution was to design a pipeline in which data flows from the radiology imaging service to the registry’s data stores without any manual intervention, as shown in the figure below.
The purpose of the pipeline is to automatically complete these tasks:
- Identify cancer reports at the imaging service, typically found among PET, MRI, CT and bone scan reports;
- Transfer cancer reports from the imaging service to the Cancer Registry over the Internet;
- Identify the report purpose and tumour type;
- Annotate the report for semantic, grammatical and structural features;
- Extract measures of disease extent;
- Infer the generalised codes used in medicine for describing disease progression; and
- Pass this data into the registry data stores.
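The first few tasks in this list can be pictured as a chain of small stages applied to each report in turn. The following is only a minimal sketch under stated assumptions: the `Report` record, the stage functions and the keyword tables are all hypothetical names invented for illustration, and simple keyword matching stands in for whatever trained classifiers the real system uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Report:
    """Hypothetical record that accumulates results as it moves down the pipeline."""
    text: str
    is_cancer: bool = False
    purpose: Optional[str] = None
    tumour_stream: Optional[str] = None


# Illustrative keyword tables only; these are assumptions, not the registry's rules.
CANCER_TERMS = ("carcinoma", "metasta", "malignan", "tumour")
PURPOSE_TERMS: Dict[str, Tuple[str, ...]] = {
    "recurrence": ("recurrence", "recurrent"),
    "treatment monitoring": ("post-treatment", "response to"),
    "diagnosis": ("newly diagnosed", "initial staging"),
}
STREAM_TERMS: Dict[str, Tuple[str, ...]] = {
    "lung": ("lung", "pulmonary"),
    "colorectal": ("colon", "rectal", "rectum"),
    "pancreas": ("pancreas", "pancreatic"),
}


def first_match(text: str, table: Dict[str, Tuple[str, ...]]) -> Optional[str]:
    """Return the first label whose keywords appear in the text, else None."""
    lowered = text.lower()
    for label, terms in table.items():
        if any(term in lowered for term in terms):
            return label
    return None


def detect_cancer(report: Report) -> Report:
    """Stage run at the imaging service: flag reports that mention cancer."""
    lowered = report.text.lower()
    report.is_cancer = any(term in lowered for term in CANCER_TERMS)
    return report


def classify_purpose(report: Report) -> Report:
    report.purpose = first_match(report.text, PURPOSE_TERMS)
    return report


def classify_stream(report: Report) -> Report:
    report.tumour_stream = first_match(report.text, STREAM_TERMS)
    return report


# Stages that run at the registry, in order, once a report has been forwarded.
REGISTRY_STAGES: List[Callable[[Report], Report]] = [classify_purpose, classify_stream]


def run_pipeline(text: str) -> Optional[Report]:
    """Run one report through the chain; non-cancer reports are never forwarded."""
    report = detect_cancer(Report(text))
    if not report.is_cancer:
        return None
    for stage in REGISTRY_STAGES:
        report = stage(report)
    return report
```

Calling `run_pipeline("CT chest: recurrent carcinoma in the right lung.")` would return a `Report` tagged with a purpose and a tumour stream, while a report with no cancer terms returns `None`. Keeping each stage as a plain function means further stages (tagging, extraction, code inference) could be appended to the list without changing the driver.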
The pipeline begins with a document classifier installed at the site of a radiology imaging service. It picks up reports as they are completed by a radiologist and detects whether each one concerns cancer or some non-cancer condition. Cancer reports are automatically forwarded to the registry, where they enter the next stage of the pipeline: a document classifier that identifies the purpose of the report, such as diagnosis, treatment monitoring or evaluation of recurrence. The next processor is also a document classifier; it determines the tumour stream of the report, for example pancreas, colorectal or lung. The report is then sent to a semantic tagger that recognises all the content of semantic, grammatical and structural relevance to the registry’s data collection objectives. The pertinent content, such as tumour site, size, and nodal and metastatic involvement, is extracted from the report and converted to the assessment codes used in clinical practice to identify the severity of the disease, namely the TNM and staging scores. This content is then lodged in the registry’s data stores.
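To make the extraction step more concrete, the sketch below pulls tumour size measurements out of free text and normalises them to millimetres. It is an illustration only: the regular expression, the function name `extract_sizes_mm` and the choice to keep the largest dimension of each measurement are assumptions, not a description of the tagger actually deployed, which recognises far more than sizes.

```python
import re
from typing import List

# Hypothetical pattern: matches "8 mm", "2.3 cm" or two-dimension forms like "2.3 x 1.8 cm".
SIZE_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)(?:\s*[x×]\s*(\d+(?:\.\d+)?))?\s*(mm|cm)\b",
    re.IGNORECASE,
)


def extract_sizes_mm(text: str) -> List[float]:
    """Return the largest dimension of each measurement found, in millimetres."""
    sizes = []
    for match in SIZE_PATTERN.finditer(text):
        # Group 2 is None when only a single dimension was reported.
        dims = [float(g) for g in match.group(1, 2) if g is not None]
        factor = 10.0 if match.group(3).lower() == "cm" else 1.0
        sizes.append(max(dims) * factor)
    return sizes
```

For a sentence such as "A 2.3 x 1.8 cm mass in the right upper lobe; 8 mm node.", the function would report two measurements, one for the mass and one for the node. Mapping such measurements to TNM categories depends on tumour-specific staging rules and is deliberately left out of this sketch.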