Epilepsy Data Extraction and Annotation (EpiDEA)

Introduction

EpiDEA is a flexible and adaptable system, which allows users to selectively apply its components according to variations in the structure and syntax of discharge summaries.

Figure 1. Overview of the EpiDEA system. Branch I is used to process unstructured free text and Branch II is used to process the less complex semi-structured sections of the discharge summary reports.

In the PRISM project, EpiDEA is deployed with two distinct branches (Figure 1), where Branch I is used to process unstructured free text and Branch II is used to process the less complex semi-structured sections of the discharge summary reports.

The EpiDEA system is implemented as a pipeline, with individual components invoked sequentially. The final result set from both branches is used to populate the PRISM patient knowledge base, which is queried through a visual query interface.
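The two-branch, sequential design described above can be illustrated with a minimal sketch. This is not the EpiDEA implementation: the component functions (`split_sentences`, `tag_seizure_mentions`, `parse_key_value_lines`), the record layout, and the sample text are hypothetical placeholders chosen only to show components being invoked in order and the results of both branches being merged.

```python
from typing import Callable, Dict, List

# A component takes the working record and returns an updated record.
Component = Callable[[Dict], Dict]


def run_pipeline(record: Dict, components: List[Component]) -> Dict:
    """Invoke each component in order, passing the result forward."""
    for component in components:
        record = component(record)
    return record


# Hypothetical stand-ins for real processing stages.
def split_sentences(record: Dict) -> Dict:
    record["sentences"] = [s.strip() for s in record["text"].split(".") if s.strip()]
    return record


def tag_seizure_mentions(record: Dict) -> Dict:
    record["entities"] = [s for s in record["sentences"] if "seizure" in s.lower()]
    return record


def parse_key_value_lines(record: Dict) -> Dict:
    fields = {}
    for line in record["text"].splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    record["fields"] = fields
    return record


BRANCH_I = [split_sentences, tag_seizure_mentions]  # unstructured free text
BRANCH_II = [parse_key_value_lines]                 # semi-structured sections


def process_summary(free_text: str, structured_text: str) -> Dict:
    branch_one = run_pipeline({"text": free_text}, BRANCH_I)
    branch_two = run_pipeline({"text": structured_text}, BRANCH_II)
    # Merge the results of both branches into one record for the knowledge base.
    return {"entities": branch_one["entities"], "fields": branch_two["fields"]}


result = process_summary(
    "Patient had a complex partial seizure. No aura reported.",
    "Handedness: right\nAge at onset: 12",
)
```

Keeping each stage as a plain function over a shared record makes it easy to apply only some components, which mirrors the selective use of components mentioned earlier.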

EpiDEA is a system for effective processing of discharge summaries. It uses a novel Epilepsy and Seizure Ontology (EpSO), developed on the basis of the International League Against Epilepsy (ILAE) classification system, as its core knowledge resource.

The EpiDEA system relies on two resources:

The cTAKES natural language processing tool, an open source NLP system for extracting information from clinical narratives in electronic medical records.

A set of patient discharge summaries generated by the UH CMC EMU.

EpiDEA implements specialized functions to address the unique challenges of processing epilepsy and seizure-related clinical free text in discharge summaries.

Methods

The Epilepsy and Seizure Ontology

EpSO is the core resource of the EpiDEA system and is critical for accurately identifying the most relevant epilepsy and seizure-related entities in the discharge summaries. EpSO is modeled using the description logic-based Web Ontology Language (OWL).

Figure: EpSO class hierarchies.

EpiDEA leverages EpSO to support a number of functionalities, including:

Term disambiguation: Commonly used synonyms and acronyms of a term are modeled using OWL annotation properties in EpSO and are used to reconcile variations to the correct ontology class.

Term normalization: Syntactic variations of a term, such as singular/plural forms and acronyms, are normalized using EpSO classes together with customized rules.

Subsumption reasoning: The EpSO class hierarchy allows EpiDEA to correctly classify terms according to their broader semantic type. For example, polyspike and sharp wave EEG signal patterns are types of epileptiform patterns, which are classified as abnormal patterns in EpSO. This generalization-specialization information allows investigators to flexibly use either specific EEG patterns or broader semantic types for cohort identification. This functionality is leveraged by EpiDEA to address variability in usage of epilepsy terminology within and across EMUs in the PRISM project without affecting the quality of results for cohort identification.
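The three functions above can be sketched with a toy in-memory fragment of a class hierarchy. Everything here is an illustrative assumption rather than EpSO itself: the class names, the `LABELS` synonym table (including the acronym `sw`), the single plural-stripping rule, and the patient identifiers are hypothetical, and a real system would read labels and subclass axioms from the OWL file instead of Python dictionaries.

```python
from typing import Dict, List, Optional, Set

# Toy hierarchy (child -> parent), following the EEG pattern example in the text.
PARENT: Dict[str, str] = {
    "Polyspike": "EpileptiformPattern",
    "SharpWave": "EpileptiformPattern",
    "EpileptiformPattern": "AbnormalPattern",
}

# Labels, synonyms, and acronyms mapped to classes (hypothetical entries).
LABELS: Dict[str, str] = {
    "polyspike": "Polyspike",
    "sharp wave": "SharpWave",
    "sw": "SharpWave",  # assumed acronym, for illustration only
}


def normalize(term: str) -> str:
    """Lowercase, collapse whitespace, and strip a simple plural 's'."""
    cleaned = " ".join(term.lower().split())
    if cleaned not in LABELS and cleaned.endswith("s") and cleaned[:-1] in LABELS:
        cleaned = cleaned[:-1]
    return cleaned


def resolve(term: str) -> Optional[str]:
    """Map a surface form to an ontology class, or None if unknown."""
    return LABELS.get(normalize(term))


def ancestors(cls: str) -> Set[str]:
    """Collect every broader class by walking up the hierarchy."""
    found: Set[str] = set()
    while cls in PARENT:
        cls = PARENT[cls]
        found.add(cls)
    return found


def is_a(cls: str, broader: str) -> bool:
    return cls == broader or broader in ancestors(cls)


def select_cohort(mentions: Dict[str, List[str]], broader: str) -> List[str]:
    """Return patient ids with at least one mention subsumed by `broader`."""
    cohort = []
    for patient_id, terms in mentions.items():
        classes = [resolve(t) for t in terms]
        if any(c is not None and is_a(c, broader) for c in classes):
            cohort.append(patient_id)
    return cohort


mentions = {"p1": ["Polyspikes"], "p2": ["sharp  wave"], "p3": ["unknown term"]}
broad_cohort = select_cohort(mentions, "EpileptiformPattern")
narrow_cohort = select_cohort(mentions, "Polyspike")
```

Querying with the broader class returns patients whose reports mention any subsumed pattern, while querying with the specific class narrows the cohort, which is the flexibility the subsumption hierarchy is meant to provide.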



