NLP Tools (I2B2 Project)

Textual medical records contain information that needs to be extracted and/or indexed before it can be analyzed and interpreted by automated tools. This page describes a collection of natural language processing (NLP) tools that extract various types of information from unstructured medical records. Although a number of NLP systems had demonstrated good accuracy in information extraction, they were often domain-, institution- and application-specific.

A textual medical record is a rich source of clinical information. The generic NLP components, when assembled into pipelines and initialized with custom configuration parameters, become a powerful medical data mining instrument. Medical concepts such as diagnoses, comorbidities, discharge medications, and smoking status can be ascertained.

A suite of NLP tools was developed for the I2B2 (Informatics for Integrating Biology and the Bedside, a national center for biomedical computing) project to address a wide range of text processing needs.

The software follows a modularized and parameterized design and employs syntactic, statistical, and template-based methods for different parsing tasks. This approach allows users to tailor the NLP tools to extract and index specific information from different domains and institutions.
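The modular, parameterized design described above can be illustrated with a minimal Python sketch. It is not the I2B2 code: the names make_section_filter, make_tokenizer, and run_pipeline, the dictionary layout of a document, and the sample text are all hypothetical, chosen only to show how generic components might be configured and chained.

```python
import re
from typing import Callable, Dict, List

# A component takes a shared document dict and returns it, possibly enriched.
Component = Callable[[Dict], Dict]


def make_section_filter(keep: List[str]) -> Component:
    """Hypothetical factory: keep only sections whose title is listed in `keep`."""
    wanted = {title.lower() for title in keep}

    def run(doc: Dict) -> Dict:
        doc["sections"] = {
            title: body
            for title, body in doc["sections"].items()
            if title.lower() in wanted
        }
        return doc

    return run


def make_tokenizer(pattern: str = r"\w+") -> Component:
    """Hypothetical factory: tokenize each remaining section with a regex."""
    token_re = re.compile(pattern)

    def run(doc: Dict) -> Dict:
        doc["tokens"] = {
            title: token_re.findall(body)
            for title, body in doc["sections"].items()
        }
        return doc

    return run


def run_pipeline(doc: Dict, components: List[Component]) -> Dict:
    """Apply the configured components in order."""
    for component in components:
        doc = component(doc)
    return doc


# Invented sample input; the configuration is what tailors the pipeline.
doc = {
    "sections": {
        "Social History": "Patient quit smoking in 2001.",
        "Allergies": "None known.",
    }
}
pipeline = [make_section_filter(keep=["Social History"]), make_tokenizer()]
result = run_pipeline(doc, pipeline)
print(result["tokens"])
```

Changing the arguments passed to the factories (for example, a different list of section titles or a different token pattern) adapts the same components to another institution's report layout without touching their code.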

Methods

I2B2 developed 11 modules for text report processing:

[Figure: NLP components for medical report processing]

  1. Section Splitter
  2. Section Filter
  3. Text Tokenizer
  4. Part-of-Speech (POS) Tagger
  5. Noun Phrase Finder
  6. UMLS Concept Finder
  7. Negation Finder
  8. Regular Expression-based Concept Finder
  9. Sentence Splitter
  10. N-Gram Tool
  11. Classifier (e.g. Smoking Status Classifier)
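To make a few of the modules listed above concrete, the following Python sketch gives toy versions of a section splitter, sentence splitter, tokenizer, N-gram tool, regular expression-based concept finder, and negation finder. Everything in it is an assumption made for illustration: the "TITLE:" header convention, the concept patterns, the negation cue list, the five-token window, and the sample report are invented, and the real I2B2 modules may work quite differently.

```python
import re
from typing import Dict, List, Tuple

# Assumed patterns and cues, for illustration only.
CONCEPT_PATTERNS = {
    "diabetes": r"\bdiabetes( mellitus)?\b",
    "hypertension": r"\bhypertension\b",
}
NEGATION_CUES = ("no", "denies", "without")


def split_sections(report: str) -> Dict[str, str]:
    """Split on header lines of the form 'TITLE: text' (an assumed convention)."""
    sections: Dict[str, str] = {}
    current = "PREAMBLE"
    for line in report.splitlines():
        if not line.strip():
            continue
        header = re.match(r"^([A-Z][A-Z ]+):\s*(.*)$", line)
        if header:
            current = header.group(1).strip()
            sections[current] = header.group(2)
        else:
            sections[current] = (sections.get(current, "") + " " + line).strip()
    return sections


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text.lower())


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous token sequences of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def find_concepts(sentence: str, window: int = 5) -> List[Tuple[str, bool]]:
    """Return (concept, negated) pairs; a concept counts as negated when a
    cue word occurs within `window` tokens before the match."""
    found = []
    lowered = sentence.lower()
    for name, pattern in CONCEPT_PATTERNS.items():
        for match in re.finditer(pattern, lowered):
            preceding = tokenize(lowered[:match.start()])[-window:]
            negated = any(cue in preceding for cue in NEGATION_CUES)
            found.append((name, negated))
    return found


# Invented sample report.
report = (
    "HISTORY: Patient has diabetes mellitus. Denies hypertension.\n"
    "MEDICATIONS: Metformin 500 mg daily."
)
sections = split_sections(report)
sentences = split_sentences(sections["HISTORY"])
concepts = [pair for s in sentences for pair in find_concepts(s)]
print(concepts)
print(ngrams(tokenize(sentences[1]), 2))
```

The fixed-window negation check is a deliberate simplification; it ignores scope terminators such as "but" and cues that follow the concept, which a production negation finder would need to handle.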
