Classifying unstructured electronic consult messages to understand primary care physician specialty information needs

Objective:

Electronic consultation (eConsult) content reflects important information about referring clinicians' needs across an organization, but this information is challenging to extract. Our objectives were to develop machine learning models that classify eConsult questions by question type and question content, and to investigate how well this task can be solved when expert time for labeling is limited.

Materials and methods:

Our data source is the San Francisco Health Network eConsult system, with over 700 000 deidentified questions from 2008 to 2017 in the gastroenterology, urology, and neurology specialties. We develop classifiers based on Bidirectional Encoder Representations from Transformers (BERT), experimenting with multitask learning to determine when information can be shared across classifiers. We produce learning curves to estimate when the amount of human labeling required can be reduced.
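The abstract does not give the learning-curve procedure. A minimal sketch, assuming a curve is produced by training on increasing random fractions of the labeled questions and scoring each model on a fixed test set. Accuracy is used for simplicity (the abstract does not name the metric), and `learning_curve`, `fit_majority`, and the toy data are hypothetical; `fit_majority` merely stands in for fine-tuning a BERT classifier.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# (question text, label); labels here are hypothetical question-type categories
Example = Tuple[str, str]
Predictor = Callable[[str], str]


def learning_curve(
    train: Sequence[Example],
    test: Sequence[Example],
    fractions: Sequence[float],
    fit: Callable[[List[Example]], Predictor],
    seed: int = 0,
) -> Dict[float, float]:
    """Fit on growing random subsets of `train`; return test accuracy per fraction."""
    shuffled = list(train)
    random.Random(seed).shuffle(shuffled)  # shuffle once so subsets are nested
    curve: Dict[float, float] = {}
    for frac in sorted(fractions):
        n = max(1, round(frac * len(shuffled)))
        predict = fit(shuffled[:n])
        correct = sum(predict(text) == label for text, label in test)
        curve[frac] = correct / len(test)
    return curve


def fit_majority(examples: List[Example]) -> Predictor:
    """Hypothetical stand-in for BERT fine-tuning: predict the most frequent label."""
    majority = Counter(label for _, label in examples).most_common(1)[0][0]
    return lambda text: majority


# Toy data: 20 "diagnosis" and 10 "management" training questions.
train = [(f"q{i}", "management" if i % 3 == 0 else "diagnosis") for i in range(30)]
test = [(f"t{i}", "diagnosis") for i in range(10)]
curve = learning_curve(train, test, [0.1, 0.5, 1.0], fit_majority)
print(curve)
```

Shuffling once and taking prefixes makes each smaller training subset contained in the larger ones, which keeps the points along one curve comparable; repeating with several seeds would show the variance at each fraction.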

Results:

Multitask learning shows benefits only for the neurology-urology pair, whose distributions of question types are substantially similar. Continued pretraining of the models on text from new domains is highly effective. For the neurology-urology pair, near-peak performance is achieved with only 10% of the urology training data when all of the neurology data are also used.
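The abstract does not say how the combined neurology-urology training data were assembled. As an illustration only, assuming each labeled question carries a specialty tag and that the urology subset is a uniform random sample, the "all neurology plus 10% urology" setting can be sketched as follows. The function `mixed_training_set` and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# (specialty, question text, label)
Example = Tuple[str, str, str]


def mixed_training_set(
    source: Sequence[Example],
    target: Sequence[Example],
    target_fraction: float,
    seed: int = 0,
) -> List[Example]:
    """Return all source examples plus a uniform random fraction of target examples."""
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    rng = random.Random(seed)
    k = max(1, round(target_fraction * len(target)))
    combined = list(source) + rng.sample(list(target), k)
    rng.shuffle(combined)  # interleave specialties for minibatch training
    return combined


# Toy data standing in for labeled neurology and urology questions.
neuro = [("neurology", f"neuro question {i}", "diagnosis") for i in range(50)]
uro = [("urology", f"uro question {i}", "management") for i in range(40)]
train = mixed_training_set(neuro, uro, 0.10)
print(len(train))  # 50 neurology + 4 urology = 54
```

Keeping the specialty tag on every example leaves room for a multitask design in which a shared encoder feeds specialty-specific output heads; whether the study did exactly this is not stated in the abstract.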

Discussion:

Sharing information across classifier types (question type and question content) shows little benefit, whereas sharing classifier components across specialties can help when the specialties are similar in their balance of procedural versus cognitive patient care.

Conclusion:

We can accurately classify eConsult content given enough labeled data, but methods for reducing labeling effort apply only in special cases. Future work should explore new learning paradigms to further reduce labeling effort.

Keywords: electronic consultations; machine learning; natural language processing; specialty care.
