Summarizing papers from the BioNLP workshop at ACL2020

Anjani Dhrangadhariya
10 min read · Jul 9, 2020


Design by Jingya Chen. Official Twitter cover picture for the ACL2020 virtual conference. All rights reserved with the owners.

Update: While writing this summary, I did not expect that I would one day be presenting a paper at ACL 2022. Life’s full of unexpected twists, and I’m totally here for it! 🙌📝

Presenting DISTANT-CTO at BioNLP @ACL2022

This week was packed with exploring the virtual ACL2020 papers, tutorials, workshops, and more virtual events like the “Birds of a Feather” sessions. ACL accepted more than 700 papers this year and included a special section called WiNLP (Widening NLP, as I learned) to showcase papers from under-represented communities. THIS CONFERENCE WAS SUCH A RICH EXPERIENCE. I hope more conferences get organized virtually.

In this post, I try my best to briefly summarize the papers behind the pre-recorded videos included in the Biomedical Natural Language Processing Workshop at the Association for Computational Linguistics conference (BioNLP@ACL).

Paper 1: Evaluating the Utility of Model Configurations and Data Augmentation on Clinical Semantic Textual Similarity

Figure 1: Image shows result comparison for the clinical STS task. [All rights reserved with the authors/presenter of the paper.]

Clinical data falls under the umbrella of low-resource data, which often leads models to overfit. To overcome this bottleneck, the author explores model configurations (e.g., different pooling methods and fine-tuning strategies) and data augmentation techniques (segment reordering and back-translation) for the clinical STS task of measuring semantic similarity between texts.

Specifically, their BERT-HConv configuration, which stacks two convolutional layers (layer one captures local, word-level information; layer two captures more distant, sentence-level information), performs better than using BERT's [CLS] embeddings.
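
To make the configuration concrete, here is a minimal PyTorch sketch of a hierarchical convolutional head on top of BERT token embeddings. The kernel sizes, channel count, and max-pooling below are my own illustrative choices, not the authors' exact BERT-HConv setup:

```python
import torch
import torch.nn as nn

class HConvPooler(nn.Module):
    """Hedged sketch of a two-layer convolutional pooling head over
    BERT token embeddings. Layer one uses a small kernel (local,
    word-level patterns); layer two a wider kernel (longer-range,
    sentence-level patterns). Hyperparameters are illustrative."""
    def __init__(self, hidden=768, channels=256):
        super().__init__()
        self.conv1 = nn.Conv1d(hidden, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=5, padding=2)
        self.score = nn.Linear(channels, 1)  # regression head for the STS score

    def forward(self, token_embeddings):      # (batch, seq_len, hidden)
        x = token_embeddings.transpose(1, 2)  # (batch, hidden, seq_len)
        x = torch.relu(self.conv1(x))         # word-level features
        x = torch.relu(self.conv2(x))         # sentence-level features
        x = x.max(dim=2).values               # max-pool over tokens
        return self.score(x).squeeze(-1)      # one similarity score per input

# toy usage with random tensors standing in for BERT outputs
emb = torch.randn(4, 128, 768)
print(HConvPooler()(emb).shape)  # torch.Size([4])
```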

With the back-translation (data augmentation) strategy, accuracy improved even further, showing how adding data for low-resource tasks can increase domain-specific model performance.
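
Back-translation itself is straightforward to sketch. Below is a minimal example that round-trips English text through German using real Helsinki-NLP MarianMT checkpoints; the authors' exact translation setup is not specified here, so treat this purely as an illustration:

```python
from transformers import MarianMTModel, MarianTokenizer

def back_translate(texts, pivot="de"):
    """Round-trip English -> pivot language -> English to generate
    paraphrased training examples. Uses real Helsinki-NLP checkpoints."""
    def translate(batch_texts, model_name):
        tok = MarianTokenizer.from_pretrained(model_name)
        model = MarianMTModel.from_pretrained(model_name)
        batch = tok(batch_texts, return_tensors="pt", padding=True, truncation=True)
        out = model.generate(**batch)
        return [tok.decode(t, skip_special_tokens=True) for t in out]

    pivoted = translate(texts, f"Helsinki-NLP/opus-mt-en-{pivot}")
    return translate(pivoted, f"Helsinki-NLP/opus-mt-{pivot}-en")

print(back_translate(["Patient reports chest pain radiating to the left arm."]))
```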

For more details read their paper here.

Paper 2: Interactive Extractive Search over Biomedical Corpora

Figure 2: Image showing search results for a query to SPIKE - A power tool for extractive search. [All rights reserved with the authors/presenter of the paper.]

In this paper presentation, the author demonstrates SPIKE, a power tool for extractive search over the CORD-19 text corpus.

SPIKE supports boolean, token-level, and sentence-level searches over corpora. It goes beyond basic string matching by using linguistic information such as lemmas and semantic entity types for the search. The tool also allows extraction and download of the retrieved information, and in the video presentation the authors demonstrate how to perform statistical operations on it. I highly recommend trying the tool here.

To know more about the functionality of the tool read the paper here.

Paper 3: A Data-driven Approach to Distantly Supervised Biomedical Relation Extraction

Figure 3: Image showing the novel data encoding scheme proposed by the authors for the biomedical relation extraction task. [All rights reserved with the authors/presenter of the paper.]

In this very exciting research paper, the author explores distant supervision, aligning knowledge bases (KBs) such as UMLS with unstructured text corpora such as PubMed, for relation extraction (RE).

They use triples <subject, relation, object> from UMLS as labels for bags of sentences in which both the subject and the object are known, but the relation is not. Since each sample consists of multiple sentences, they use a multi-instance learning (MIL) approach to reduce noise in relation extraction.

They do this by contributing a novel data encoding scheme in which entity markers encode the order of the entities of a KB triple, reducing noise. This method achieves state-of-the-art results for biomedical RE, outperforming the previous work by Dai et al.
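
To make the encoding idea concrete, here is a hypothetical entity-marking sketch. The [E1]/[E2] marker tokens, the helper function, and the example bag are all my own illustrations, not the paper's exact scheme:

```python
import re

def mark_entities(sentence, subj, obj):
    """Wrap the KB subject and object mentions in (hypothetical)
    marker tokens so the encoder sees their roles and order."""
    s = re.sub(re.escape(subj), f"[E1] {subj} [/E1]", sentence, flags=re.IGNORECASE)
    return re.sub(re.escape(obj), f"[E2] {obj} [/E2]", s, flags=re.IGNORECASE)

# one bag of sentences sharing the UMLS pair <aspirin, ?, myocardial infarction>;
# the bag-level label is the relation from the KB triple
bag = [
    "Aspirin reduces the risk of myocardial infarction.",
    "Patients on aspirin showed fewer myocardial infarction events.",
]
print([mark_entities(s, "aspirin", "myocardial infarction") for s in bag])
```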

To understand their data encoding scheme in detail, read the paper here.

Paper 4: Personalized Early Stage Alzheimer’s Disease Detection: A Case Study of President Reagan’s Speeches

Figure 4: Image showing information about the extracted speech-linguistic features from President Reagans’ speech. [All rights reserved with the authors/presenter of the paper.]

Alzheimer's Disease (AD) is one of the major challenges for countries with older demographics. Traditional AD diagnosis methods are based on biomarker tests that are expensive, complex, and often unavailable.

In this paper, the author demonstrates how linguistic biomarkers extracted from speech could be an efficient and non-invasive alternative to traditional methods.

She demonstrates this using a dataset covering 10 years of speeches by President Ronald Reagan, who started showing early signs of AD in 1994.

From this speech data, they extract three groups of input features: 1) vocabulary-richness measures, 2) POS tags, and 3) readability measures. To visualize the time-dependent linguistic changes, they project these features using t-SNE.
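
As a toy illustration of two such features, here is a minimal sketch computing a type-token ratio (one common vocabulary-richness measure) and a filler-word rate; the filler list and regex tokenization are illustrative assumptions, not the paper's exact feature definitions:

```python
import re
from collections import Counter

def type_token_ratio(text):
    """Type-token ratio: unique words over total words
    (a simple vocabulary-richness measure)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens)

def filler_rate(text, fillers=("well", "uh", "um")):
    """Share of tokens that are conversational fillers (illustrative list)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return sum(counts[f] for f in fillers) / len(tokens)

speech = "Well, well, the economy is, uh, strong and the economy is growing."
print(type_token_ratio(speech), filler_rate(speech))
```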

In the president's speech analysis, they observed a reduction in the usage of unique nouns and an increased frequency of conversational filler words and non-specific nouns, concluding that the president may already have been showing early signs of AD during 1983–87.

This method is highly personalized because it uses an individual's speech data collected over a period of time, together with individualized linguistic features, for diagnosis.

This was such a fun paper and presentation. For further details, read the paper here.

Paper 5: Entity Enriched Neural Models for Clinical Question Answering

Figure 5: The results showing the comparative performance of implemented representations (c = clinical, M = multi-task learning) used for the clinical question answering task. [All rights reserved with the author/presenter of the paper.]

Generalizing model performance in biomedical QA is more challenging than in open-domain QA, given its low-resource nature, the lack of large publicly available annotated datasets, and the more complex biomedical jargon.

emrQA is a clinical QA dataset created with the help of physicians using logical forms (LFs). (A single question is reformulated into 4–5 different questions via paraphrasing, while keeping its core semantic information and its answer intact.)

To generalize the model better on unseen paraphrased questions, the authors explore a multi-task learning (MTL) approach, combining the biomedical QA task with an auxiliary task of logical-form prediction. To help the model capture the semantics of paraphrased questions, this work uses clinicalERNIE embeddings (clinicalBERT serves as the multi-head attention model that produces the token representations in ERNIE).
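
A minimal sketch of the MTL shape, assuming a shared encoder feeding (1) a span-extraction QA head and (2) an auxiliary logical-form classification head. The encoder, dimensions, and number of logical forms are placeholders, not the paper's clinicalERNIE setup:

```python
import torch
import torch.nn as nn

class MultiTaskQA(nn.Module):
    """Shared encoder with two heads: span extraction for QA and an
    auxiliary classifier predicting the question's logical form."""
    def __init__(self, hidden=768, n_logical_forms=30):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.qa_head = nn.Linear(hidden, 2)                # start/end logits per token
        self.lf_head = nn.Linear(hidden, n_logical_forms)  # LF class from first token

    def forward(self, x):          # x: (batch, seq_len, hidden)
        h = self.encoder(x)
        return self.qa_head(h), self.lf_head(h[:, 0])

x = torch.randn(2, 64, 768)       # stand-in for token embeddings
spans, lfs = MultiTaskQA()(x)
print(spans.shape, lfs.shape)     # (2, 64, 2) and (2, 30)
```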

This MTL approach using clinicalERNIE achieves competitive results on the biomedical QA task at both the sentence and the paragraph level, outperforming plain clinicalBERT and clinicalERNIE embeddings.

To learn more about the structure of the MTL setup, read their paper here.

Paper 6: Comparative Analysis of Text Classification Approaches in Electronic Health Records

Figure 6: Flowchart for the exploratory text classification approaches. [All rights reserved with the author/presenter of the paper.]

Classifying electronic medical records (EMRs) is a difficult task because they are written or recorded by physicians who use varying medical terms and non-standardized abbreviations.

The authors explore four different classification tasks, two each for the MIMIC-III (tasks: Status and Temporality) and ShARe/CLEF (tasks: Negation and Uncertainty) datasets. All of these are binary classification tasks.

They use a very specific preprocessing step to extract the sub-texts used for classification: they take n words around the disease mention of interest in the EMR (a minimal sketch of this step follows the lists below). After this sub-text extraction, different embeddings/representations are extracted from the datasets, and binary classifiers assign the intended classes. These embeddings were compared against a BoW (bag-of-words) n-gram representation.

  1. List of embeddings used: GloVe, word2vec, fastText (pre-trained on respective datasets), BERT, and BioBERT.
  2. List of binary classifiers used: SVM, ANN, CNN, RNN, LSTM.
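
A minimal sketch of the window-extraction preprocessing step mentioned above, with an illustrative window size, tokenization, and example note:

```python
import re

def context_window(text, mention, n=10):
    """Return the disease mention plus n words of context on each side."""
    tokens = re.findall(r"\S+", text)
    m = mention.lower().split()
    for i in range(len(tokens) - len(m) + 1):
        if [t.lower().strip(".,;") for t in tokens[i:i + len(m)]] == m:
            return " ".join(tokens[max(0, i - n): i + len(m) + n])
    return None

note = ("Patient denies fever. History of type 2 diabetes, currently "
        "managed with metformin. No signs of neuropathy at this visit.")
print(context_window(note, "diabetes", n=5))
```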

One interesting conclusion from this work was that the embeddings trained on the medical data performed better than open-domain embeddings.

To learn more about how they explore the challenges of bioNLP, read their paper here.

Paper 7: Experimental Evaluation and Development of a Silver-Standard for the MIMIC-III Clinical Coding Dataset

Figure 7: The image graph shows how the most frequently assigned codes in MIMIC-III are undercoded. [All rights reserved with the author/presenter of the paper.]

Clinical coding (the process of encoding a patient episode into a standardized terminology) is a very time-intensive and expensive manual process and is often neglected during routine collection in hospitals and clinics. Moreover, only trained experts can assign correct clinical codes to patient records, yet clinical codes are required for insurance and policymaking.

The authors challenge the validity of the assigned codes and propose an open-source, reproducible experimental methodology for assessing the validity of the MIMIC-III dataset.

In their method, they first extract the diagnosis subsection from the reports using rule-based methods, followed by the extraction of medical named entities (specifically UMLS concepts) using the MedCAT tool. The UMLS terms are then mapped to ICD codes and back to the original discharge summaries in the MIMIC-III dataset. This results in a MIMIC-III dataset annotated with an automatic silver-standard annotation, and it also divides the dataset into three subsets (a toy sketch of this split follows the list) where:

  1. a code was predicted by MedCAT but not assigned in MIMIC-III.
  2. a code was predicted by MedCAT and assigned in MIMIC-III.
  3. a code was assigned in MIMIC-III but not predicted by MedCAT.
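
The three-way split is essentially a per-document set comparison between the silver-standard (MedCAT-predicted) codes and the gold MIMIC-III codes. A toy sketch with made-up ICD-9 codes:

```python
def code_agreement(predicted, assigned):
    """Three-way split between silver-standard (MedCAT-predicted)
    codes and the gold codes assigned in MIMIC-III."""
    predicted, assigned = set(predicted), set(assigned)
    return {
        "predicted_only": predicted - assigned,  # candidate under-coding
        "both": predicted & assigned,            # agreement
        "assigned_only": assigned - predicted,   # missed by the pipeline
    }

print(code_agreement(predicted=["428.0", "401.9", "250.00"],
                     assigned=["401.9", "584.9"]))
```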

Through an analysis of these subsets, the authors conclude that “the most frequently assigned codes in MIMIC-III are undercoded up to 35%.”

To understand this secondary validation procedure better, read their paper here.

Paper 8: Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes

Figure 8: Flowchart explaining the instance selection process for the domain adaption. [All rights reserved with the author/presenter of the paper.]

The authors aim to train a domain-adapted classifier that identifies why an antimicrobial drug was administered to cats and dogs. For domain adaptation to veterinary science, they fine-tune the clinicalBERT model on a large veterinary dataset called VetCompass Australia, resulting in VetBERT.

This model was used for a classification task in which a training dataset with cefovecin annotations was used to test classification performance on two target annotation datasets: 1) cephalexin and 2) clavulanate. VetBERT was compared against the baselines (vanilla LSTMs and BERT-base) and outperformed them on both target datasets. Still, performance on the two target datasets (the cephalexin and clavulanate drugs) was lower, because they had far fewer annotations than cefovecin (a label-imbalance problem).

To further adapt VetBERT's classification strength to sub-domains (documents annotated with cephalexin and clavulanate are considered sub-domains here) without heavy annotation effort, they use an instance-selection approach: they identify the labeled source-domain samples (cefovecin annotations) that are most similar to instances from the target domains (cephalexin and clavulanate). This preselection of training instances does improve results on the target classification tasks.
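
A toy sketch of the instance-selection idea: keep the labeled source (cefovecin) notes most similar to the target (cephalexin/clavulanate) notes. The paper selects instances using model representations; a TF-IDF space and made-up notes are used here purely for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_instances(source_texts, target_texts, k=2):
    """Keep the k labeled source notes most similar to the target notes."""
    vec = TfidfVectorizer().fit(source_texts + target_texts)
    src, tgt = vec.transform(source_texts), vec.transform(target_texts)
    # score each source note by its max similarity to any target note
    scores = cosine_similarity(src, tgt).max(axis=1)
    ranked = np.argsort(scores)[::-1][:k]
    return [source_texts[i] for i in ranked]

source = ["cefovecin given for cat bite abscess",
          "cefovecin injection for skin infection",
          "routine dental cleaning, no antibiotics"]
target = ["cephalexin prescribed for skin infection in dog"]
print(select_instances(source, target, k=2))
```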

To read more about their domain-adaptation using instance selection technique, read the paper here.

Paper 9: Sequence-to-Set Semantic Tagging for Biomedical IR using Self-Attention

Figure 9: Image showing the training objective for the sequence-to-set task. [All rights reserved with the author/presenter of the paper.]

In this interesting information-retrieval work, the authors explore the unsupervised task of assigning semantic tags to unannotated biomedical documents. They achieve this using the documents' frequency-based TF-IDF (Term Frequency–Inverse Document Frequency) statistics.

Each document dᵢ is assigned the top-k TF-IDF terms extracted from it. This is followed by identifying the “most related” documents {d'} from elsewhere in the corpus and assigning their top TF-IDF terms to dᵢ as well (this expands dᵢ's set of semantic tags).
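
The tag-generation step is easy to sketch with scikit-learn; the stop-word filtering, the value of k, and the documents below are illustrative choices:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def top_k_terms(docs, k=2):
    """Assign each document its k highest-scoring TF-IDF terms
    as pseudo semantic tags."""
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(docs).toarray()
    terms = np.array(vec.get_feature_names_out())
    return [[terms[i] for i in row.argsort()[::-1][:k] if row[i] > 0]
            for row in X]

docs = ["aspirin lowers cardiovascular risk",
        "statins and aspirin in cardiovascular prevention",
        "insulin therapy for type one diabetes"]
print(top_k_terms(docs))
```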

Then, the authors employ attention-based neural representations to encode documents and predict their semantic tags through a sigmoid/softmax layer. The training objective is to maximize the probability of the set of k tags for each particular document.

To know more about how they extend this unsupervised task to semi-supervised and supervised tasks for the query expansion problem, read their paper here.

Paper 10: Noise Pollution in Hospital Readmission Prediction: Classification with RL

Figure 10a: Image showing an example of the kind of noise in the clinical notes. [All rights reserved with the author/presenter of the paper.]
Figure 10b: Image showing the reinforcement learning approach to remove noisy text from long clinical documents. [All rights reserved with the author/presenter of the paper.]

One of the most interesting aspects of this paper was the use of Reinforcement Learning (RL) for document classification.

The authors use very noisy, long, unstructured clinical notes from the Emory Kidney Transplant Dataset to predict whether a patient might require readmission after a kidney transplant operation. These noisy clinical notes were first preprocessed (removal of digits, stray punctuation marks, and generally noisy tokens).

To classify such documents, they address two problems: generating an effective representation and handling noise.

Effective representation: They explore the following document representations: 1) a classical bag-of-words (BoW), 2) averaged word embeddings, and 3) contextual deep-learning encoders such as clinicalBERT and biLSTMs. For the neural representations, each clinical note is split into several segments; embeddings are extracted per segment and averaged to obtain the final representation of the note. Conclusions: 1) BoW outperforms the contextual deep-learning encoders, and 2) the contextual encoders overfit.
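
A minimal sketch of the segment-and-average scheme, with a random stub standing in for clinicalBERT and an illustrative segment length:

```python
import torch

def note_representation(note, encoder, seg_len=128):
    """Split a long clinical note into fixed-length token segments,
    encode each, and average the segment vectors. `encoder` is any
    callable mapping a token list to a fixed-size vector."""
    tokens = note.split()
    segments = [tokens[i:i + seg_len] for i in range(0, len(tokens), seg_len)]
    vecs = torch.stack([encoder(seg) for seg in segments])
    return vecs.mean(dim=0)

stub_encoder = lambda seg: torch.randn(768)   # stand-in for clinicalBERT
note = " ".join(["token"] * 300)
print(note_representation(note, stub_encoder).shape)  # torch.Size([768])
```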

Handling noise: To alleviate the noise problem, they take a classical reinforcement learning (RL) path, successively removing about 25% of the noisy segments from the documents under an RL objective in which pruning noisy segments is rewarded. Conclusion: RL improves document classification further by removing noisy tokens.
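
For intuition, here is a toy REINFORCE-style sketch of the pruning idea: a policy scores each segment and samples keep/drop actions, and actions that improve a reward are reinforced. The dimensions and the reward below are stand-ins; in the paper, the reward comes from readmission-prediction quality:

```python
import torch
import torch.nn as nn

# a tiny policy that scores each segment vector with a keep probability
policy = nn.Sequential(nn.Linear(768, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

segments = torch.randn(8, 768)        # segment vectors for one document
keep_prob = policy(segments).squeeze(-1)
dist = torch.distributions.Bernoulli(keep_prob)
actions = dist.sample()               # 1 = keep segment, 0 = prune it

# toy reward: pretend pruning more segments helped the classifier
# (the real reward would come from downstream prediction quality)
reward = 1.0 - actions.mean()

loss = -(dist.log_prob(actions).sum() * reward)  # REINFORCE objective
optimizer.zero_grad()
loss.backward()
optimizer.step()
```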

To explore how they use reinforcement learning (RL), read their paper here.

From all these papers and talks, I draw the following conclusions.

  1. BioNLP tasks are low-resource. It is very difficult to get large-scale manual annotations from medical professionals, for valid reasons (Duh! 🤷 They have the more urgent job of treating patients).
  2. Biomedical documents are noisy and contain medical jargon, non-standardized terms, special medical/biological/healthcare context, and non-standardized abbreviations used by the doctors while taking notes.
  3. It is advisable to use domain-adapted or fine-tuned neural models for biomedical tasks. Out-of-the-box methods (trained for open-domain problems) do not do well on these tasks; we need customizable or more specialized models.
  4. Neural encoders for biomedical text representation “could” potentially overfit the small medical datasets.

Anyone reading this post, please feel free to write to me in case you find some mistakes. I will really appreciate any constructive criticism. 🙏🏽

Follow me on Twitter or connect on LinkedIn.

👩 #womenintech #brownintech #NLProc
